About the EWCL Platform

Entropy-Weighted Collapse Likelihood

A biophysics-informed computational framework for protein disorder analysis, structural collapse prediction, and entropy-aware pathogenicity assessment.

Introduction

Proteins are the most versatile and dynamic macromolecules of life. They fold, misfold, interact, and collapse into structures that underpin cellular function and dysfunction alike. Yet not all protein sequences resolve cleanly into stable three-dimensional structures: many remain disordered, context-dependent, or prone to collapse, making their behavior difficult to capture with conventional structure prediction tools.

The Entropy-Weighted Collapse Likelihood (EWCL) project was created to address this gap. By combining sequence-only entropy features, structural overlays, and a physics-inspired collapse rule, EWCL provides a principled framework for predicting structural disorder, collapse propensity, and AlphaFold hallucinations (regions modeled with high confidence that are likely to be disordered). The platform integrates protein physics, machine learning, and information theory, and is benchmarked against curated disorder databases and large-scale structural datasets.

Where conventional predictors treat disorder as a binary classification, EWCL models return continuous scores (0–1), enabling residue-level analysis of collapse likelihood, entropy volatility, and hallucination segments. These scores can be aligned against UniProt features, AlphaFold confidence (pLDDT), and structural overlays in real time.

Theoretical Foundation

Scientific Rationale

The foundation of EWCL draws from three converging ideas:

1. Entropy Weighting

Building on hydropathy/charge frameworks pioneered by Uversky and colleagues, EWCL incorporates Shannon entropy measures of amino-acid composition along the sequence, yielding scores sensitive to both local volatility and long-range compositional context.
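
As an illustration of the kind of quantity involved, the sketch below computes a sliding-window Shannon entropy profile over a sequence. It is a minimal example, not EWCL's feature pipeline: the function name `window_entropy`, the window size of 15, and the example sequence are all assumptions chosen for demonstration.

```python
import math
from collections import Counter


def window_entropy(sequence: str, window: int = 15) -> list[float]:
    """Shannon entropy (in bits) of residue composition in a window
    centred on each position; windows are truncated at the termini."""
    half = window // 2
    profile = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        total = len(segment)
        counts = Counter(segment)
        # H = -sum(p * log2(p)) over the residue types seen in the window
        entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        profile.append(entropy)
    return profile


# Hypothetical sequence: a low-complexity lysine run followed by a mixed region.
profile = window_entropy("MKKKKKKKKKKAGSTPLVDEQ", window=5)
```

Low values mark compositionally uniform stretches, while higher values mark windows with many different residue types.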

2. Collapse Likelihood

Disorder is not uniform. Some residues resist structuring, while others collapse under hydrophobic and entropic pressures. EWCL quantifies this collapse likelihood directly, providing a probabilistic scale of structural propensity.

3. Hallucination Detection

The current hallucination module detects inconsistencies between EWCL disorder likelihood and AlphaFold pLDDT confidence. Future work will extend this validation to crystallographic B-factors and multi-model ensembles.

Hallucination Formula

H = σ(λ · (EWCL - (1 - pLDDT/100)))

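where σ is the logistic sigmoid and λ is a scaling parameter that controls how sharply H responds. The term (1 - pLDDT/100) is the disorder implied by AlphaFold's confidence, so H equals 0.5 when the two signals agree and approaches 1 when EWCL is high (disorder) while pLDDT is also high (confidence). Such regions are flagged as potential over-structuring artifacts in AI models.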
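
The formula can be evaluated per residue with a few lines of Python. This is a sketch of the expression as written, assuming σ is the standard logistic sigmoid; the function name and the default `lam = 10.0` are illustrative assumptions, since the source does not state the value of λ.

```python
import math


def hallucination_score(ewcl: float, plddt: float, lam: float = 10.0) -> float:
    """H = sigmoid(lam * (EWCL - (1 - pLDDT/100))).

    ewcl  : EWCL disorder likelihood in [0, 1]
    plddt : AlphaFold confidence in [0, 100]
    lam   : scaling parameter (assumed value, not taken from the source)
    """
    z = lam * (ewcl - (1.0 - plddt / 100.0))
    return 1.0 / (1.0 + math.exp(-z))


# High EWCL with high pLDDT: the two signals disagree, so H is close to 1.
h_conflict = hallucination_score(ewcl=0.9, plddt=95.0)
# EWCL matches the disorder implied by pLDDT: H sits at the midpoint.
h_agree = hallucination_score(ewcl=0.5, plddt=50.0)
```

Larger values of `lam` make the transition around the agreement point steeper; smaller values spread it out.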

Dataset Benchmark

Benchmarking and Validation

All evaluations were performed on independent datasets with leakage-free splitting. Metrics are fully documented for reproducibility.

EWCLv1 Sequencer
Sequence-only analysis

EWCLv1 performance across 3,740 proteins (2.29M residues) with sequence-only analysis:

DisProt (curated disorder regions):

  • AUROC: 0.922
  • AUPRC: 0.753 (Lift: 5.31×)
  • Residues: 2.29M (325K positive)

Merge Dataset:

  • AUROC: 0.905
  • AUPRC: 0.754 (Lift: 4.58×)
  • Residues: 2.29M (378K positive)

IDEAL:

  • AUROC: 0.713
  • AUPRC: 0.062 (Lift: 2.11×)
  • Residues: 2.29M (67K positive)

EWCLv1 provides ultra-fast sequence-only analysis optimized for speed and continuous outputs.
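
The lift figures reported for EWCLv1 are consistent with lift defined as AUPRC divided by the fraction of positive residues, which is the AUPRC expected from a random predictor. That definition is inferred from the numbers rather than stated by the source; a minimal Python check using the rounded counts listed above:

```python
def lift(auprc: float, positives: float, total: float) -> float:
    """AUPRC relative to the positive-class prevalence (random baseline)."""
    prevalence = positives / total
    return auprc / prevalence


# (AUPRC, positive residues, total residues), copied from the EWCLv1 results.
benchmarks = {
    "DisProt": (0.753, 325_000, 2_290_000),
    "Merge": (0.754, 378_000, 2_290_000),
    "IDEAL": (0.062, 67_000, 2_290_000),
}

for name, (auprc, pos, total) in benchmarks.items():
    print(f"{name}: prevalence={pos / total:.3f}, lift={lift(auprc, pos, total):.2f}x")
```

Because the residue counts are rounded to the nearest thousand, the recomputed values agree with the reported lifts only to within roughly 0.02.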

EWCLi PDB Physics
Structure-aware analysis

EWCLi performance across 771 AlphaFold2 structures (495K residues) with structure-aware analysis:

DisProt (curated disorder regions):

  • AUROC: 0.870
  • AUPRC: 0.439
  • Residues: 495K (45K positive)

Merge Dataset:

  • AUROC: 0.804
  • AUPRC: 0.508
  • Residues: 495K (85K positive)

IDEAL:

  • AUROC: 0.717
  • AUPRC: 0.289
  • Residues: 495K (65K positive)

EWCLi provides structure-aware analysis using PDB/CIF uploads with B-factors, pLDDT, and curvature features.

Implementation

Platform Components

EWCLi (Physics Mode)

PDB/CIF uploads processed into residue-level collapse likelihood scores.

EWCLv1 (Sequencer Mode)

FASTA-based entropy predictions, optimized for speed and continuous outputs.

Hallucination Analysis

Joint visualization of EWCL vs. pLDDT, entropy agreement profiles, and hallucination segments, with 3D Mol overlay.
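
One simple way to turn per-residue hallucination scores into segments is to group consecutive residues whose score exceeds a cutoff. The sketch below illustrates that idea only; the function name, the 0.5 threshold, and the minimum length of 3 residues are assumptions, not the platform's documented settings.

```python
def find_segments(scores, threshold=0.5, min_length=3):
    """Return inclusive (start, end) index pairs for runs of consecutive
    residues with score >= threshold that are at least min_length long."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores) - 1))
    return segments


# Hypothetical per-residue hallucination scores.
segments = find_segments([0.1, 0.9, 0.9, 0.9, 0.2, 0.8, 0.8, 0.9, 0.95])
```

Indices are 0-based positions in the score list; mapping them to residue numbers depends on the numbering of the uploaded structure.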

Validation Modules

Automated benchmarking against intrinsically disordered protein (IDP) datasets (IDP Validation) and hallucination datasets (Hallucination Validation).

All modules are light/dark-aware, fully exportable (JSON, CSV, annotated PDB), and designed for reproducibility.

Research Impact

Why EWCL Matters

Bridges physics and AI

Entropy-weighted collapse likelihood provides a missing layer between hydropathy/charge theory and deep learning structural models.

Identifies hallucinations

A principled approach to diagnosing AI over-structuring artifacts.

Variant utility

Direct application of entropy features to variant pathogenicity prediction.

Research + translational impact

EWCL's benchmarks suggest immediate utility for IDP research, variant annotation, and AI-protein model validation.

Open Science

Access and Reproducibility

The EWCL platform is available through a public interface supporting:

Supported Formats

  • Uploads: PDB, CIF, FASTA (up to 100 MB)
  • Exports: JSON, CSV, annotated PDB
  • Overlays: UniProt features, hallucination segments, entropy profiles
  • API access for programmatic integration

Documentation & Reproducibility

  • Methods, formulas, and evaluation metrics fully documented
  • Data pipelines built leakage-free and reproducible
  • Open source implementation available
  • Comprehensive benchmarking datasets provided

Experimental Extension

Future Work

EWCL is an ongoing research effort uniting quantum-physics–inspired collapse theory, sequence-scale entropy analytics, and AI model validation. Future extensions will focus on:

Binding Disorder Predictors

Development of LightGBM-enhanced models with high-dimensional feature integration (hydropathy, entropy, structural curvature) for predicting protein–protein interaction disorder and binding interfaces.

Variant Pathogenicity Integration

Expansion to ClinVar-scale benchmarking, incorporating entropy-delta features for pathogenic/benign classification at the variant level.

Comparative Proteomics

Expansion of EWCL to comparative proteomics, leveraging evolutionary datasets to analyze how disorder and collapse likelihood are conserved or diverge across species. This extension aims to provide a quantitative framework for linking entropy-based collapse dynamics with evolutionary constraints.

PDQC Extensions

Phase-Dependent Quantum Collapse (PDQC) formulations for exploring disorder–order transitions, with cross-disciplinary applications in drug discovery, structural biology, and materials science.

Ready to Explore EWCL?

Start analyzing your protein structures with our physics-informed approach to disorder prediction and hallucination detection.