Entropy-Weighted Collapse Likelihood
About EWCL
EWCL is a sequence-first platform for protein disorder and collapse likelihood analysis. The current live app focuses on residue-level EWCL scoring, structure-aligned visualization, and pLDDT comparison for confidence review.
Sequence analysis
FASTA sequences and UniProt accessions are scored by the current EWCL sequence models, which return per-residue scores aligned to the input sequence.
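A minimal sketch of the input side, assuming plain-text FASTA and bare UniProt accessions. The function names (`is_uniprot_accession`, `parse_fasta`, `pair_scores`) are hypothetical and are not the platform's API. The accession pattern is the one UniProt documents for 6- and 10-character accessions. Scoring itself is out of scope, so `pair_scores` only shows how one score list would be attached to residues with 1-based positions.

```python
import re

# Accession pattern published by UniProt (covers 6- and 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def is_uniprot_accession(text: str) -> bool:
    """True if the trimmed input looks like a bare UniProt accession."""
    return UNIPROT_ACCESSION.match(text.strip()) is not None


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) records; sequences are upper-cased."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def pair_scores(sequence: str, scores: list[float]) -> list[tuple[int, str, float]]:
    """Hypothetical helper: attach exactly one score to each residue (1-based positions)."""
    if len(scores) != len(sequence):
        raise ValueError(f"expected {len(sequence)} scores, got {len(scores)}")
    return [(pos, aa, float(s)) for pos, (aa, s) in enumerate(zip(sequence, scores), start=1)]
```

Rejecting a score list of the wrong length, rather than truncating it, keeps any downstream plot or export from silently shifting scores onto the wrong residues.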
Structure/PDB analysis
PDB, mmCIF, and AlphaFold-style structures are parsed to extract their sequence and displayed in Molstar; when the file carries per-residue confidence values, EWCL scores are compared with pLDDT.
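AlphaFold models store per-residue pLDDT in the B-factor column of their PDB files, so one minimal way to recover both the sequence and the confidence track is to read the CA atoms. This sketch assumes fixed-column PDB text (mmCIF is not handled), and the helper name is hypothetical. In experimental PDB entries the same column holds atomic displacement values, not pLDDT, so the values should only be treated as confidence when the file is known to be a predicted model.

```python
# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def ca_sequence_and_plddt(pdb_text: str, chain: str = "A") -> tuple[str, list[float]]:
    """Hypothetical helper: return one chain's sequence and per-residue B-factors.

    Assumes an AlphaFold-style PDB where the B-factor column holds pLDDT.
    """
    seq: list[str] = []
    plddt: list[float] = []
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:  # atom name, chain ID
            continue
        res_id = (line[22:26].strip(), line[26].strip())  # residue number + insertion code
        if res_id in seen:  # keep only the first alternate location of a CA
            continue
        seen.add(res_id)
        seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))  # unknown residues -> X
        plddt.append(float(line[60:66]))  # B-factor field, columns 61-66
    return "".join(seq), plddt
```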
EWCL-vs-pLDDT review
Disagreement calls are computed from the active model's output and the uploaded structure's confidence track. They are review diagnostics, not automatic hallucination labels.
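As an illustration of a review diagnostic (not the platform's actual rule), the sketch below flags residues that AlphaFold models confidently (pLDDT of at least 70, the lower bound of AlphaFold's "confident" band) while EWCL scores them as disorder-prone, and merges them into contiguous regions for manual inspection. The function name, the EWCL cutoff of 0.5, and the minimum region length are hypothetical parameters.

```python
def review_regions(
    ewcl: list[float],
    plddt: list[float],
    ewcl_cut: float = 0.5,     # hypothetical EWCL disorder cutoff
    plddt_cut: float = 70.0,   # AlphaFold "confident" band starts at 70
    min_len: int = 3,          # hypothetical minimum region length
) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs that are both confidently
    modeled and scored as disorder-prone; shorter runs are dropped."""
    if len(ewcl) != len(plddt):
        raise ValueError("EWCL and pLDDT tracks must be the same length")
    regions: list[tuple[int, int]] = []
    start = None
    for pos, (e, p) in enumerate(zip(ewcl, plddt), start=1):
        flagged = e >= ewcl_cut and p >= plddt_cut
        if flagged and start is None:
            start = pos  # open a new run
        elif not flagged and start is not None:
            if pos - start >= min_len:  # run covers start..pos-1
                regions.append((start, pos - 1))
            start = None
    if start is not None and len(ewcl) - start + 1 >= min_len:
        regions.append((start, len(ewcl)))  # run reaching the C-terminus
    return regions
```

Reporting regions rather than single residues keeps isolated threshold crossings out of the review queue; the output is a list of candidates to inspect, not a verdict.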
Current model surfaces
EWCL-Sequence-Main and EWCL-Structure-Main are the primary live EWCL signals; the Experimental, Raw, and ESMC-backed tracks serve as comparison lenses. Scores are continuous values from 0 to 1 and stay aligned to the input sequence across line plots, heat bands, tables, exports, ground-truth (GT) overlays, and structure coloring.
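The alignment guarantee can be illustrated with a small, hypothetical export helper: every model track must supply one finite score in [0, 1] per residue, and each row then carries the position, the residue, and one column per track. The function names, column names, and TSV layout are assumptions for illustration, not the platform's actual export format.

```python
import csv
import io
import math


def aligned_rows(sequence: str, tracks: dict[str, list[float]]) -> list[dict]:
    """Hypothetical helper: one row per residue with a column per model track.

    Every track must hold exactly one finite score in [0, 1] per residue.
    """
    for name, scores in tracks.items():
        if len(scores) != len(sequence):
            raise ValueError(f"{name}: {len(scores)} scores for {len(sequence)} residues")
        for s in scores:
            if not (math.isfinite(s) and 0.0 <= s <= 1.0):
                raise ValueError(f"{name}: score {s} outside [0, 1]")
    rows = []
    for i, aa in enumerate(sequence):
        row = {"position": i + 1, "residue": aa}
        for name, scores in tracks.items():
            row[name] = scores[i]
        rows.append(row)
    return rows


def rows_to_tsv(rows: list[dict]) -> str:
    """Serialize aligned rows as tab-separated text with a header line."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Validating every track before building any rows means a single misaligned model fails the whole export instead of producing a table whose columns disagree about which residue they describe.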
Sequence models
- EWCL-Sequence-Main
- EWCL-Sequence-Experimental
- EWCL-Sequence-Raw
- EWCL-Sequence-ESMC
Structure models
- EWCL-Structure-Main
- EWCL-Structure-Experimental
- EWCL-Structure-ESMC
Live analysis surfaces
The header routes expose the current production workflow: submit a FASTA/UniProt sequence, upload or fetch an AlphaFold/PDB/mmCIF structure, compare EWCL models, and inspect pLDDT-aware disagreement diagnostics.
- Sequence analysis: primary scoring with EWCL-Sequence-Main, plus Experimental, Raw, ESMC, and external-reference comparisons.
- Structure/PDB analysis: structure-aware scoring with EWCL-Structure-Main, plus Experimental and ESMC comparison lenses.
- Benchmark page: current full-overlap and held-out metrics against explicit GT tracks and external references.
Data downloads and model coverage
Protein ID pages and downloadable bundles are score-first: current EWCL tracks are generated across the public protein index where the corresponding model class is available. Benchmark tables then subset those scores by ground-truth availability and held-out definitions.
| Model | Scope | ID-page coverage | Role / notes |
|---|---|---|---|
| EWCL-Sequence-Main | Sequence | All indexed proteins with sequence scores | Primary continuous sequence signal. |
| EWCL-Sequence-Experimental | Sequence | All indexed proteins with sequence scores | Held-out evaluation excludes UniProt IDs used in CheZOD/experimental training. |
| EWCL-Sequence-Raw | Sequence | All indexed proteins with sequence scores | Raw/minimally supervised feature-space signal. |
| EWCL-Sequence-ESMC | Sequence / live ESMC | On-demand sequence runs | ESMC embedding-backed live comparison model, with GT overlays when the accession is known. |
| EWCL-Structure-Main | Structure / AlphaFold subset | Indexed proteins with structure/AlphaFold-feature scores | Primary structure-aware signal. |
| EWCL-Structure-Experimental | Structure / AlphaFold subset | Indexed proteins with structure/AlphaFold-feature scores | Experimental-signal-aligned structure model. |
| EWCL-Structure-ESMC | Structure / live ESMC | On-demand AlphaFold/PDB/mmCIF runs | ESMC-backed structure lens for comparison with EWCL-Main, pLDDT, and GT overlays. |
Structure-model coverage depends on the availability of structure/AlphaFold features. Sequence-model coverage is expected to span the full indexed sequence set.
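The benchmark subsetting described above can be sketched with set operations, assuming each input is a collection of UniProt IDs. The full-overlap set is the scored proteins that also have a GT track; the held-out set additionally drops IDs used in training (for EWCL-Sequence-Experimental, the CheZOD/experimental-training IDs). The function name is hypothetical, and the real benchmark definitions may apply further filters.

```python
from collections.abc import Iterable


def benchmark_subsets(
    scored_ids: Iterable[str],
    gt_ids: Iterable[str],
    training_ids: Iterable[str],
) -> tuple[list[str], list[str]]:
    """Hypothetical helper returning (full_overlap, held_out) as sorted ID lists.

    full_overlap: scored proteins that also have a ground-truth track.
    held_out: full_overlap minus any ID used to train the model under test.
    """
    full_overlap = set(scored_ids) & set(gt_ids)
    held_out = full_overlap - set(training_ids)
    return sorted(full_overlap), sorted(held_out)
```

Because held-out is derived from full-overlap, every held-out protein is guaranteed to have both a score and a GT track, and the two metric tables stay directly comparable.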
How to read conflicts
EWCL-vs-pLDDT disagreement is a diagnostic layer. A high-confidence AlphaFold region with a high EWCL disorder score should be reviewed, but it is not automatically a hallucination: some proteins contain real functional folded islands inside otherwise disordered regions.
Current terminology
The platform uses EWCL-reference consistency labels and separates aligned regions, transitions, threshold-near mismatches, and stronger EWCL/pLDDT conflicts that need manual inspection.
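One hypothetical way to produce the four categories named above is sketched here. It maps pLDDT onto a disorder-like scale (1 - pLDDT/100), applies one shared cutoff to both tracks, treats a mismatch within a small margin of that cutoff as threshold-near, and marks agreeing residues that sit at a boundary of the EWCL call as transitions. The function name, cutoff, margin, and boundary rule are illustrative assumptions, not the platform's published definitions.

```python
def consistency_labels(
    ewcl: list[float],
    plddt: list[float],
    cut: float = 0.5,      # hypothetical shared disorder cutoff
    margin: float = 0.1,   # hypothetical "threshold-near" band around the cutoff
) -> list[str]:
    """Label each residue as aligned, transition, threshold-near mismatch, or conflict."""
    if len(ewcl) != len(plddt):
        raise ValueError("EWCL and pLDDT tracks must be the same length")
    # Map pLDDT (0-100, high = confident) onto a disorder-like 0-1 scale.
    plddt_disorder = [1.0 - p / 100.0 for p in plddt]
    ewcl_call = [e >= cut for e in ewcl]
    plddt_call = [d >= cut for d in plddt_disorder]
    n = len(ewcl)
    labels: list[str] = []
    for i in range(n):
        if ewcl_call[i] != plddt_call[i]:
            # Disagreement: soft if either score sits close to the cutoff.
            near = abs(ewcl[i] - cut) <= margin or abs(plddt_disorder[i] - cut) <= margin
            labels.append("threshold-near mismatch" if near else "conflict")
        else:
            # Agreement: a transition if the EWCL call flips at a neighbor.
            at_boundary = (i > 0 and ewcl_call[i - 1] != ewcl_call[i]) or (
                i < n - 1 and ewcl_call[i + 1] != ewcl_call[i]
            )
            labels.append("transition" if at_boundary else "aligned")
    return labels
```

Separating threshold-near mismatches from conflicts keeps residues whose disagreement depends only on where the cutoff is drawn out of the set that needs manual inspection.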
Contributors
Lucas Cristino
Model design, data pipelines, benchmarking, and platform deployment.
Prof. Vladimir N. Uversky
Scientific guidance and co-author context for intrinsically disordered proteins and biological interpretation.