EWCL

Entropy-Weighted Collapse Likelihood

About EWCL

EWCL is a sequence-first platform for protein disorder and collapse likelihood analysis. The current live app focuses on residue-level EWCL scoring, structure-aligned visualization, and pLDDT comparison for confidence review.

Sequence analysis

FASTA and UniProt inputs run through the current EWCL sequence models and return aligned per-residue scores.
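As a minimal sketch of what "aligned per-residue scores" means in practice, the Python below parses FASTA text and pairs each residue with exactly one score. `parse_fasta`, `score_residues`, and the `scorer` callable are hypothetical names introduced for illustration; the EWCL models themselves are not reproduced here, and any scorer passed in is a placeholder.

```python
from __future__ import annotations

from typing import Callable, Iterator


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())  # sequences may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def score_residues(
    seq: str, scorer: Callable[[str], list[float]]
) -> list[tuple[int, str, float]]:
    """Return (1-based position, residue, score) rows; reject misaligned model output."""
    scores = scorer(seq)  # hypothetical model call: one float per residue expected
    if len(scores) != len(seq):
        raise ValueError(f"expected {len(seq)} scores, got {len(scores)}")
    return [(i, aa, s) for i, (aa, s) in enumerate(zip(seq, scores), start=1)]
```

Rejecting a length mismatch, rather than truncating or padding, keeps position i in every downstream view pointing at the same residue.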

Structure/PDB analysis

PDB, mmCIF, and AlphaFold-style structures are parsed for sequence, displayed in Molstar, and compared with pLDDT when confidence values are present.
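AlphaFold Protein Structure Database model files store per-residue pLDDT in the B-factor column, so a PDB-format AlphaFold model yields both the sequence and the confidence track from its CA atoms. The sketch below assumes fixed-column PDB format, a single chain, and the 20 standard residues; `sequence_and_plddt` is a hypothetical helper, mmCIF is not handled, and the B-factor column of an experimental PDB entry holds real B-factors, not pLDDT.

```python
from __future__ import annotations

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_and_plddt(pdb_text: str, chain: str = "A") -> tuple[str, list[float]]:
    """Read one-letter sequence and B-factor (pLDDT in AlphaFold models) from CA atoms."""
    seq: list[str] = []
    plddt: list[float] = []
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        # Only standard ATOM records for the alpha carbon (atom name, columns 13-16).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21:22] != chain:  # chain identifier, column 22
            continue
        key = (line[22:26], line[26:27])  # residue number + insertion code
        if key in seen:  # skip alternate locations of the same CA
            continue
        seen.add(key)
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))  # unknown residues become X
        plddt.append(float(line[60:66]))  # B-factor, columns 61-66
    return "".join(seq), plddt
```

The returned sequence can then be scored and compared position by position with the pLDDT list.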

EWCL-vs-pLDDT review

Disagreement calls are computed from the active model output and the uploaded structure confidence track. These are review diagnostics, not automatic hallucination labels.

Current model surfaces

EWCL-Sequence-Main and EWCL-Structure-Main are the primary live EWCL signals; the Experimental, Raw, and ESMC-backed tracks serve as comparison lenses. Scores are continuous values from 0 to 1 and stay aligned to the input sequence for line plots, heat bands, tables, exports, ground-truth (GT) overlays, and structure coloring.
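A minimal sketch of the alignment contract before export, assuming a tab-separated layout with one row per residue; `export_track` and the column names are hypothetical, not the platform's actual export format. It checks the two properties stated above: one score per residue, and every score within [0, 1].

```python
from __future__ import annotations

import csv
import io


def export_track(seq: str, scores: list[float], model: str) -> str:
    """Validate a per-residue score track and serialize it as TSV text."""
    if len(scores) != len(seq):
        raise ValueError(f"{model}: {len(scores)} scores for {len(seq)} residues")
    for i, s in enumerate(scores, start=1):
        # NaN also fails this check, since every comparison with NaN is False.
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{model}: residue {i} score {s} outside [0, 1]")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["position", "residue", model])  # hypothetical header layout
    for i, (aa, s) in enumerate(zip(seq, scores), start=1):
        writer.writerow([i, aa, f"{s:.4f}"])
    return buf.getvalue()
```

Because every view reads the same validated rows, a line plot, a heat band, and structure coloring cannot drift apart by an offset.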

Sequence models

  • EWCL-Sequence-Main
  • EWCL-Sequence-Experimental
  • EWCL-Sequence-Raw
  • EWCL-Sequence-ESMC

Structure models

  • EWCL-Structure-Main
  • EWCL-Structure-Experimental
  • EWCL-Structure-ESMC

Live analysis surfaces

The header routes expose the current production workflow: submit a FASTA/UniProt sequence, upload or fetch an AlphaFold/PDB/mmCIF structure, compare EWCL models, and inspect pLDDT-aware disagreement diagnostics.

  • Sequence analysis: primary scoring with EWCL-Sequence-Main, plus Experimental, Raw, ESMC, and external-reference comparisons.
  • Structure/PDB analysis: structure-aware scoring with EWCL-Structure-Main, plus Experimental and ESMC comparison lenses.
  • Benchmark page: current full-overlap and held-out metrics against explicit GT tracks and external references.
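This page does not state which benchmark metrics are reported. ROC AUC over residues with binary GT labels is one common choice for continuous disorder scores, so the sketch below computes it from the Mann-Whitney U statistic as an illustration only; `roc_auc` is a hypothetical helper, and residues without a GT label are assumed to have been removed beforehand.

```python
from __future__ import annotations


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC from the Mann-Whitney U statistic; tied scores share an average rank.

    labels: 1 = disordered (positive class), 0 = ordered, aligned with scores.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must be aligned")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one disordered and one ordered residue")

    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of equal scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75: three of the four disordered/ordered pairs are ranked correctly.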

Data downloads and model coverage

Protein ID pages and downloadable bundles are score-first: current EWCL tracks are generated across the public protein index where the corresponding model class is available. Benchmark tables then subset those scores by ground-truth availability and held-out definitions.

Model | Scope | ID-page coverage | Use
--- | --- | --- | ---
EWCL-Sequence-Main | Sequence | All indexed proteins with sequence scores | Primary continuous sequence signal.
EWCL-Sequence-Experimental | Sequence | All indexed proteins with sequence scores | Held-out evaluation excludes CheZOD/experimental-training UniProt IDs.
EWCL-Sequence-Raw | Sequence | All indexed proteins with sequence scores | Raw/minimally supervised feature-space signal.
EWCL-Sequence-ESMC | Sequence / live ESMC | On-demand sequence runs | ESMC embedding-backed live comparison model with GT overlays when the accession is known.
EWCL-Structure-Main | Structure / AlphaFold subset | Indexed proteins with structure/AlphaFold-feature scores | Primary structure-aware signal.
EWCL-Structure-Experimental | Structure / AlphaFold subset | Indexed proteins with structure/AlphaFold-feature scores | Experimental-signal-aligned structure model.
EWCL-Structure-ESMC | Structure / live ESMC | On-demand AlphaFold/PDB/mmCIF runs | ESMC-backed structure lens for comparison with EWCL-Main, pLDDT, and GT overlays.

Structure-model coverage is tied to structure/AlphaFold-feature availability. Sequence model coverage is expected across the indexed sequence set.
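The score-first flow (score everything, then subset for benchmarks) can be sketched as a filter over scored proteins. `ProteinTrack`, `benchmark_subset`, and `residue_pairs` are hypothetical names, and the contents of the exclusion set (the CheZOD/experimental-training UniProt IDs) are not listed on this page, so the caller must supply them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinTrack:
    uniprot_id: str
    scores: list[float]                # per-residue EWCL scores, aligned to sequence
    gt: Optional[list[Optional[int]]]  # 1 disordered, 0 ordered, None unannotated


def benchmark_subset(
    tracks: list[ProteinTrack], excluded_ids: set[str], held_out: bool
) -> list[ProteinTrack]:
    """Keep scored proteins that have any GT; for held-out runs, drop excluded IDs."""
    kept = []
    for t in tracks:
        if t.gt is None or all(y is None for y in t.gt):
            continue  # scored, but no ground truth to benchmark against
        if held_out and t.uniprot_id in excluded_ids:
            continue  # e.g. IDs seen while training an experimental-signal model
        kept.append(t)
    return kept


def residue_pairs(tracks: list[ProteinTrack]) -> tuple[list[float], list[int]]:
    """Flatten to (score, label) pairs, skipping residues without a GT label."""
    scores: list[float] = []
    labels: list[int] = []
    for t in tracks:
        for s, y in zip(t.scores, t.gt or []):
            if y is not None:
                scores.append(s)
                labels.append(y)
    return scores, labels
```

Calling `benchmark_subset` with `held_out=False` gives the full-overlap set and with `held_out=True` the held-out set, so both benchmarks draw on the same stored scores.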

How to read conflicts

EWCL-vs-pLDDT disagreement is a diagnostic layer. A high-confidence AlphaFold region with a high EWCL disorder score should be reviewed, but it is not automatically a hallucination: some proteins contain real functional folded islands inside otherwise disordered regions.
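To locate such regions for review, one option is to scan for contiguous runs where AlphaFold is confident and EWCL is high. In this sketch, pLDDT of 70 is AlphaFold's own boundary for the "confident" band, while the EWCL cutoff of 0.5 and the minimum run length of 5 are hypothetical defaults, not EWCL settings; `review_regions` is a hypothetical helper.

```python
from __future__ import annotations


def review_regions(
    ewcl: list[float],
    plddt: list[float],
    ewcl_min: float = 0.5,     # hypothetical EWCL disorder cutoff
    plddt_min: float = 70.0,   # AlphaFold "confident" band starts at 70
    min_len: int = 5,          # hypothetical minimum run length
) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of confident-but-high-EWCL residues."""
    if len(ewcl) != len(plddt):
        raise ValueError("tracks must be aligned")
    regions: list[tuple[int, int]] = []
    start = None
    for i, (e, p) in enumerate(zip(ewcl, plddt)):
        flagged = e >= ewcl_min and p >= plddt_min
        if flagged and start is None:
            start = i  # a run begins
        elif not flagged and start is not None:
            if i - start >= min_len:  # run covered indices start..i-1
                regions.append((start + 1, i))
            start = None
    if start is not None and len(ewcl) - start >= min_len:
        regions.append((start + 1, len(ewcl)))  # run reaches the C-terminus
    return regions
```

The output is a list of candidates for manual inspection; a flagged run may still be a genuine folded island.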

Current language

The platform uses EWCL-reference consistency labels that separate aligned regions, transitions, near-threshold mismatches, and stronger EWCL/pLDDT conflicts that need manual inspection.
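The exact rules behind these labels are not given here, so the following is only a hypothetical per-residue scheme with the same four categories. It assumes a disorder proxy of 1 - pLDDT/100 (so a threshold of 0.5 corresponds to pLDDT 50, AlphaFold's "very low" boundary), and it uses illustrative values for `threshold`, `margin`, and `window`. Residues where both calls agree are "aligned". Among disagreements, those near a call boundary become "transition", those close to the threshold become "near-threshold mismatch", and the rest are "conflict".

```python
from __future__ import annotations


def label_residues(
    ewcl: list[float],
    plddt: list[float],
    threshold: float = 0.5,  # hypothetical disorder call threshold for both tracks
    margin: float = 0.1,     # hypothetical "near threshold" band
    window: int = 2,         # hypothetical neighbourhood for boundary detection
) -> list[str]:
    if len(ewcl) != len(plddt):
        raise ValueError("tracks must be aligned")
    proxy = [1.0 - p / 100.0 for p in plddt]  # assumed: low pLDDT -> high disorder proxy
    e_call = [e >= threshold for e in ewcl]
    p_call = [d >= threshold for d in proxy]
    n = len(ewcl)
    labels = []
    for i in range(n):
        if e_call[i] == p_call[i]:
            labels.append("aligned")
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # A call that changes within the window means i sits near a region boundary.
        if len(set(e_call[lo:hi])) > 1 or len(set(p_call[lo:hi])) > 1:
            labels.append("transition")
        elif abs(ewcl[i] - threshold) < margin or abs(proxy[i] - threshold) < margin:
            labels.append("near-threshold mismatch")
        else:
            labels.append("conflict")
    return labels
```

Checking boundaries before closeness to the threshold is a design choice in this sketch: an offset of one or two residues at a domain edge is reported as a transition rather than a conflict.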

Contributors

Lucas Cristino

Model design, data pipelines, benchmarking, and platform deployment.

Prof. Vladimir N. Uversky

Scientific guidance and co-author context for intrinsically disordered proteins and biological interpretation.