About the EWCL Platform
Entropy-Weighted Collapse Likelihood (EWCL)
A biophysics-informed computational platform for analyzing protein disorder, collapse propensity, and physics–confidence conflicts in modern structure prediction.
EWCL is built around one principle: sequence physics must be evaluated independently of AI-reported structural confidence. AlphaFold can produce highly accurate geometries, but high confidence in a predicted structure does not guarantee that the underlying sequence can sustain a stable collapsed ensemble under physiological conditions. EWCL provides a sequence-first physics baseline that can be applied consistently to both raw sequences and structure-resolved proteins. That baseline can then be compared directly with pLDDT, AlphaFold's per-residue confidence score, to reveal disagreement hotspots.
What EWCL Produces
EWCL generates continuous, per-residue scores (0–1) and derived diagnostics that can be used for:
- FASTA / UniProt sequence analysis (live)
- Structure analysis from PDB / mmCIF / AlphaFold PDB (live; sequence extracted for scoring)
- EWCL vs pLDDT conflict profiling (derived diagnostic)
- Exportable outputs for downstream use (CSV / JSON, and structure-aligned annotation for visualization)
EWCL does not treat disorder as a single binary state. It models disorder/collapse as a continuous thermodynamic spectrum, enabling residue-level interpretation, threshold sweeps, and segment-level summarization.
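As a minimal sketch of what a continuous track enables, the Python below calls segments at a chosen threshold and sweeps several thresholds over one toy score list. The function names, example scores, thresholds, and minimum segment length are illustrative assumptions, not EWCL settings or part of the EWCL API.

```python
from typing import Dict, List, Sequence, Tuple


def call_segments(scores: Sequence[float], threshold: float,
                  min_len: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins at this residue
        elif start is not None:
            if i - start >= min_len:  # run covered residues start..i-1
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


def threshold_sweep(scores: Sequence[float],
                    thresholds: Sequence[float]) -> Dict[float, float]:
    """Fraction of residues at or above each threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return {t: sum(s >= t for s in scores) / n for t in thresholds}


# Hypothetical per-residue scores for a 10-residue toy sequence.
toy_scores = [0.1, 0.2, 0.7, 0.8, 0.9, 0.3, 0.6, 0.65, 0.2, 0.1]
segments = call_segments(toy_scores, threshold=0.5, min_len=2)
sweep = threshold_sweep(toy_scores, [0.3, 0.5, 0.7])
```

Because the underlying scores are continuous, the same track can be re-thresholded without re-running any model; only the segment boundaries and summary fractions change.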
EWCL Models and Signals
EWCL exposes multiple complementary signals. Each produces a per-residue score (0–1) aligned to the same protein sequence.
1) EWCL-Sequence — Collapse propensity (sequence-first)
A fast, sequence-derived signal designed to quantify collapse propensity and ensemble breadth directly from amino-acid composition and patterning.
Role: physics baseline for collapse propensity along the chain.
2) EWCL-Disorder — Disorder probability (sequence-first)
A sequence-derived disorder signal optimized to capture disorder-like behavior as a continuous score, with optional binary thresholds for evaluation and segment calling.
Role: primary disorder-oriented predictor for comparison against curated ground truth (GT) tracks.
3) EWCL-Structure — Structure-context signal (structure-aligned, EWCL-scored)
A structure-aligned EWCL signal designed to support structure/physics consistency auditing. It is especially useful when compared with AlphaFold confidence (pLDDT) to identify regions where the "physics-like" and "confidence-like" narratives diverge.
Role: audit signal for structure-first predictions and confidence conflicts.
4) EWCL-Raw — Underlying physics signal (diagnostic)
A raw EWCL track that exposes the underlying per-residue signal used for interpretation and model comparison.
Role: transparency, debugging, and cross-signal inspection.
Note: EWCL can be applied consistently across sequence and structure inputs. When a structure is provided, its geometry is used for alignment and visualization, while EWCL scoring remains anchored in sequence-level inference, so per-residue outputs stay consistent across input types.
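To illustrate how a sequence and a confidence track can be recovered from a structure file, here is a rough sketch that reads C-alpha ATOM records from PDB-format text. It relies on two facts: the PDB format is fixed-width, and AlphaFold model files store per-residue pLDDT in the B-factor field. Everything else is an assumption for illustration: the function names are hypothetical, mmCIF input, alternate locations, and multi-model files are not handled, and this is not EWCL's actual parser.

```python
from typing import List, Tuple

# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_and_plddt(pdb_text: str,
                       chain_id: str = "A") -> Tuple[str, List[float]]:
    """Read one residue per C-alpha ATOM record of the chosen chain."""
    sequence: List[str] = []
    plddt: List[float] = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns as 0-based slices: atom name 12:16,
        # residue name 17:20, chain ID 21, B-factor 60:66.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))  # "X" = non-canonical
        plddt.append(float(line[60:66]))
    return "".join(sequence), plddt


def toy_atom_line(serial: int, atom: str, resname: str, resseq: int,
                  bfactor: float, chain: str = "A") -> str:
    """Format a toy ATOM record with dummy coordinates (illustration only)."""
    return (f"ATOM  {serial:>5d} {atom:<4s} {resname:>3s} {chain}{resseq:>4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


# Toy three-residue input; the pLDDT values are made up.
toy_pdb = "\n".join([
    toy_atom_line(1, " N  ", "MET", 1, 91.50),
    toy_atom_line(2, " CA ", "MET", 1, 91.50),
    toy_atom_line(3, " CA ", "ALA", 2, 45.25),
    toy_atom_line(4, " CA ", "GLY", 3, 70.00),
])
sequence, plddt = sequence_and_plddt(toy_pdb)
```

The extracted sequence is what a sequence-first scorer would consume, while the pLDDT list stays index-aligned with it for later comparison.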
Physics–Confidence Conflict (Hallucination Hotspots)
EWCL includes an explicit, transparent diagnostic for detecting physics–confidence conflicts in modern structure prediction.
A conflict occurs when:
- EWCL signals indicate disorder-like / low collapse support, but
- AlphaFold reports high confidence (high pLDDT)
This pattern highlights regions where AlphaFold is confident in a geometry that the sequence context suggests may be ensemble-driven, flexible, or thermodynamically unstable.
Important: In EWCL, hallucination is not a predicted class. It is a derived diagnostic computed from disagreement between EWCL signals and pLDDT, making it interpretable, reproducible, and auditable.
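The disagreement rule above can be sketched in a few lines of Python. This version assumes that a higher EWCL score means more disorder-like behavior and uses a cutoff of 0.5 for EWCL and 70 for pLDDT (70 is the commonly used AlphaFold "confident" cutoff). The function name, the score direction, and both thresholds are assumptions for illustration; the platform's actual rule and settings may differ.

```python
from typing import List, Sequence


def conflict_residues(ewcl: Sequence[float], plddt: Sequence[float],
                      ewcl_min: float = 0.5,
                      plddt_min: float = 70.0) -> List[int]:
    """Return 1-based residue numbers flagged as physics-confidence conflicts.

    A residue is flagged when its EWCL score is disorder-like
    (>= ewcl_min, assumed direction) AND its pLDDT is high (>= plddt_min).
    """
    if len(ewcl) != len(plddt):
        raise ValueError("ewcl and plddt must be aligned to the same residues")
    return [
        i
        for i, (e, p) in enumerate(zip(ewcl, plddt), start=1)
        if e >= ewcl_min and p >= plddt_min
    ]


# Toy, made-up tracks for a 5-residue chain.
toy_ewcl = [0.10, 0.80, 0.90, 0.75, 0.20]
toy_plddt = [95.0, 92.0, 40.0, 88.0, 55.0]
hotspots = conflict_residues(toy_ewcl, toy_plddt)
conflict_fraction = len(hotspots) / len(toy_ewcl)
```

Residue 3 is disorder-like but has low pLDDT, so the two signals agree and it is not flagged; only residues where both conditions hold count as conflicts. Because the rule is a plain comparison of two exported tracks, anyone can recompute it with different thresholds.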
Platform Implementation
EWCL is both a model suite and a production platform.
Precomputed Protein Database (fast exploration)
EWCL provides an interactive database of approximately 3,700 precomputed proteins, with per-residue EWCL tracks, AlphaFold confidence alignment (when available), ground truth overlays (when available), and downloadable artifacts.
Use the database to:
- search by UniProt ID
- inspect residue-level tracks immediately (no compute wait)
- compare EWCL signals vs pLDDT and curated references
- export model outputs per protein
Live Analysis (sequence and structure)
EWCL also supports live analysis for user inputs:
Inputs
- FASTA / UniProt
- PDB / mmCIF / AlphaFold PDB
Live outputs
- per-residue EWCL tracks (0–1)
- pLDDT alignment and conflict diagnostics (when the structure file contains confidence values)
- segment summaries and per-residue tables
- exports (CSV / JSON; structure-aligned annotation for visualization)
This supports both proteome-style scanning and single-structure audits.
Validation and Reproducibility
EWCL signals have been evaluated on leakage-controlled benchmarks covering a large number of residues, using both curated and merged ground truth references.
The platform is designed for:
- consistent behavior across sequence and structure inputs
- continuous outputs suitable for threshold sweeps
- fully traceable exports (per-residue scores, settings, and metadata)
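One way to picture a traceable export is a single JSON document that bundles per-residue scores with the settings and metadata that produced them. The sketch below is only an illustration: the field names, the settings keys, and the sequence checksum are hypothetical and do not describe EWCL's actual export schema.

```python
import hashlib
import json
from typing import Any, Dict, Sequence


def build_export(sequence: str, scores: Sequence[float],
                 settings: Dict[str, Any],
                 metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle scores, settings, and metadata into one JSON-ready dict."""
    if len(sequence) != len(scores):
        raise ValueError("need exactly one score per residue")
    return {
        "metadata": {
            **metadata,
            "length": len(sequence),
            # A checksum ties the scores to the exact input sequence.
            "sequence_sha256": hashlib.sha256(sequence.encode("ascii")).hexdigest(),
        },
        "settings": dict(settings),
        "residues": [
            {"index": i, "aa": aa, "score": round(score, 4)}
            for i, (aa, score) in enumerate(zip(sequence, scores), start=1)
        ],
    }


# Hypothetical inputs; none of these values come from a real EWCL run.
payload = build_export(
    sequence="MAGK",
    scores=[0.12, 0.85, 0.9, 0.4],
    settings={"model": "EWCL-Disorder", "threshold": 0.5},
    metadata={"source": "toy example"},
)
export_text = json.dumps(payload, indent=2, sort_keys=True)
```

Keeping settings and metadata in the same file as the scores means a result can be re-checked later without guessing which threshold or model produced it.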
Contributors and Scientific Context
Lead Developer
Lucas Cristino
Model design, data pipelines, benchmarking, and platform deployment.
Scientific Guidance & Co-author
Prof. Vladimir N. Uversky (University of South Florida)
Provides scientific guidance grounded in foundational research on intrinsically disordered proteins and regions (IDPs/IDRs) and is a co-author on the EWCL manuscript(s) currently in preparation. His input informs EWCL's theoretical framing, interpretation of disorder physics, and biological context.
Why EWCL Matters
Sequence-first physics in a structure-first era
An independent thermodynamic baseline to complement AI structure prediction.
Transparent conflict diagnostics
"Hallucination hotspots" are derived from explicit disagreements—not black-box classes.
Continuous, interpretable outputs
Residue-level inspection, threshold sweeps, and segment-level summaries.
Scales from databases to live analysis
Precomputed protein library for instant exploration, plus live inference for new sequences and structures.
Ready to Explore EWCL?
Search the precomputed database or run live analysis on your own sequences and structures. Export per-residue outputs (CSV/JSON) for downstream research and benchmarking.