About the EWCL Platform

Entropy-Weighted Collapse Likelihood (EWCL)

A biophysics-informed computational platform for analyzing protein disorder, collapse propensity, and physics–confidence conflicts in modern structure prediction.

EWCL is built around one principle: sequence physics must be evaluated independently of AI-reported structural confidence. AlphaFold can produce highly accurate geometries, but high confidence in a predicted structure does not guarantee that the underlying sequence can sustain a stable collapsed ensemble under physiological conditions. EWCL provides a sequence-first physics baseline that can be applied consistently to both raw sequences and structure-resolved proteins, and then compared directly with AlphaFold's per-residue confidence score (pLDDT) to reveal disagreement hotspots.

What EWCL Produces

EWCL generates continuous, per-residue scores (0–1) and derived diagnostics that can be used for:

FASTA / UniProt sequence analysis (live)
Structure analysis from PDB / mmCIF / AlphaFold PDB (live; sequence extracted for scoring)
EWCL vs pLDDT conflict profiling (derived diagnostic)
Exportable outputs for downstream use (CSV / JSON, and structure-aligned annotation for visualization)

EWCL does not treat disorder as a single binary state. It models disorder/collapse as a continuous thermodynamic spectrum, enabling residue-level interpretation, threshold sweeps, and segment-level summarization.
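Because the scores are continuous, a threshold sweep simply re-binarizes the same track at several cutoffs. The sketch below is a minimal illustration, not the platform's implementation. It assumes a hypothetical function name (threshold_sweep) and assumes that higher scores mean more disorder-like behavior. It reports the fraction of residues flagged at each threshold.

```python
from typing import Dict, Sequence


def threshold_sweep(scores: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Fraction of residues flagged (score >= t) at each threshold t.

    Hypothetical helper; assumes higher score = more disorder-like.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    n = len(scores)
    return {t: sum(1 for s in scores if s >= t) / n for t in thresholds}


scores = [0.12, 0.35, 0.58, 0.81, 0.93]
print(threshold_sweep(scores, [0.25, 0.5, 0.75]))  # {0.25: 0.8, 0.5: 0.6, 0.75: 0.4}
```

Keeping the raw scores and deriving binary calls on demand is what makes this kind of sweep possible without re-running the model.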

EWCL Models and Signals

EWCL exposes multiple complementary signals. Each produces a per-residue score (0–1) aligned to the same protein sequence.
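The invariant in that sentence is that every track contains exactly one value in [0, 1] per residue of the same sequence. The following is a minimal sketch of a container that enforces this invariant. The class name ResidueTracks and the track names (ewcl_sequence, ewcl_disorder) are hypothetical and do not describe EWCL's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResidueTracks:
    """Hypothetical container: every score track must align 1:1 with `sequence`."""

    sequence: str
    tracks: Dict[str, List[float]] = field(default_factory=dict)

    def add(self, name: str, values: List[float]) -> None:
        # Alignment invariant: exactly one score per residue.
        if len(values) != len(self.sequence):
            raise ValueError(f"{name}: {len(values)} scores for {len(self.sequence)} residues")
        # Scores are continuous and bounded to [0, 1].
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"{name}: scores must lie in [0, 1]")
        self.tracks[name] = list(values)

    def at(self, position: int) -> Dict[str, float]:
        """Return every track's score at a 1-based residue position."""
        if not 1 <= position <= len(self.sequence):
            raise IndexError(f"position {position} outside 1..{len(self.sequence)}")
        return {name: values[position - 1] for name, values in self.tracks.items()}


tracks = ResidueTracks("MKG")
tracks.add("ewcl_sequence", [0.10, 0.20, 0.30])
tracks.add("ewcl_disorder", [0.90, 0.80, 0.70])
print(tracks.at(2))  # {'ewcl_sequence': 0.2, 'ewcl_disorder': 0.8}
```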

1) EWCL-Sequence — Collapse propensity (sequence-first)

A fast, sequence-derived signal designed to quantify collapse propensity and ensemble breadth directly from amino-acid composition and patterning.

Role: physics baseline for collapse propensity along the chain.

2) EWCL-Disorder — Disorder probability (sequence-first)

A sequence-derived disorder signal optimized to capture disorder-like behavior as a continuous score, with optional binary thresholds for evaluation and segment calling.

Role: primary disorder-oriented predictor for comparison against curated ground-truth (GT) tracks.
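Segment calling from a continuous disorder track usually means thresholding the scores and then keeping only contiguous runs that are long enough to be meaningful. The sketch below illustrates this general technique. The function name call_segments, the default threshold of 0.5, and the minimum length of 5 are illustrative assumptions, not EWCL's published settings.

```python
from typing import List, Sequence, Tuple


def call_segments(
    scores: Sequence[float], threshold: float = 0.5, min_length: int = 5
) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs with score >= threshold.

    Hypothetical helper; runs shorter than `min_length` residues are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # Run covered indices start..i-1, i.e. i - start residues.
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start + 1, len(scores)))
    return segments


scores = [0.1, 0.7, 0.8, 0.9, 0.2, 0.6, 0.6, 0.6, 0.6, 0.6]
print(call_segments(scores, threshold=0.5, min_length=3))  # [(2, 4), (6, 10)]
```

Raising min_length suppresses short, isolated calls. This trade-off only becomes visible because the underlying scores remain continuous.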

3) EWCL-Structure — Structure-context signal (structure-aligned, EWCL-scored)

A structure-aligned EWCL signal designed to support structure–physics consistency auditing. It is especially useful when compared with AlphaFold confidence, to identify regions where "physics-like" and "confidence-like" narratives diverge.

Role: audit signal for structure-first predictions and confidence conflicts.

4) EWCL-Raw — Underlying physics signal (diagnostic)

A raw EWCL track that exposes the underlying per-residue signal used for interpretation and model comparison.

Role: transparency, debugging, and cross-signal inspection.

Note: EWCL can be applied consistently across sequence and structure inputs. When a structure is provided, its geometry is used for alignment and visualization, while EWCL scoring remains anchored in the sequence extracted from it, so per-residue outputs stay consistent across input types.
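To score a structure from its sequence, the sequence and any confidence values must first be read out of the file. AlphaFold PDB files store per-residue pLDDT in the B-factor column, so a single pass over the CA atoms yields both. The minimal sketch below handles fixed-column PDB text only. The function name is hypothetical, and mmCIF input, modified residues, and multi-chain handling are simplified. In practice, a dedicated parser such as Biopython's Bio.PDB (PDBParser / MMCIFParser) is the safer choice.

```python
from typing import List, Tuple

# Standard residues only; anything else (e.g. modified residues) maps to "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_and_plddt(pdb_text: str, chain: str = "A") -> Tuple[str, List[float]]:
    """Extract the one-letter sequence and per-residue B-factors from CA atoms.

    Hypothetical helper. For AlphaFold PDB files the B-factor column holds
    pLDDT; for experimental structures it holds crystallographic B-factors.
    """
    sequence: List[str] = []
    bfactors: List[float] = []
    seen = set()
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name cols 13-16, chain col 22, B-factor cols 61-66.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21] != chain:
            continue
        key = (line[22:26], line[26])  # residue number + insertion code
        if key in seen:
            continue  # skip alternate locations and repeat models of the same residue
        seen.add(key)
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
        bfactors.append(float(line[60:66]))
    return "".join(sequence), bfactors
```

The returned sequence is what a sequence-first model would score. The returned values are aligned index-by-index with that sequence, so they can be compared directly with the EWCL tracks.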

Derived Diagnostic

Physics–Confidence Conflict (Hallucination Hotspots)

EWCL includes an explicit, transparent diagnostic for detecting physics–confidence conflicts in modern structure prediction.

A conflict occurs when:

EWCL signals indicate disorder-like / low collapse support, but
AlphaFold reports high confidence (high pLDDT)

This pattern highlights regions where AlphaFold is confident in a geometry that the sequence context suggests may be ensemble-driven, flexible, or thermodynamically unstable.

Important: In EWCL, hallucination is not a predicted class. It is a derived diagnostic computed from disagreement between EWCL signals and pLDDT, making it interpretable, reproducible, and auditable.
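The two conditions above translate directly into a per-residue rule applied to two aligned tracks. The sketch below is a minimal version under stated assumptions. It assumes a disorder-oriented track where higher means more disorder-like, and it uses a hypothetical disorder cutoff of 0.5. It sets the pLDDT cutoff at 70, the boundary AlphaFold uses between "low" and "confident". The function name and both defaults are illustrative, not EWCL's documented parameters.

```python
from typing import List, Sequence


def conflict_positions(
    disorder: Sequence[float],
    plddt: Sequence[float],
    disorder_threshold: float = 0.5,
    plddt_threshold: float = 70.0,
) -> List[int]:
    """Return 1-based positions where a disorder-like score coincides with high pLDDT.

    Hypothetical helper; thresholds are illustrative defaults.
    """
    if len(disorder) != len(plddt):
        raise ValueError("tracks must be aligned to the same residues")
    return [
        i + 1
        for i, (d, p) in enumerate(zip(disorder, plddt))
        # Conflict: sequence signal says disorder-like, AlphaFold says confident.
        if d >= disorder_threshold and p >= plddt_threshold
    ]


disorder = [0.9, 0.8, 0.2, 0.7]
plddt = [92.0, 40.0, 95.0, 71.0]
print(conflict_positions(disorder, plddt))  # [1, 4]
```

Because the rule is an explicit comparison of two inputs, every flagged position can be traced back to the exact scores and cutoffs that produced it.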

Platform Implementation

EWCL is both a model suite and a production platform.

Precomputed Protein Database (fast exploration)

EWCL provides an interactive database of ~3.7k precomputed proteins, with per-residue EWCL tracks, AlphaFold confidence alignment (when available), ground truth overlays (when available), and downloadable artifacts.

Use the database to:

search by UniProt ID
inspect residue-level tracks immediately (no compute wait)
compare EWCL signals vs pLDDT and curated references
export model outputs per protein

Live Analysis (sequence and structure)

EWCL also supports live analysis for user inputs:

Inputs

FASTA / UniProt
PDB / mmCIF / AlphaFold PDB

Live outputs

per-residue EWCL tracks (0–1)
pLDDT alignment and conflict diagnostics (when the structure file contains pLDDT values)
segment summaries and per-residue tables
exports (CSV / JSON; structure-aligned annotation for visualization)

This supports both proteome-style scanning and single-structure audits.

Validation

Validation and Reproducibility

EWCL signals have been evaluated on leakage-controlled benchmarks covering a large number of residues, using both curated and merged ground-truth references.
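A common way to score a continuous per-residue predictor against binary ground truth without committing to a threshold is ROC AUC. The sketch below computes it through the Mann-Whitney rank-sum identity, with tied scores receiving their average rank. It is a generic illustration of the metric, not a description of EWCL's benchmark protocol. It does not show how the leakage-controlled splits were built. The function name is hypothetical, and labels are assumed to be 1 (disordered) or 0 (ordered).

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney rank-sum statistic; labels are 1 or 0."""
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must be aligned")
    order = sorted(range(n), key=lambda k: scores[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores order[i..j].
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residues")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank formulation runs in O(n log n). This matters when residue counts are large, whereas comparing every positive residue against every negative one would be quadratic.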

The platform is designed for:

consistent behavior across sequence and structure inputs
continuous outputs suitable for threshold sweeps
fully traceable exports (per-residue scores, settings, and metadata)
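A traceable export pairs each per-residue score with the settings that produced it. The sketch below writes one protein's aligned tracks as a CSV table (one row per residue) and as a JSON document that also carries settings and metadata. It uses only the standard csv and json modules. The function name, column names, identifier, and settings keys are hypothetical and do not reflect EWCL's actual export schema.

```python
import csv
import io
import json
from typing import Dict, Sequence, Tuple


def export_tracks(
    protein_id: str,
    sequence: str,
    tracks: Dict[str, Sequence[float]],
    settings: Dict[str, object],
) -> Tuple[str, str]:
    """Return (csv_text, json_text) for one protein's aligned per-residue tracks."""
    for name, values in tracks.items():
        if len(values) != len(sequence):
            raise ValueError(f"{name}: {len(values)} values for {len(sequence)} residues")
    names = sorted(tracks)  # stable column order

    # CSV: one row per residue, one column per track.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["residue", "aa", *names])
    for i, aa in enumerate(sequence):
        writer.writerow([i + 1, aa, *(tracks[n][i] for n in names)])

    # JSON: same scores plus the settings and metadata needed to reproduce them.
    payload = {
        "protein_id": protein_id,
        "length": len(sequence),
        "settings": settings,
        "tracks": {n: list(tracks[n]) for n in names},
    }
    return buf.getvalue(), json.dumps(payload, indent=2)


csv_text, json_text = export_tracks(
    "EXAMPLE_1",  # hypothetical identifier
    "MK",
    {"ewcl_disorder": [0.9, 0.4], "plddt": [91.5, 62.0]},
    {"model": "EWCL-Disorder", "threshold": 0.5},  # hypothetical settings keys
)
print(csv_text.splitlines()[1])  # 1,M,0.9,91.5
```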

Contributors

Contributors and Scientific Context

Lead Developer

Lucas Cristino

Model design, data pipelines, benchmarking, and platform deployment.

Scientific Guidance & Co-author

Prof. Vladimir N. Uversky (University of South Florida)

He provides scientific guidance grounded in foundational research on intrinsically disordered proteins and regions (IDPs/IDRs) and is a co-author on the EWCL manuscript(s) currently in preparation. His input informs EWCL's theoretical framing, interpretation of disorder physics, and biological context.

Research Impact

Why EWCL Matters

Sequence-first physics in a structure-first era

An independent thermodynamic baseline to complement AI structure prediction.

Transparent conflict diagnostics

"Hallucination hotspots" are derived from explicit disagreements—not black-box classes.

Continuous, interpretable outputs

Residue-level inspection, threshold sweeps, and segment-level summaries.

Scales from databases to live analysis

A precomputed protein library for instant exploration, plus live inference for new sequences and structures.

Ready to Explore EWCL?

Search the precomputed database or run live analysis on your own sequences and structures. Export per-residue outputs (CSV/JSON) for downstream research and benchmarking.