About the EWCL Platform

Entropy-Weighted Collapse Likelihood (EWCL)

A biophysics-informed computational platform for analyzing protein disorder, collapse propensity, and physics–confidence conflicts in modern structure prediction.

EWCL is built around one principle: sequence physics must be evaluated independently of AI-reported structural confidence. AlphaFold can produce highly accurate geometries, but high confidence in a predicted structure does not guarantee that the underlying sequence can sustain a stable collapsed ensemble under physiological conditions. EWCL provides a sequence-first physics baseline that can be applied consistently to both raw sequences and structure-resolved proteins, and then compared directly to pLDDT (AlphaFold's per-residue confidence score, reported on a 0–100 scale) to reveal disagreement hotspots.

What EWCL Produces

EWCL generates continuous, per-residue scores (0–1) and derived diagnostics that can be used for:

  • FASTA / UniProt sequence analysis (live)
  • Structure analysis from PDB / mmCIF / AlphaFold PDB (live; sequence extracted for scoring)
  • EWCL vs pLDDT conflict profiling (derived diagnostic)
  • Exportable outputs for downstream use (CSV / JSON, and structure-aligned annotation for visualization)

EWCL does not treat disorder as a single binary state. It models disorder/collapse as a continuous thermodynamic spectrum, enabling residue-level interpretation, threshold sweeps, and segment-level summarization.
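
As a rough illustration of the threshold sweeps and segment-level summarization mentioned above, the sketch below works on a plain list of per-residue scores. The function names, the example threshold values, and the convention that higher scores mean more disorder-like behavior are assumptions made for illustration; they are not taken from the EWCL codebase.

```python
def threshold_sweep(scores, thresholds):
    """Fraction of residues at or above each threshold (hypothetical helper)."""
    if not scores:
        return {t: 0.0 for t in thresholds}
    return {t: sum(s >= t for s in scores) / len(scores) for t in thresholds}


def call_segments(scores, threshold, min_len=1):
    """Return (start, end) pairs, 0-based with end exclusive, for runs of
    consecutive residues scoring >= threshold that span at least min_len."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


# Made-up scores for a 7-residue toy chain (not real EWCL output).
example_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.75, 0.6]
sweep = threshold_sweep(example_scores, [0.5, 0.85])
segments = call_segments(example_scores, threshold=0.5, min_len=2)
```

Because the scores stay continuous, the same track can be re-thresholded at any cutoff without rerunning a model; only the segment boundaries change.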

EWCL Models and Signals

EWCL exposes multiple complementary signals. Each produces a per-residue score (0–1) aligned to the same protein sequence.

1) EWCL-Sequence — Collapse propensity (sequence-first)

A fast, sequence-derived signal designed to quantify collapse propensity and ensemble breadth directly from amino-acid composition and patterning.

Role: physics baseline for collapse propensity along the chain.

2) EWCL-Disorder — Disorder probability (sequence-first)

A sequence-derived disorder signal optimized to capture disorder-like behavior as a continuous score, with optional binary thresholds for evaluation and segment calling.

Role: primary disorder-oriented predictor for comparison against curated ground truth (GT) tracks.

3) EWCL-Structure — Structure-context signal (structure-aligned, EWCL-scored)

A structure-aligned EWCL signal designed to support structure/physics consistency auditing. It is especially useful when compared to AlphaFold confidence to identify regions where "physics-like" and "confidence-like" narratives diverge.

Role: audit signal for structure-first predictions and confidence conflicts.

4) EWCL-Raw — Underlying physics signal (diagnostic)

A raw EWCL track that exposes the underlying per-residue signal used for interpretation and model comparison.

Role: transparency, debugging, and cross-signal inspection.

Note: EWCL can be applied consistently across sequence and structure inputs. When a structure is provided, geometry is used for alignment and visualization, while EWCL scoring remains anchored in sequence-level inference and consistent per-residue outputs.
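
To make the "sequence extracted for scoring" step concrete, here is a minimal sketch of pulling a one-letter sequence and per-residue pLDDT out of AlphaFold-style PDB text. It relies on two facts about the file format: PDB ATOM records use fixed columns, and AlphaFold PDB files store pLDDT in the B-factor field. Everything else (the function names, reading only C-alpha atoms, the single-chain default) is an assumption for illustration and not EWCL's actual parser; mmCIF input is not handled here.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_ca_records(pdb_text, chain_id="A"):
    """Return (sequence, plddt_list) from the C-alpha ATOM records of one chain."""
    sequence, plddt = [], []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        # Columns 13-16: atom name; column 22: chain identifier.
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        # Column 17: alternate location; keep only the first conformer.
        if line[16] not in (" ", "A"):
            continue
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))  # columns 18-20
        plddt.append(float(line[60:66]))  # columns 61-66: B-factor (pLDDT)
    return "".join(sequence), plddt


def format_ca_line(serial, res_name, res_seq, b_factor, chain_id="A"):
    """Build a column-correct C-alpha ATOM line for the toy example below."""
    return (
        f"ATOM  {serial:5d}  CA  {res_name:3s} {chain_id}{res_seq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


# Three made-up residues; coordinates are zeroed because only the
# residue names and B-factor values matter for this sketch.
toy_pdb = "\n".join([
    format_ca_line(1, "MET", 1, 91.5),
    format_ca_line(2, "LYS", 2, 88.25),
    format_ca_line(3, "VAL", 3, 42.1),
])
toy_sequence, toy_plddt = parse_ca_records(toy_pdb)
```

The extracted sequence is what a sequence-first scorer would consume, while the pLDDT list stays index-aligned with it for later comparison.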

Derived Diagnostic

Physics–Confidence Conflict (Hallucination Hotspots)

EWCL includes an explicit, transparent diagnostic for detecting physics–confidence conflicts in modern structure prediction.

A conflict occurs when:

  • EWCL signals indicate disorder-like / low collapse support, but
  • AlphaFold reports high confidence (high pLDDT)

This pattern highlights regions where the structure predictor is confident in a geometry that the sequence context suggests may be ensemble-driven, flexible, or thermodynamically unstable.

Important: In EWCL, hallucination is not a predicted class. It is a derived diagnostic computed from disagreement between EWCL signals and pLDDT, making it interpretable, reproducible, and auditable.
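
The conflict rule described above can be sketched as a simple per-residue comparison. The cutoffs below (an EWCL score of at least 0.5 and a pLDDT of at least 70), the function name, and the assumption that a higher EWCL score means more disorder-like behavior are illustrative choices only; the source does not state which thresholds or formula EWCL uses.

```python
def conflict_flags(ewcl_scores, plddt, ewcl_min=0.5, plddt_min=70.0):
    """Flag residues where a disorder-like EWCL score coincides with high pLDDT.

    ewcl_scores: per-residue values in [0, 1] (assumed: higher = more disorder-like)
    plddt:       per-residue AlphaFold confidence on the 0-100 scale
    Both thresholds are hypothetical defaults, not EWCL settings.
    """
    if len(ewcl_scores) != len(plddt):
        raise ValueError("EWCL scores and pLDDT must cover the same residues")
    return [
        ewcl >= ewcl_min and conf >= plddt_min
        for ewcl, conf in zip(ewcl_scores, plddt)
    ]


# Made-up values for four residues.
toy_ewcl = [0.9, 0.2, 0.8, 0.6]
toy_conf = [95.0, 92.0, 40.0, 70.0]
flags = conflict_flags(toy_ewcl, toy_conf)
hotspots = [i + 1 for i, flagged in enumerate(flags) if flagged]  # 1-based positions
```

Since the flag is computed from two visible tracks and two explicit cutoffs, anyone with the exported scores can recompute it or change the cutoffs.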

Platform Implementation

EWCL is both a model suite and a production platform.

Precomputed Protein Database (fast exploration)

EWCL provides an interactive database of ~3.7k precomputed proteins, with per-residue EWCL tracks, AlphaFold confidence alignment (when available), ground truth overlays (when available), and downloadable artifacts.

Use the database to:

  • search by UniProt ID
  • inspect residue-level tracks immediately (no compute wait)
  • compare EWCL signals vs pLDDT and curated references
  • export model outputs per protein

Live Analysis (sequence and structure)

EWCL also supports live analysis for user inputs:

Inputs

  • FASTA / UniProt
  • PDB / mmCIF / AlphaFold PDB

Live outputs

  • per-residue EWCL tracks (0–1)
  • pLDDT alignment and conflict diagnostics (when structure contains confidence)
  • segment summaries and per-residue tables
  • exports (CSV / JSON; structure-aligned annotation for visualization)

This supports both proteome-style scanning and single-structure audits.
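
For downstream use, a per-residue table plus the settings that produced it might be serialized along the following lines. The column names, the JSON layout, and the settings keys are hypothetical; the platform's actual export schema is not described here, so treat this only as a sketch of what a traceable CSV / JSON export could look like.

```python
import csv
import io
import json


def to_csv(sequence, scores):
    """Serialize per-residue scores as CSV text with 1-based positions."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position", "residue", "score"])
    for position, (residue, score) in enumerate(zip(sequence, scores), start=1):
        writer.writerow([position, residue, f"{score:.3f}"])
    return buffer.getvalue()


def to_json(sequence, scores, settings):
    """Bundle scores with the settings used, so the export is self-describing."""
    payload = {
        "settings": settings,
        "residues": [
            {"position": position, "residue": residue, "score": score}
            for position, (residue, score) in enumerate(
                zip(sequence, scores), start=1
            )
        ],
    }
    return json.dumps(payload, indent=2)


# Made-up two-residue example with a hypothetical settings record.
csv_text = to_csv("MK", [0.1, 0.25])
json_text = to_json("MK", [0.1, 0.25], {"threshold": 0.5})
```

Keeping the settings inside the JSON payload means a file can be interpreted later without access to the session that generated it.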

Validation

Validation and Reproducibility

EWCL signals have been evaluated on leakage-controlled benchmarks covering a large number of residues, including curated and merged ground truth references.

The platform is designed for:

  • consistent behavior across sequence and structure inputs
  • continuous outputs suitable for threshold sweeps
  • fully traceable exports (per-residue scores + settings + metadata)

Contributors

Contributors and Scientific Context

Lead Developer

Lucas Cristino

Model design, data pipelines, benchmarking, and platform deployment.

Scientific Guidance & Co-author

Prof. Vladimir N. Uversky (University of South Florida)

Provides scientific guidance grounded in foundational research on intrinsically disordered proteins and regions (IDPs/IDRs) and is a co-author on the EWCL manuscript(s) currently in preparation. His input informs EWCL's theoretical framing, interpretation of disorder physics, and biological context.

Research Impact

Why EWCL Matters

Sequence-first physics in a structure-first era

An independent thermodynamic baseline to complement AI structure prediction.

Transparent conflict diagnostics

"Hallucination hotspots" are derived from explicit disagreement between EWCL signals and pLDDT, not predicted as a black-box class.

Continuous, interpretable outputs

Residue-level inspection, threshold sweeps, and segment-level summaries.

Scales from databases to live analysis

Precomputed protein library for instant exploration, plus live inference for new sequences and structures.

Ready to Explore EWCL?

Search the precomputed database or run live analysis on your own sequences and structures. Export per-residue outputs (CSV/JSON) for downstream research and benchmarking.