About the EWCL Platform
Entropy-Weighted Collapse Likelihood (EWCL)
EWCL is a biophysics-informed computational platform for analyzing protein disorder, collapse propensity, and physics–confidence conflicts in modern structure prediction.
At its core, EWCL is built around a single principle: sequence thermodynamics must be evaluated independently of structural confidence. While tools like AlphaFold provide highly accurate geometries, confidence in a predicted structure does not guarantee that the underlying sequence can sustain a stable, collapsed conformation under physiological conditions.
EWCL addresses this gap by providing a sequence-first, structure-agnostic physics baseline that can be applied uniformly to raw sequences and to structure-resolved proteins.
What EWCL Does
EWCL produces continuous, per-residue scores (0–1) that quantify collapse likelihood and ensemble breadth directly from sequence information. These scores can be:
- Applied to FASTA / UniProt sequences
- Applied to PDB, mmCIF, and AlphaFold structures (using the same sequence model)
- Aligned against AlphaFold confidence (pLDDT) for conflict analysis
- Exported as JSON, CSV, or annotated PDB
EWCL does not classify residues as simply "ordered" or "disordered." Instead, it models disorder and collapse as a continuous thermodynamic spectrum, enabling fine-grained residue-level interpretation.
EWCLv1 — Unified Physics Baseline
EWCLv1: Single Core Model
Type:
Sequence-derived model
Inputs:
- FASTA / UniProt sequence
- PDB / mmCIF / AlphaFold PDB (sequence extracted internally)
EWCLv1 is the single core model powering the platform.
It computes entropy-weighted collapse likelihood directly from amino-acid composition, patterning, and physicochemical context. Importantly, EWCLv1 does not depend on 3D coordinates for its prediction, even when a structure is provided. When structures are uploaded, geometry is used only for alignment and visualization, not for learning or inference.
Because EWCLv1 is sequence-only at its core, it serves as a physics reference that can be applied consistently across datasets and input modalities.
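AlphaFold PDB files store per-residue pLDDT in the B-factor column, so a sequence and its confidence track can be read from the same file without using any coordinates. The following is a minimal sketch of that extraction step, assuming a standard fixed-column PDB file with one C-alpha atom per residue. The names `sequence_and_plddt` and `THREE_TO_ONE` are hypothetical and are not part of the EWCL platform; how EWCL extracts sequences internally is not described here.

```python
# Hypothetical helper for illustration; not an EWCL API.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_and_plddt(pdb_text, chain_id="A"):
    """Return (sequence, plddt_list) for one chain of a PDB-format string.

    Only C-alpha ATOM records are read. Coordinates are ignored; the
    B-factor field (columns 61-66) is taken as pLDDT, which holds for
    AlphaFold models but not for experimental structures.
    """
    residues = []
    plddt = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        if atom_name != "CA" or line[21] != chain_id:  # column 22 = chain
            continue
        res_name = line[17:20].strip()    # columns 18-20
        residues.append(THREE_TO_ONE.get(res_name, "X"))  # X = unknown residue
        plddt.append(float(line[60:66]))  # columns 61-66
    return "".join(residues), plddt
```

The returned sequence can be passed to any sequence-only scorer, and the pLDDT list is already aligned to it residue by residue.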
Scientific role:
- Provides a structure-independent thermodynamic baseline
- Enables fair comparison between sequence physics and model confidence
- Scales from proteome-wide scans to single-protein audits
Hallucination and Physics–Confidence Conflict
Hallucination Detection: Derived from EWCLv1 × pLDDT
When EWCLv1 scores are aligned with AlphaFold confidence (pLDDT), EWCL enables detection of physics–confidence conflicts.
A conflict arises when:
- EWCLv1 indicates high entropy / low collapse likelihood, and
- AlphaFold reports high confidence (high pLDDT)
This pattern highlights regions where a model is confident in a geometry that the sequence context suggests is thermodynamically unstable or ensemble-driven.
Hallucination in EWCL is not a predicted class, but a derived diagnostic computed from the disagreement between:
- sequence-derived physics (EWCLv1), and
- AI-reported confidence (pLDDT)
This makes hallucination detection transparent, interpretable, and reproducible.
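The conflict rule described above can be sketched as a simple per-residue comparison. This is an illustrative sketch, not the platform's implementation. It assumes a per-residue input track in [0, 1] where higher values mean higher entropy (lower collapse likelihood), and a pLDDT track on the usual 0 to 100 scale. The function names and the 0.5 entropy cutoff are assumptions; the pLDDT cutoff of 70 follows AlphaFold's conventional boundary for "confident" predictions.

```python
# Hypothetical helpers for illustration; names and cutoffs are assumptions.

def conflict_flags(entropy_scores, plddt, entropy_cutoff=0.5, plddt_cutoff=70.0):
    """Flag residues with a high-entropy score AND high AlphaFold confidence."""
    if len(entropy_scores) != len(plddt):
        raise ValueError("score tracks must be aligned residue by residue")
    return [
        e >= entropy_cutoff and p >= plddt_cutoff
        for e, p in zip(entropy_scores, plddt)
    ]


def conflict_segments(flags, min_len=1):
    """Group consecutive flagged residues into 1-based inclusive (start, end) ranges."""
    segments = []
    start = None
    for i, flagged in enumerate(flags, start=1):
        if flagged and start is None:
            start = i                      # a new flagged run begins
        elif not flagged and start is not None:
            if i - start >= min_len:       # run covered residues start..i-1
                segments.append((start, i - 1))
            start = None
    if start is not None and len(flags) - start + 1 >= min_len:
        segments.append((start, len(flags)))  # run extends to the last residue
    return segments
```

Because the diagnostic is a plain function of two tracks and two cutoffs, anyone with the same inputs can recompute it, and the cutoffs can be varied to see how sensitive a flagged region is.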
Unsupervised Latent Model
Blind Sequence Representation
Unsupervised Latent: Label-Free Sequence Analysis
EWCL also includes an unsupervised latent sequence model trained without disorder labels: no DisProt, IDEAL, or other curated supervision is used.
This model learns intrinsic sequence representations from amino-acid patterns alone. Its purpose is not to replicate any specific disorder definition, but to provide an orthogonal signal for:
- pattern discovery
- clustering and representation analysis
- hypothesis generation
- cross-checking EWCLv1 signals without supervised bias
The latent model should be interpreted as exploratory and complementary, not as a replacement for EWCLv1.
Validation and Performance
EWCLv1 has been evaluated on large, leakage-free benchmarks spanning millions of residues, including curated disorder datasets and merged references. The model is optimized for:
- ultra-fast sequence-only analysis
- continuous outputs suitable for threshold sweeps
- consistent behavior across sequence and structure inputs
Detailed benchmarking protocols, metrics, and datasets are fully documented and publicly available.
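To illustrate why continuous outputs suit threshold sweeps, here is a minimal sketch that binarizes a score track at several cutoffs and reports the true-positive and false-positive rates at each one. It is not the platform's benchmarking protocol. The function name `threshold_sweep`, the binary reference labels (1 = annotated disordered, 0 = ordered), and the convention that higher scores mean more disorder-like residues are all assumptions made for this example.

```python
# Hypothetical helper for illustration; not the EWCL benchmarking code.

def threshold_sweep(scores, labels, thresholds):
    """Return a list of (threshold, tpr, fpr) for binary reference labels."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    rows = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for s, y in zip(scores, labels):
            predicted = s >= t            # residue called positive at this cutoff
            if predicted and y:
                tp += 1
            elif predicted and not y:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
        # Guard against empty classes to avoid division by zero.
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        rows.append((t, tpr, fpr))
    return rows
```

A hard ordered/disordered classifier would yield a single point in this table; a continuous score yields the whole curve, so the operating point can be chosen after the fact.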
Platform Scope
EWCL currently supports:
Inputs:
- FASTA, UniProt, PDB, mmCIF, AlphaFold PDB
Outputs:
- JSON, CSV, annotated PDB
Analysis:
- sequence-only collapse likelihood
- EWCL vs pLDDT conflict profiling
- residue-level entropy and collapse tracks
Structure-aware contextual extensions (geometry-conditioned overlays and advanced diagnostics) are under active development and will be released as premium functionality in future platform updates.
Contributors and Scientific Context
Lead Developer: Lucas Cristino
Lucas Cristino is the lead developer and principal architect of EWCL, responsible for model design, data pipelines, benchmarking, and platform deployment. His work focuses on integrating sequence-scale entropy analytics with physics-informed collapse modeling.
Scientific Guidance: Prof. Vladimir N. Uversky
Prof. Vladimir N. Uversky has provided scientific guidance grounded in foundational research on intrinsically disordered proteins, including charge–hydropathy principles and disorder–function relationships that inform EWCL's theoretical framing.
Why EWCL Matters
- Sequence-first physics in a structure-first era
- Transparent hallucination diagnostics
- Continuous, interpretable outputs
- Scales from proteomes to single structures
- Designed for reproducibility and open science