About the EWCL Platform

Entropy-Weighted Collapse Likelihood (EWCL)

EWCL is a biophysics-informed computational platform for analyzing protein disorder, collapse propensity, and physics–confidence conflicts in modern structure prediction.

At its core, EWCL is built around a single principle: sequence thermodynamics must be evaluated independently of structural confidence. While tools like AlphaFold provide highly accurate geometries, confidence in a predicted structure does not guarantee that the underlying sequence can sustain a stable, collapsed conformation under physiological conditions.

EWCL addresses this gap by providing a sequence-first, structure-agnostic physics baseline that can be applied uniformly to raw sequences and to structure-resolved proteins.

What EWCL Does

EWCL produces continuous, per-residue scores (0–1) that quantify collapse likelihood and ensemble breadth directly from sequence information. These scores can be:

  • Computed from FASTA / UniProt sequences
  • Computed from PDB, mmCIF, and AlphaFold structures (using the same sequence model)
  • Aligned against AlphaFold confidence (pLDDT) for conflict analysis
  • Exported as JSON, CSV, or annotated PDB

EWCL does not classify residues as simply "ordered" or "disordered." Instead, it models disorder and collapse as a continuous thermodynamic spectrum, enabling fine-grained residue-level interpretation.
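
As an illustration of the export step, here is a minimal sketch of how per-residue scores could be serialized to JSON and CSV. The field names (`position`, `aa`, `score`) and the function names are assumptions made for this example; they are not the platform's documented schema.

```python
import csv
import io
import json


def scores_to_json(protein_id, sequence, scores):
    """Serialize per-residue scores to a JSON string (hypothetical schema)."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    residues = [
        {"position": i, "aa": aa, "score": round(score, 4)}
        for i, (aa, score) in enumerate(zip(sequence, scores), start=1)
    ]
    return json.dumps({"id": protein_id, "residues": residues}, indent=2)


def scores_to_csv(sequence, scores):
    """Serialize per-residue scores to CSV text, one row per residue."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position", "aa", "score"])
    for i, (aa, score) in enumerate(zip(sequence, scores), start=1):
        writer.writerow([i, aa, f"{score:.4f}"])
    return buffer.getvalue()
```

Keeping one row per residue with a 1-based position makes the continuous track easy to plot or to join against other per-residue annotations.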

Core Model

EWCLv1 — Unified Physics Baseline

Type: Sequence-derived model

Inputs:

  • FASTA / UniProt sequence
  • PDB / mmCIF / AlphaFold PDB (sequence extracted internally)

EWCLv1 is the single core model powering the platform.

It computes entropy-weighted collapse likelihood directly from amino-acid composition, patterning, and physicochemical context. Importantly, EWCLv1 does not depend on 3D coordinates for its prediction, even when a structure is provided. When structures are uploaded, geometry is used only for alignment and visualization, not for learning or inference.

Because EWCLv1 is sequence-only at its core, it serves as a physics reference that can be applied consistently across datasets and input modalities.
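
To make the "sequence extracted internally" step concrete, the sketch below reads the one-letter sequence of a single chain from the CA atoms of a PDB-format file and ignores the coordinates. It is an illustrative parser, not EWCL's internal code, and the function name is hypothetical. It also returns the B-factor column, which AlphaFold PDB files use to store per-residue pLDDT.

```python
# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_and_bfactors(pdb_text, chain_id="A"):
    """Return (sequence, b_factors) for one chain, using CA atoms only.

    Relies on the fixed-column PDB ATOM layout; only the first model is read.
    """
    sequence, bfactors, seen = [], [], set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA" or line[21:22] != chain_id:
            continue
        key = (line[22:26], line[26:27])  # residue number + insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        sequence.append(THREE_TO_ONE.get(line[17:20].strip().upper(), "X"))
        bfactors.append(float(line[60:66]))
    return "".join(sequence), bfactors
```

Non-standard residues map to `X` here; a real pipeline would need an explicit policy for modified residues and for gaps in the resolved chain.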

Scientific role:

  • Provides a structure-independent thermodynamic baseline
  • Enables fair comparison between sequence physics and model confidence
  • Scales from proteome-wide scans to single-protein audits

Derived Diagnostic

Hallucination and Physics–Confidence Conflict

When EWCLv1 scores are aligned with AlphaFold confidence (pLDDT), EWCL enables detection of physics–confidence conflicts.

A conflict arises when:

  • EWCLv1 indicates high entropy / low collapse likelihood, and
  • AlphaFold reports high confidence (high pLDDT)

This pattern highlights regions where a model is confident in a geometry that the sequence context suggests is thermodynamically unstable or ensemble-driven.

Hallucination in EWCL is not a predicted class, but a derived diagnostic computed from the disagreement between:

  • sequence-derived physics (EWCLv1), and
  • AI-reported confidence (pLDDT)

This makes hallucination detection transparent, interpretable, and reproducible.
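
The conflict rule described above can be sketched as a simple per-residue comparison. The example assumes an entropy-like track in 0–1 where higher values mean a broader ensemble, and pLDDT on its usual 0–100 scale. The entropy cutoff of 0.5 is a placeholder chosen for illustration; 70 is the conventional AlphaFold "confident" cutoff. The function names are hypothetical, and the platform's actual scoring may differ.

```python
def conflict_mask(entropy, plddt, entropy_min=0.5, plddt_min=70.0):
    """Flag residues with high sequence entropy AND high structural confidence."""
    if len(entropy) != len(plddt):
        raise ValueError("entropy and pLDDT tracks must be aligned residue-by-residue")
    return [e >= entropy_min and p >= plddt_min for e, p in zip(entropy, plddt)]


def conflict_segments(mask, min_len=1):
    """Group consecutive flagged residues into 1-based inclusive (start, end) ranges."""
    segments, start = [], None
    for i, flagged in enumerate(mask, start=1):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a segment that runs to the end of the chain.
    if start is not None and len(mask) - start + 1 >= min_len:
        segments.append((start, len(mask)))
    return segments
```

Because both inputs are plain per-residue tracks and the rule is a pair of thresholds, every flagged residue can be traced back to the two numbers that produced it.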

Complementary Model

Unsupervised Latent Model

Blind Sequence Representation

EWCL also includes an unsupervised latent sequence model trained without disorder labels and without DisProt, IDEAL, or curated supervision.

This model learns intrinsic sequence representations directly from amino-acid patterns alone. Its purpose is not to replicate any specific disorder definition, but to provide an orthogonal signal for:

  • pattern discovery
  • clustering and representation analysis
  • hypothesis generation
  • cross-checking EWCLv1 signals without supervised bias

The latent model should be interpreted as exploratory and complementary, not as a replacement for EWCLv1.

Validation

Validation and Performance

EWCLv1 has been evaluated on large, leakage-free benchmarks spanning millions of residues, including curated disorder datasets and merged references. The model is optimized for:

  • ultra-fast sequence-only analysis
  • continuous outputs suitable for threshold sweeps
  • consistent behavior across sequence and structure inputs
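
A threshold sweep over continuous scores can be sketched as follows. This is a generic illustration, not the project's benchmarking protocol: it assumes hypothetical binary reference labels (1 = positive class) and that a higher score indicates the positive class.

```python
def threshold_sweep(scores, labels, thresholds):
    """Compute true/false positive rates at each threshold (score >= t is positive)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        rows.append({"threshold": t, "tpr": tpr, "fpr": fpr})
    return rows
```

Sweeping the cutoff rather than fixing it lets the same continuous output serve analyses with different tolerances for false positives.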

Detailed benchmarking protocols, metrics, and datasets are fully documented and publicly available.

Platform Scope

EWCL currently supports:

Inputs:

  • FASTA, UniProt, PDB, mmCIF, AlphaFold PDB

Outputs:

  • JSON, CSV, annotated PDB

Analysis:

  • sequence-only collapse likelihood
  • EWCL vs pLDDT conflict profiling
  • residue-level entropy and collapse tracks

Structure-aware contextual extensions (geometry-conditioned overlays and advanced diagnostics) are under active development and will be released as premium functionality in future platform updates.

Contributors

Contributors and Scientific Context

Lead Developer
Lucas Cristino

Lucas Cristino is the lead developer and principal architect of EWCL, responsible for model design, data pipelines, benchmarking, and platform deployment. His work focuses on integrating sequence-scale entropy analytics with physics-informed collapse modeling.

Scientific Guidance
Prof. Vladimir N. Uversky

Prof. Vladimir N. Uversky has provided scientific guidance grounded in foundational research on intrinsically disordered proteins, including charge–hydropathy principles and disorder–function relationships that inform EWCL's theoretical framing.

Research Impact

Why EWCL Matters

  • Sequence-first physics in a structure-first era
  • Transparent hallucination diagnostics
  • Continuous, interpretable outputs
  • Scales from proteomes to single structures
  • Designed for reproducibility and open science

Ready to Explore EWCL?

Start analyzing your protein sequences and structures with our physics-informed approach to disorder prediction and hallucination detection.