EWCL Performance Across Disorder Prediction Benchmarks

The Entropy-Weighted Collapse Likelihood (EWCL) model achieves state-of-the-art performance across curated MobiDB and CAID disorder benchmarks. Independent validations show mean AUROC above 0.90, highlighting EWCL's reliability in quantifying disorder, linkers, and binding-prone residues directly from sequence-derived entropy without training bias.

Stable
Ordered
Moderate
Flexible
Collapse-prone
Sequence-Only Performance

MobiDB Sequence-Only Benchmarks

EWCL highlights canonical IDR biology using only sequence-derived entropy. We prioritize proteins with multiple annotation modalities (disorder, binding, low-complexity, motifs) to show annotation orthogonality and boundary fidelity.

DisProt (curated)

Training
Proteins:3,121
ROC–AUC:0.928
PR–AUC:0.803
Notes:Training reference

IDEAL

External
Proteins:731
ROC–AUC:0.724
PR–AUC:0.327
Notes:Low-prevalence

IDEAL excl. DisProt

External
Proteins:731
ROC–AUC:0.724
PR–AUC:0.327
Notes:Zero overlap

MobiDB

Functional
Proteins:1,909
Residues:~1.29M
ROC–AUC:0.774
PR–AUC:0.405
Notes:~3.16× AP lift vs random
Training datasets
External validation
Functional contexts
MobiDB Signature Cases

Highlighted Case Studies

Flagship proteins from cancer biology (p53, BRCA1), molecular chaperones (HSP70), calcium signaling (Calmodulin), and bacterial regulation (SilE) — each demonstrating EWCL's precision across diverse functional contexts.

Tumor protein p53

P04637MobiDB / DisProt

p53 tumor suppressor protein with structured DNA-binding domain and intrinsically disordered transactivation domains. EWCL precisely maps the flexible N- and C-terminal regions critical for transcriptional regulation and protein interactions.

AUROC
0.834
PR-AUC
0.700
IDR %
42.0%
Ground Truth Source

UniProt Feature Summary

Pfam domains: P53 DNA-binding domain (94-312), P53 tetramerization motif (324-355)
Motifs: Bipartite nuclear localization signal (305-322), MDM2-binding motif (18-26)
Disordered: N-terminal transactivation domain (1-93), C-terminal regulatory region (356-393)
Binding: DNA major groove contacts (Arg248, Arg273), MDM2 interaction surface
Experimental: N-terminus (1-93) and C-terminus (356-393) confirmed by NMR

UniProt Features & EWCL Heatmap

Pfam / domain • Motif / ELM • Disordered • Binding • Low complexity • Secondary structure • Natural variant • Experimental disorder

Loading heatmap...
Figure. p53 (EWCL = 0.91 max, ROC = 0.834) exemplifies EWCL's precision in identifying functionally critical disordered regions in tumor suppressors, mapping both transactivation domains and the flexible C-terminus.

BRCA1 C-terminal domain

P38398MobiDB / DisProt

BRCA1 breast cancer susceptibility protein containing structured RING and BRCT domains connected by intrinsically disordered linker regions. EWCL identifies the flexible regions involved in protein-protein interactions and DNA damage response.

AUROC
1.000
PR-AUC
0.950
IDR %
31.0%
Ground Truth Source

UniProt Feature Summary

Pfam domains: RING finger domain (24-64), BRCT domain 1 (1650-1736), BRCT domain 2 (1760-1855)
Motifs: Nuclear localization signals (503-508, 607-614), PEST sequences
Disordered: Central region (300-1650) with multiple interaction motifs
Binding: E2 ubiquitin conjugase binding (RING), DNA binding (BRCT)
Experimental: Central region flexibility confirmed by SAXS and hydrogen exchange

UniProt Features & EWCL Heatmap

Pfam / domain • Motif / ELM • Disordered • Binding • Low complexity • Secondary structure • Natural variant • Experimental disorder

Loading heatmap...
Figure. BRCA1 (EWCL = 0.89 max, ROC = 1.000) demonstrates EWCL's ability to map disordered linkers in multi-domain cancer-associated proteins, highlighting regions critical for DNA repair complexes.

RNA polymerase II subunit A

P35637MobiDB / DisProt

Large RNA polymerase II subunit with structured catalytic domains and intrinsically disordered C-terminal domain (CTD). EWCL accurately maps the flexible CTD critical for transcriptional regulation.

AUROC
0.947
PR-AUC
0.824
IDR %
28.5%
Ground Truth Source

UniProt Feature Summary

Pfam domains: RNA polymerase Rpb1 domains (catalytic core)
Disordered: C-terminal domain (CTD) with heptapeptide repeats (400-526)
Binding: DNA-directed RNA polymerase activity, nucleotide binding
Experimental: CTD serves as flexible scaffold for transcription factor recruitment

UniProt Features & EWCL Heatmap

Pfam / domain • Motif / ELM • Disordered • Binding • Low complexity • Secondary structure • Natural variant • Experimental disorder

Loading heatmap...
Figure. RNA Pol II (EWCL = 0.87 max, ROC = 0.947) showcases EWCL's precision in identifying the disordered C-terminal domain, a key regulatory region in transcription machinery.

Phosphoprotein p53

P16220MobiDB / DisProt

Another p53 family member with similar disordered transactivation and regulatory domains. EWCL demonstrates exceptional performance on this highly disordered tumor suppressor variant.

AUROC
0.999
PR-AUC
0.982
IDR %
48.0%
Ground Truth Source

UniProt Feature Summary

Pfam domains: Bromodomain, TAZ zinc finger, KIX domain, PHD finger
Disordered: Multiple IDRs including N-terminal (1-300), central linker regions
Binding: Histone acetyltransferase, chromatin remodeling, transcription factor binding
Experimental: IDRs critical for transcriptional activation and protein-protein interactions

UniProt Features & EWCL Heatmap

Pfam / domain • Motif / ELM • Disordered • Binding • Low complexity • Secondary structure • Natural variant • Experimental disorder

Loading heatmap...
Figure. p53 variant (EWCL = 0.93 max, ROC = 0.999) shows near-perfect EWCL performance on disordered tumor suppressor domains, confirming the method's reliability across p53 family proteins.

Transcription elongation factor B

Q13148MobiDB / DisProt

Transcription elongation factor with structured domains and flexible regulatory regions. EWCL identifies the disordered regions involved in transcriptional control and protein-protein interactions.

AUROC
0.990
PR-AUC
0.921
IDR %
35.0%
Ground Truth Source

UniProt Feature Summary

Pfam domains: TFIIS N-terminal domain, TFIIS central domain, TFIIS C-terminal domain
Disordered: N-terminal region (1-50)
Binding: Zinc-finger, RNA polymerase II binding
Experimental: N-terminal IDR involved in transcriptional regulation

UniProt Features & EWCL Heatmap

Pfam / domain • Motif / ELM • Disordered • Binding • Low complexity • Secondary structure • Natural variant • Experimental disorder

Loading heatmap...
Figure. TEFB (EWCL = 0.86 max, ROC = 0.990) demonstrates EWCL's accuracy on transcriptional machinery, precisely mapping flexible regulatory domains critical for gene expression control.