
EWCL · Publication Models · 2026-04-22

Benchmark Summary

Performance of the EWCL-Consensus publication models on the held-out consensus test set and the independent DisProt sacred test set. All reported p-values are below 1e-300.

Dataset | Proteins | Residues | Proteins w/ structure | Residues w/ structure
Consensus test | 4,508 | 895,846 | 4,452 | 699,568
DisProt sacred | 2,007 | 979,606 | 1,895 | 854,638
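Every correlation in this report is computed over roughly 0.7 to 1 million residues, which is why the p-values are so small. A minimal sketch of that reasoning, assuming the Fisher z approximation for a test of zero correlation (the report does not name its test). `log10_p_upper_bound` is a hypothetical helper: it bounds the two-sided p-value with the Mills ratio and works in log10, because the p-value itself underflows float64.

```python
import math


def log10_p_upper_bound(r: float, n: int) -> float:
    """Upper bound on log10 of the two-sided p-value for H0: rho = 0.

    Assumes the Fisher z approximation z = atanh(|r|) * sqrt(n - 3) ~ N(0, 1)
    under H0, and bounds the normal tail with the Mills ratio Q(z) <= phi(z) / z.
    Working in logs avoids underflow: exp(-z**2 / 2) is 0.0 in float64 here.
    """
    if not 0 < abs(r) < 1 or n <= 3:
        raise ValueError("need 0 < |r| < 1 and n > 3")
    z = math.atanh(abs(r)) * math.sqrt(n - 3)
    # log of the standard normal density phi(z)
    log_phi = -0.5 * z * z - 0.5 * math.log(2 * math.pi)
    # two-sided: p <= 2 * phi(z) / z
    log_p = math.log(2.0) + log_phi - math.log(z)
    return log_p / math.log(10)


# Weakest correlation in this report: pLDDT point-biserial r = -0.2394
# over 854,638 DisProt residues (point-biserial table).
print(log10_p_upper_bound(-0.2394, 854_638))
```

For that weakest row the bound comes out at roughly −11,000 (p below 1e-11000); every other row has a larger |r| at a comparable n, so its bound is smaller still.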

Models

EWCL-Consensus

Structure · 146 features

Publication model

Sequence + 6 structural features (pLDDT, depth, contacts, …)

Train proteins: ~22k
Test residues (consensus): 895,846
Test proteins (consensus): 4,508
Test residues (DisProt): 979,606
Test proteins (DisProt): 2,007

EWCL-Consensus-FASTA

Sequence-only · 139 features

Sequence model

Same architecture, zero structural inputs — pure sequence signal

Train proteins: ~22k
Test residues (consensus): 895,846
Test proteins (consensus): 4,508
Test residues (DisProt): 979,606
Test proteins (DisProt): 2,007

§1 · pLDDT Correlation — Residue-Level

Structure-available residues only. High pLDDT (ordered) corresponds to low EWCL, so negative correlations are expected.

Dataset | Model | Proteins | Residues | Pearson r | Spearman ρ
Consensus test | EWCL-Consensus (struct) | 4,452 | 699,568 | −0.8217 | −0.8042
Consensus test | EWCL-Consensus-FASTA | 4,452 | 699,568 | −0.6781 | −0.6376
DisProt sacred | EWCL-Consensus (struct) | 1,895 | 854,638 | −0.6963 | −0.7998
DisProt sacred | EWCL-Consensus-FASTA | 1,895 | 854,638 | −0.5633 | −0.5855
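A minimal sketch of the residue-level computation, assuming residues without a structure are encoded as `pLDDT = None` (a hypothetical layout; the report only says these residues are excluded). Spearman ρ is taken as the Pearson r of the ranks, with tied values sharing their average rank, which is the usual definition. All function names are illustrative.

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v: Sequence[float]) -> list:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho = Pearson r of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def residue_level(plddt: Sequence[Optional[float]], ewcl: Sequence[float]):
    """Correlate pLDDT with EWCL over structure-available residues only.

    Hypothetical encoding: residues without a structure carry pLDDT = None.
    """
    pairs = [(p, e) for p, e in zip(plddt, ewcl) if p is not None]
    xs = [p for p, _ in pairs]
    ys = [e for _, e in pairs]
    return pearson(xs, ys), spearman(xs, ys)


plddt = [95.0, 88.0, None, 40.0, 30.0]
ewcl = [0.05, 0.10, 0.70, 0.80, 0.90]
r, rho = residue_level(plddt, ewcl)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

The residue with no structure (third position) is dropped before either coefficient is computed, mirroring the "structure-available residues only" restriction.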

§2 · Model Agreement — Struct vs FASTA

All residues. Pearson r > 0.89 on both test sets: structure adds incremental signal on top of sequence.

Dataset | Proteins | Residues | Pearson r | Spearman ρ
Consensus test | 4,508 | 895,846 | 0.9011 | 0.8814
DisProt sacred | 2,007 | 979,606 | 0.8929 | 0.8372

§4 · Per-Protein Pearson r

Minimum 10 residues per protein. For pLDDT vs EWCL-Struct, the median per-protein r is stronger than the residue-level r (§1) on both test sets.

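Set | Comparison | Median r | Mean r | Std
Consensus (4,452) | pLDDT vs EWCL-Struct | −0.872 | −0.813 | 0.192
Consensus (4,452) | pLDDT vs EWCL-FASTA | −0.671 | −0.581 | 0.337
Consensus (4,452) | EWCL-Struct vs EWCL-FASTA | +0.834 | +0.764 | not reported
DisProt (1,895) | pLDDT vs EWCL-Struct | −0.897 | −0.833 | 0.187
DisProt (1,895) | pLDDT vs EWCL-FASTA | −0.708 | −0.643 | 0.251
DisProt (1,895) | EWCL-Struct vs EWCL-FASTA | +0.871 | +0.820 | not reported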

§3 · Point-Biserial Correlation vs Binary Disorder GT

EWCL rows use all residues; the pLDDT baseline is restricted to structure-available residues. On DisProt, EWCL-Consensus (struct) reaches r = +0.46 versus −0.24 for pLDDT, nearly double in magnitude.

Dataset | Predictor | Proteins | Residues | Point-biserial r
Consensus test | EWCL-Consensus (struct) | 4,508 | 895,846 | +0.7902
Consensus test | EWCL-Consensus-FASTA | 4,508 | 895,846 | +0.7224
Consensus test | pLDDT baseline | 4,452 | 699,568 | −0.6549
DisProt sacred | EWCL-Consensus (struct) | 2,007 | 979,606 | +0.4555
DisProt sacred | EWCL-Consensus-FASTA | 2,007 | 979,606 | +0.4113
DisProt sacred | pLDDT baseline | 1,895 | 854,638 | −0.2394
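The point-biserial r is the Pearson r between a continuous score and a 0/1 label, written as r_pb = (M1 − M0) / s_n · √(p·q), where M1 and M0 are the mean scores of disordered and ordered residues, s_n is the population standard deviation of all scores, p is the disordered fraction, and q = 1 − p. A minimal sketch that checks the two forms agree; function names and the toy data are illustrative.

```python
import math


def point_biserial(scores, labels):
    """r_pb = (M1 - M0) / s_n * sqrt(p * q); labels: 1 = disordered, 0 = ordered."""
    n = len(scores)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    m1, m0 = sum(pos) / len(pos), sum(neg) / len(neg)
    mean = sum(scores) / n
    s_n = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # population std
    p = len(pos) / n
    return (m1 - m0) / s_n * math.sqrt(p * (1 - p))


def pearson(x, y):
    """Plain Pearson r, used here to confirm the equivalence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


scores = [0.9, 0.8, 0.3, 0.2, 0.6]
labels = [1, 1, 0, 0, 1]
print(point_biserial(scores, labels), pearson(scores, labels))
```

Since high pLDDT means ordered, the pLDDT baseline's r is negative and should be compared by magnitude: on DisProt, 0.4555 / 0.2394 ≈ 1.90 for EWCL-Consensus (struct).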

Key Findings

pLDDT and EWCL strongly anti-correlated

Pearson r = −0.56 to −0.82 across datasets/models

Structure variant correlates more strongly with pLDDT than the FASTA variant

Pearson Δr ≈ 0.14 on the consensus test and ≈ 0.13 on DisProt

FASTA variant's pLDDT correlation emerges from sequence alone

Pearson r = −0.56 to −0.68 using zero structural features

Both EWCL variants outperform raw pLDDT

Point-biserial r: EWCL +0.41 to +0.79 vs pLDDT −0.24 to −0.65

Structure and FASTA variants agree strongly

Pearson r > 0.89 on both test sets: structure adds incremental, not contradictory, signal

Per-protein pLDDT correlations with the struct model are stronger than residue-level

Median per-protein r = −0.87 to −0.90 (pLDDT vs struct model)
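The summary ranges in these findings can be recomputed directly from the §1 and §3 tables. A small sketch; the dictionary layout and the `value_range` helper are illustrative, and the numbers are copied verbatim from the tables.

```python
# Residue-level Pearson r (pLDDT vs EWCL), copied from the §1 table.
PEARSON = {
    ("consensus", "struct"): -0.8217,
    ("consensus", "fasta"): -0.6781,
    ("disprot", "struct"): -0.6963,
    ("disprot", "fasta"): -0.5633,
}
# EWCL point-biserial r vs binary disorder labels, copied from the §3 table.
POINT_BISERIAL_EWCL = {
    ("consensus", "struct"): 0.7902,
    ("consensus", "fasta"): 0.7224,
    ("disprot", "struct"): 0.4555,
    ("disprot", "fasta"): 0.4113,
}


def value_range(values):
    """(weakest, strongest) by absolute value, rounded to 2 decimals."""
    ordered = sorted(values, key=abs)
    return round(ordered[0], 2), round(ordered[-1], 2)


all_r = value_range(PEARSON.values())
fasta_r = value_range(v for (_, model), v in PEARSON.items() if model == "fasta")
# Struct-minus-FASTA gain in |r| per dataset.
delta = {
    ds: round(abs(PEARSON[(ds, "struct")]) - abs(PEARSON[(ds, "fasta")]), 3)
    for ds in ("consensus", "disprot")
}
pb = value_range(POINT_BISERIAL_EWCL.values())
print(all_r, fasta_r, delta, pb)
```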

EWCL · Sub-sampled Lite Models · 2026-04-22

Lite Model Benchmarks

EWCL-Lite1500 (1,500 training proteins) and EWCL-Lite2000 (2,000 training proteins) are evaluated on CAID and on a 909-protein DisProt held-out set. Neither model saw the held-out split during training.
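The report states only that the Lite models are sub-sampled and never saw the held-out split. A sketch of one way to enforce that, assuming uniform random sampling of protein IDs; the actual selection procedure, training pool, and seed are not stated, and `subsample_training` and the synthetic IDs are hypothetical.

```python
import random


def subsample_training(pool_ids, held_out_ids, k, seed=0):
    """Draw k training proteins uniformly at random, excluding the held-out split.

    Hypothetical helper: the report does not state how the Lite subsets were drawn.
    """
    held_out = set(held_out_ids)
    # Sort so the seed alone determines the draw, independent of input order.
    eligible = sorted(set(pool_ids) - held_out)
    if k > len(eligible):
        raise ValueError(f"requested {k} proteins but only {len(eligible)} are eligible")
    return random.Random(seed).sample(eligible, k)


pool = [f"P{i:05d}" for i in range(3000)]  # synthetic stand-in IDs
held_out = pool[:909]  # stand-in for the 909-protein DisProt held-out set
lite1500 = subsample_training(pool, held_out, 1500)
lite2000 = subsample_training(pool, held_out, 2000)
print(len(lite1500), len(lite2000))
```

Removing the held-out IDs before sampling, rather than filtering afterwards, guarantees the requested subset size while keeping the two splits disjoint.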

CAID Benchmark — AUROC (5 Tasks)

Lite2000 leads on 4 of 5 tasks; Lite1500 leads on Binding IDR. Proteins per task: Disorder NOX 204 · Disorder PDB 319 · Binding 52 · Binding IDR 52 · Linker 31.

Model | Disorder NOX | Disorder PDB | Binding | Binding IDR | Linker
EWCL-Lite2000 | 0.942 | 0.951 | 0.881 | 0.623 | 0.936
EWCL-Lite1500 | 0.905 | 0.920 | 0.853 | 0.654 | 0.900
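AUROC is the probability that a randomly chosen positive residue scores higher than a randomly chosen negative one, with ties counting half. A minimal sketch via the Mann–Whitney U statistic, assuming residues are pooled across proteins; the official CAID assessment may aggregate differently, and `auroc` is an illustrative name.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts (positive, negative) pairs ordered correctly, ties as 1/2.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Applied task by task, taking the larger AUROC in each column of the table gives the "4 of 5" tally for Lite2000.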

DisProt Held-Out — 909 Shared Proteins vs External Baselines

All three EWCL models outperform every external predictor on AUROC. Robust-Structure leads on AUROC and F1; Lite1500 leads on AUPRC and MCC.

Model | Held-out proteins | AUROC | AUPRC | F1 @0.5 | MCC @0.5
EWCL-Robust-Structure | 909 | 0.8118 | 0.4759 | 0.4936 | 0.3845
EWCL-Lite1500 | 909 | 0.8044 | 0.4899 | 0.4893 | 0.3863
EWCL-Lite2000 | 909 | 0.8002 | 0.4818 | 0.4775 | 0.3706
DisoFlag | 909 | 0.7879 | 0.4541 | 0.4542 | 0.3510
ADOPT | 909 | 0.7480 | 0.3396 | 0.4701 | 0.3430
aiUpred | 909 | 0.7358 | 0.3198 | 0.4222 | 0.2903
IUPred3 | 909 | 0.7279 | 0.3162 | 0.4065 | 0.2663
MetaPredict | 909 | 0.7234 | 0.2667 | 0.4275 | 0.2982
Inv-pLDDT | 909 | 0.7220 | 0.2692 | 0.3726 | 0.2200
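A sketch of the threshold-free AUPRC and the two fixed-threshold metrics. It assumes AUPRC is estimated as average precision (one common estimator; the report does not say which was used) and that a score of at least 0.5 counts as disordered. Returning 0.0 for F1 and MCC in degenerate cases is also an assumption. Function names are illustrative.

```python
import math


def confusion_at(scores, labels, threshold=0.5):
    """Residue-level confusion counts; assumes score >= threshold means disordered."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_mcc(scores, labels, threshold=0.5):
    """F1 and Matthews correlation coefficient at a fixed threshold."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


def average_precision(scores, labels):
    """AUPRC as average precision: sum over thresholds of (R_k - R_{k-1}) * P_k."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(pairs):
        # Consume all residues tied at this score as a single threshold step.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
        i = j
    return ap


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(average_precision(scores, labels), f1_mcc(scores, labels))
```

AUROC and AUPRC summarize ranking quality across all thresholds, while F1 and MCC depend on the single 0.5 cut-off, which is why the leader can differ between the two metric families.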