EWCL · Publication Models · 2026-04-22

Benchmark Summary

Performance of the EWCL-Consensus publication models on the held-out consensus test set and the independent DisProt sacred test set. All p < 1e-300.

Dataset	Proteins	Residues	Proteins w/ Structure	Residues w/ Structure
Consensus test	4,508	895,846	4,452	699,568
DisProt sacred	2,007	979,606	1,895	854,638

Models

Attribute	EWCL-Consensus	EWCL-Consensus-FASTA
Type	Structure · 146 features	Sequence-only · 139 features
Role	Publication model	Sequence model
Inputs	Sequence + 6 structural features (pLDDT, depth, contacts, …)	Same architecture, zero structural inputs; pure sequence signal
Train proteins	~22 k	~22 k
Test proteins (consensus)	4,508	4,508
Test residues (consensus)	895,846	895,846
Test proteins (DisProt)	2,007	2,007
Test residues (DisProt)	979,606	979,606

§1 · pLDDT Correlation — Residue-Level

Structure-available residues only. High pLDDT (ordered) → low EWCL.

Dataset	Model	Proteins	Residues	Pearson r	Spearman ρ
Consensus test	EWCL-Consensus (struct)	4,452	699,568	−0.8217	−0.8042
Consensus test	EWCL-Consensus-FASTA	4,452	699,568	−0.6781	−0.6376
DisProt sacred	EWCL-Consensus (struct)	1,895	854,638	−0.6963	−0.7998
DisProt sacred	EWCL-Consensus-FASTA	1,895	854,638	−0.5633	−0.5855
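
A minimal sketch of how the two residue-level statistics in this table can be computed, assuming per-residue pLDDT and EWCL scores are available as aligned lists. The function names and toy values are illustrative and not part of the EWCL pipeline; on real data `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would be the usual choice.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values (hypothetical, not taken from the benchmark)
plddt = [95.0, 90.0, 85.0, 60.0, 40.0, 30.0]
ewcl = [0.05, 0.10, 0.08, 0.55, 0.80, 0.90]
r = pearson(plddt, ewcl)
rho = spearman(plddt, ewcl)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

Both values come out negative on the toy data, matching the direction stated above: high pLDDT (ordered) pairs with low EWCL.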

§2 · Model Agreement — Struct vs FASTA

All residues. Pearson r > 0.89 on both datasets (Spearman ρ = 0.84–0.88); structure adds incremental signal.

Dataset	Proteins	Residues	Pearson r	Spearman ρ
Consensus test	4,508	895,846	0.9011	0.8814
DisProt sacred	2,007	979,606	0.8929	0.8372

§4 · Per-Protein Pearson r

Min 10 residues/protein. Median pLDDT vs EWCL-Struct r is stronger than the residue-level value on both sets.

Set (proteins)	Comparison	Median r	Mean r	Std
Consensus (4,452)	pLDDT vs EWCL-Struct	−0.872	−0.813	0.192
Consensus (4,452)	pLDDT vs EWCL-FASTA	−0.671	−0.581	0.337
Consensus (4,452)	EWCL-Struct vs EWCL-FASTA	+0.834	+0.764	—
DisProt (1,895)	pLDDT vs EWCL-Struct	−0.897	−0.833	0.187
DisProt (1,895)	pLDDT vs EWCL-FASTA	−0.708	−0.643	0.251
DisProt (1,895)	EWCL-Struct vs EWCL-FASTA	+0.871	+0.820	—
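
The per-protein aggregation can be sketched as follows, assuming residue-level rows of the form (protein ID, score A, score B). The row layout, function names, and the use of the sample standard deviation are assumptions; only the 10-residue minimum comes from the text above.

```python
import math
import statistics
from collections import defaultdict

MIN_RESIDUES = 10  # the "Min 10 residues/protein" rule

def pearson(x, y):
    """Pearson r, or None when either track is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def per_protein_summary(rows, min_residues=MIN_RESIDUES):
    """rows: iterable of (protein_id, score_a, score_b), one per residue."""
    tracks = defaultdict(lambda: ([], []))
    for pid, a, b in rows:
        tracks[pid][0].append(a)
        tracks[pid][1].append(b)
    rs = []
    for a, b in tracks.values():
        if len(a) < min_residues:
            continue  # too short for a stable correlation
        r = pearson(a, b)
        if r is not None:
            rs.append(r)
    return {
        "n_proteins": len(rs),
        "median": statistics.median(rs),
        "mean": statistics.mean(rs),
        # Sample SD is an assumption; the report does not say which SD it uses
        "std": statistics.stdev(rs) if len(rs) > 1 else 0.0,
    }

# Toy data (hypothetical): two usable proteins and one that is too short
rows = (
    [("P1", float(i), -float(i)) for i in range(12)]
    + [("P2", float(i), -float(i * i)) for i in range(12)]
    + [("P3", float(i), float(i)) for i in range(5)]
)
summary = per_protein_summary(rows)
print(summary)
```

Summarising with the median makes the result less sensitive to a few proteins with weak or reversed correlation, which may explain why the median is stronger than the mean in every row of the table.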

§3 · Point-Biserial Correlation vs Binary Disorder GT

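EWCL rows use all residues; pLDDT rows are restricted to structure-available residues. pLDDT r is negative because high pLDDT marks order, so compare magnitudes. DisProt: EWCL-Consensus (struct) r = +0.46 vs pLDDT r = −0.24, nearly double in magnitude.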

Dataset	Predictor	Proteins	Residues	Point-biserial r
Consensus test	EWCL-Consensus (struct)	4,508	895,846	+0.7902
Consensus test	EWCL-Consensus-FASTA	4,508	895,846	+0.7224
Consensus test	pLDDT baseline	4,452	699,568	−0.6549
DisProt sacred	EWCL-Consensus (struct)	2,007	979,606	+0.4555
DisProt sacred	EWCL-Consensus-FASTA	2,007	979,606	+0.4113
DisProt sacred	pLDDT baseline	1,895	854,638	−0.2394
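
A minimal sketch of the point-biserial statistic, assuming 0/1 disorder labels (1 = disordered) aligned with a continuous score. The function name and toy values are illustrative; `scipy.stats.pointbiserialr` gives the same quantity, which is the Pearson correlation with one variable binary.

```python
import math

def point_biserial(labels, scores):
    """labels: 0/1 ground truth (1 = disordered); scores: continuous predictor."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    n1, n0, n = len(pos), len(neg), len(scores)
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    mean_all = sum(scores) / n
    # Population standard deviation over all residues
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    # r_pb = (M1 - M0) / sd * sqrt(n1 * n0 / n^2)
    return (sum(pos) / n1 - sum(neg) / n0) / sd * math.sqrt(n1 * n0) / n

# Toy values (hypothetical, not taken from the benchmark)
labels = [0, 0, 0, 1, 1, 1]
ewcl = [0.1, 0.2, 0.1, 0.8, 0.9, 0.7]
plddt = [92.0, 88.0, 95.0, 45.0, 38.0, 70.0]

r_ewcl = point_biserial(labels, ewcl)
r_plddt = point_biserial(labels, plddt)
r_inv = point_biserial(labels, [100.0 - p for p in plddt])
print(f"EWCL {r_ewcl:+.3f}, pLDDT {r_plddt:+.3f}, inverted pLDDT {r_inv:+.3f}")
```

Inverting pLDDT only flips the sign of r, so the magnitude is what matters when comparing the pLDDT baseline against the EWCL rows.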

Key Findings

pLDDT and EWCL strongly anti-correlated

Pearson r = −0.56 to −0.82 across datasets/models

Structure variant correlates more with pLDDT than FASTA

Pearson Δ|r| ≈ 0.14 on consensus test, ≈ 0.13 on DisProt sacred

FASTA pLDDT correlation is sequence-emergent

r = −0.56 to −0.68 using 0 structure features

Both EWCL variants far outperform raw pLDDT

Point-biserial r: EWCL +0.41 to +0.79 vs pLDDT −0.24 to −0.65

Structure and FASTA variants agree strongly

r > 0.89 — structure adds incremental, not contradictory, signal

Per-protein correlations stronger than residue-level

Median per-protein r = −0.87 to −0.90 (pLDDT vs struct model)
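
The ranges quoted in these findings can be re-derived from the §1 table. A small sketch of that arithmetic; the dictionary layout and function name are illustrative, and only the four r values are taken from the table.

```python
# Residue-level Pearson r vs pLDDT, copied from the §1 table
R_VS_PLDDT = {
    ("Consensus test", "struct"): -0.8217,
    ("Consensus test", "fasta"): -0.6781,
    ("DisProt sacred", "struct"): -0.6963,
    ("DisProt sacred", "fasta"): -0.5633,
}

def struct_minus_fasta(table):
    """Gap in |r| between the structure and FASTA variants, per dataset."""
    datasets = sorted({ds for ds, _ in table})
    return {
        ds: round(abs(table[(ds, "struct")]) - abs(table[(ds, "fasta")]), 4)
        for ds in datasets
    }

gaps = struct_minus_fasta(R_VS_PLDDT)
strongest = min(R_VS_PLDDT.values())  # most negative r
weakest = max(R_VS_PLDDT.values())    # least negative r

for ds, gap in gaps.items():
    print(f"{ds}: struct vs FASTA gap in |r| = {gap:.4f}")
print(f"r range: {weakest} to {strongest}")
```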

EWCL · Sub-sampled Lite Models · 2026-04-22

Lite Model Benchmarks

EWCL-Lite1500 (1,500 training proteins) and EWCL-Lite2000 (2,000 training proteins) evaluated on CAID and a 909-protein DisProt held-out set. Neither model saw the held-out split during training.

CAID Benchmark — AUROC (5 Tasks)

Lite2000 leads on 4 of 5 tasks; Lite1500 leads on Binding IDR. NOX: 204 proteins · PDB: 319 · Binding: 52 · Binding IDR: 52 · Linker: 31.

Model	Disorder NOX	Disorder PDB	Binding	Binding IDR	Linker
EWCL-Lite2000	0.942	0.951	0.881	0.623	0.936
EWCL-Lite1500	0.905	0.920	0.853	0.654	0.900
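
For readers less familiar with the metric, AUROC is the probability that a randomly chosen positive residue scores higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pair counting, assuming 0/1 labels and per-residue scores. The function name and toy data are illustrative; the pairwise loop is quadratic, so on benchmark-sized data a rank-based routine such as `sklearn.metrics.roc_auc_score` would be used instead.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy values (hypothetical, not taken from CAID)
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc = auroc(labels, scores)
print(f"AUROC = {auc:.4f}")
```

AUROC is threshold-free, so it compares rankings only; it does not say how well either model is calibrated at a fixed cut-off.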

DisProt Held-Out — 909 Shared Proteins vs External Baselines

All three EWCL models outperform all external predictors on AUROC. Robust-Structure leads AUROC + F1; Lite1500 leads AUPRC + MCC.

Model	Held-Out Proteins	AUROC	AUPRC	F1 @0.5	MCC @0.5
EWCL-Robust-Structure	909	0.8118	0.4759	0.4936	0.3845
EWCL-Lite1500	909	0.8044	0.4899	0.4893	0.3863
EWCL-Lite2000	909	0.8002	0.4818	0.4775	0.3706
DisoFlag	909	0.7879	0.4541	0.4542	0.3510
ADOPT	909	0.7480	0.3396	0.4701	0.3430
aiUpred	909	0.7358	0.3198	0.4222	0.2903
IUPred3	909	0.7279	0.3162	0.4065	0.2663
MetaPredict	909	0.7234	0.2667	0.4275	0.2982
Inv-pLDDT	909	0.7220	0.2692	0.3726	0.2200
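
The two thresholded columns can be sketched as follows, assuming a residue is called disordered when its score is at or above 0.5. Whether the benchmark uses `>=` or `>` at the threshold is not stated, so that choice, like the function names and toy data, is an assumption.

```python
import math

def confusion(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) for predictions score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy values (hypothetical, not taken from the held-out set)
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
tp, fp, tn, fn = confusion(labels, scores)
f1 = f1_score(tp, fp, fn)
m = mcc(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}  F1={f1:.4f}  MCC={m:.4f}")
```

Unlike F1, MCC also rewards true negatives, which is one reason the two columns can rank models differently, as they do for Robust-Structure and Lite1500 in the table.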