Model Repository
Frozen publication models for protein disorder and collapse propensity prediction. Download pre-built archives, run local inference, and integrate with your own pipelines.
Model archives (dist/*.zip) are research / non-commercial only. WEIGHTS_LICENSE.md · Commercial enquiries
Available Models
EWCL-Sequence: Sequence-only disorder and collapse propensity prediction. No structure is required; the model uses frozen positional and physicochemical features derived from the amino acid sequence.
EWCL-Disorder: Positional-context disorder predictor. Integrates local neighborhood features alongside sequence composition for improved disorder boundary detection.
EWCL-Structure: Structure-aware predictor incorporating per-residue pLDDT from AlphaFold. Enables EWCL–pLDDT disagreement analysis via the EDI diagnostic.
Quickstart
1 — Install the package
# Base install (sequence + disorder models)
pip install -e .

# With structure parsing support
pip install -e ".[structure]"
2 — Download and extract a model archive
# Download from GitHub (or use the buttons above)
wget https://github.com/EWCL-x/ewcl-models.v1/raw/main/dist/EWCL-Sequence_v1.0.0.zip

# Extract to a local directory (always extract before use)
unzip EWCL-Sequence_v1.0.0.zip -d ~/ewcl_models/EWCL-Sequence_v1.0.0
load_model() requires the extracted directory path. Passing the .zip path directly will raise an error.
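The extraction step can also be done from Python. The sketch below uses only the standard library to unpack an archive and confirm that a few files from the bundle layout (see Model Bundle Contents) are present before the directory is handed to load_model(). The helper name extract_bundle and the choice of which files to check are assumptions for illustration; they are not part of the ewcl_models package.

```python
import zipfile
from pathlib import Path

# A subset of the files listed under "Model Bundle Contents".
EXPECTED_FILES = (
    "model/model.txt",
    "contract/feature_list.json",
    "calibration/calibration.json",
)


def extract_bundle(zip_path, dest_dir):
    """Extract a model archive and return the directory that holds model/."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(Path(zip_path).expanduser()) as zf:
        zf.extractall(dest)
    # Assumption: the archive unpacks either flat into dest or into a single
    # top-level folder such as EWCL-Sequence_v1.0.0/.
    candidates = [dest] + sorted(p for p in dest.iterdir() if p.is_dir())
    for root in candidates:
        if all((root / rel).is_file() for rel in EXPECTED_FILES):
            return root
    raise FileNotFoundError(f"no complete model bundle found under {dest}")
```

The returned path can then be passed on, for example load_model(str(extract_bundle("EWCL-Sequence_v1.0.0.zip", "~/ewcl_models/EWCL-Sequence_v1.0.0"))). The helper expands ~ itself, so the result is always an absolute directory path.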
3 — Run inference (Python API)
from ewcl_models.loaders import load_model
from ewcl_models.feature_extractors import build_sequence_features
from ewcl_models.predictors import predict_from_features
# Load from the extracted directory
model = load_model("~/ewcl_models/EWCL-Sequence_v1.0.0")
# Build features from a sequence string
fb = build_sequence_features("MKFLILLFNILCLFPVLAADNHGVS...")
# Predict
result = predict_from_features(fb.all_df, model)
print(result[["residue_index", "aa", "p_raw", "p"]])
4 — Command-line interface
# Sequence-only prediction
ewcl-predict \
  --model ~/ewcl_models/EWCL-Sequence_v1.0.0 \
  --fasta examples/example.fasta \
  --out results.csv

# Structure-aware prediction (requires AlphaFold PDB)
ewcl-predict \
  --model ~/ewcl_models/EWCL-Structure_v1.0.0 \
  --fasta examples/example.fasta \
  --pdb examples/example.pdb \
  --out results.csv

# Parquet output
ewcl-predict --model ~/ewcl_models/EWCL-Disorder_v1.0.0 \
  --fasta examples/example.fasta --out results.parquet --format parquet
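For many FASTA files, the same commands can be generated from a script. The sketch below only assembles argument lists from the flags shown above (--model, --fasta, --pdb, --out, --format) and does not call the tool. The helper names build_predict_command and batch_commands, and the one-CSV-per-input naming scheme, are assumptions for illustration.

```python
from pathlib import Path


def build_predict_command(model_dir, fasta, out, pdb=None, fmt=None):
    """Build an ewcl-predict argument list from the flags shown above."""
    # Expand ~ here because no shell does it when the list is run directly.
    cmd = ["ewcl-predict", "--model", str(Path(model_dir).expanduser()),
           "--fasta", str(fasta)]
    if pdb is not None:  # only needed for the structure-aware model
        cmd += ["--pdb", str(pdb)]
    cmd += ["--out", str(out)]
    if fmt is not None:  # for example "parquet"
        cmd += ["--format", fmt]
    return cmd


def batch_commands(model_dir, fasta_dir, out_dir):
    """One command per *.fasta file, each writing <stem>.csv into out_dir.

    out_dir is not created here; create it before running the commands.
    """
    out_dir = Path(out_dir)
    return [
        build_predict_command(model_dir, fasta, out_dir / f"{fasta.stem}.csv")
        for fasta in sorted(Path(fasta_dir).glob("*.fasta"))
    ]
```

Each returned list can be executed with subprocess.run(cmd, check=True) once the package is installed and ewcl-predict is on the PATH.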
Structure-Aware Prediction & EDI Diagnostic
EWCL-Structure accepts per-residue pLDDT values from an AlphaFold PDB file and computes the EWCL–pLDDT Disagreement Index (EDI) — a per-residue diagnostic that quantifies local discordance between predicted disorder and AlphaFold confidence. EDI segments flag structurally ambiguous regions not captured by pLDDT alone.
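AlphaFold model files store per-residue pLDDT in the B-factor column of each ATOM record, so the pLDDT list can be read directly from the PDB text. The sketch below is a minimal standard-library reader based on the fixed-width PDB column layout; it is not the package's own parser (the io module provides FASTA/PDB parsing). The function name plddt_from_pdb_lines and the single-chain default "A" are assumptions for illustration.

```python
def plddt_from_pdb_lines(lines, chain_id="A"):
    """Return one pLDDT value per residue, read from C-alpha ATOM records.

    Fixed-width PDB columns (1-based): atom name 13-16, chain ID 22,
    B-factor 61-66. AlphaFold writes pLDDT into the B-factor field.
    """
    plddt = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skips HETATM, TER, and header records
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue  # keep one atom per residue on the requested chain
        plddt.append(float(line[60:66]))
    return plddt
```

With a file on disk, the call looks like: with open("examples/example.pdb") as fh: plddt = plddt_from_pdb_lines(fh). The resulting list should have one entry per residue of the sequence before it is passed on as plddt_vals.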
from ewcl_models.loaders import load_model
from ewcl_models.feature_extractors import build_sequence_features, compute_structure_features
from ewcl_models.predictors import predict_from_features
from ewcl_models.diagnostics import compute_edi, compute_cds, edi_segments
import pandas as pd
model = load_model("~/ewcl_models/EWCL-Structure_v1.0.0")
seq = "MKFLILLFNILCLFPVLAADNHGVS..."
plddt = [95.2, 93.1, 88.4, ...] # per-residue pLDDT from AlphaFold PDB
fb = build_sequence_features(seq)
struct_df = compute_structure_features(seq, plddt_vals=plddt)
combined = pd.concat([fb.base_df.reset_index(drop=True),
                      struct_df.reset_index(drop=True)], axis=1)
result = predict_from_features(combined, model)
# EDI / conflict diagnostics
edi = compute_edi(result["p"].values, plddt)
cds = compute_cds(result["p"].values, plddt)
segments = edi_segments(edi, threshold=0.2, min_length=5)
print(segments)
Parsers & Feature Extractors
All feature extractors are frozen at training time and distributed as part of the repository. They are implemented in pure Python with no additional model dependencies.
sequence_features: Frozen sequence feature extractor (positional, physicochemical, and window-based descriptors).
structure_features: Frozen structure feature extractor (pLDDT normalization, contact-density proxies, secondary structure context).
loaders: load_model(dir) → LoadedModel. Validates the feature contract and calibration against the extracted model directory.
diagnostics: compute_edi(), compute_cds(), edi_segments(). EWCL–pLDDT disagreement diagnostics.
io: FASTA/PDB parsing, CSV and Parquet output writers.
predictors: predict_from_features(). Inference pipeline with calibration applied.
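To give a sense of what a segment-style diagnostic does with a per-residue score, the sketch below extracts contiguous runs that stay at or above a threshold for a minimum number of residues. It is an illustration under stated assumptions, not the edi_segments() implementation: the function name runs_above, the >= comparison, and the (start, end) return format are all hypothetical. Only the parameter names threshold and min_length mirror the call shown earlier.

```python
def runs_above(values, threshold=0.2, min_length=5):
    """Return (start, end) index pairs, end exclusive, for runs >= threshold.

    Runs shorter than min_length are dropped.
    """
    segments = []
    start = None  # index where the current run began, or None outside a run
    for i, v in enumerate(values):
        if v >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(values) - start >= min_length:
        segments.append((start, len(values)))
    return segments
```

The minimum length suppresses isolated single-residue spikes, so only sustained stretches of disagreement are reported.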
Model Bundle Contents
Each .zip archive is self-contained. After extraction, the directory structure is:
EWCL-{Model}_v1.0.0/
├── model/
│   └── model.txt                 # LightGBM booster (frozen weights)
├── contract/
│   ├── feature_list.json         # Exact ordered feature list
│   ├── inference_contract.json   # Input/output schema
│   └── schema_rules.json         # Validation rules
├── calibration/
│   └── calibration.json          # Platt / isotonic calibration
├── provenance/
│   ├── versions.json             # Package & dependency versions at train time
│   ├── data_manifest.json        # Training data checksums
│   └── training_meta.json        # Hyperparameters and split info
└── docs/
    └── README_MODEL.md           # Model-specific documentation
Requirements
Core
Python ≥ 3.9, numpy, pandas, lightgbm, scikit-learn, joblib
Optional — [structure]
gemmi (preferred) or biopython
pyarrow (Parquet output)
Install with pip install -e ".[structure]"
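Before running structure-aware prediction or requesting Parquet output, it can help to check which optional dependencies are importable. The sketch below uses only importlib from the standard library. The helper names available_pdb_backend and parquet_available are assumptions for illustration, and the package may perform its own detection differently.

```python
from importlib.util import find_spec


def available_pdb_backend():
    """Return "gemmi", "biopython", or None, preferring gemmi as stated above."""
    if find_spec("gemmi") is not None:
        return "gemmi"
    if find_spec("Bio") is not None:  # Biopython installs the top-level package "Bio"
        return "biopython"
    return None


def parquet_available():
    """True when pyarrow is importable, which Parquet output relies on."""
    return find_spec("pyarrow") is not None
```

A return value of None from available_pdb_backend() suggests that the [structure] extra has not been installed in the current environment.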
Bulk Predictions
Pre-computed predictions — coming soon
Pre-computed per-residue EWCL scores for the full Swiss-Prot reviewed proteome will be available here as downloadable Parquet archives, enabling direct integration without running local inference.
Citation
If you use EWCL models in your research, please cite:
@unpublished{CristinoUversky_EWCL_2026,
author = {Cristino, Lucas and Uversky, Vladimir N.},
title = {Entropy-Weighted Collapse Likelihood (EWCL): sequence- and
structure-conditioned predictors of intrinsic disorder
and collapse propensity},
note = {Manuscript in preparation (preprint/journal submission forthcoming)},
year = {2026}
}
Machine-readable metadata is available in CITATION.cff in the repository.