EWCL v1

Model Repository

Frozen publication models for protein disorder and collapse propensity prediction. Download pre-built archives, run local inference, and integrate with your own pipelines.

3 models · LightGBM · Code: MIT · Weights: Non-commercial · Python ≥ 3.9
License note: Source code is MIT-licensed. Model weights and zip archives (dist/*.zip) are licensed for research and non-commercial use only; see WEIGHTS_LICENSE.md. Commercial use requires a separate enquiry.

Available Models

EWCL-Sequence · Sequence-only · v1.0.0 · 249 features · LightGBM

Sequence-only disorder and collapse propensity prediction. No structure required — uses frozen positional and physicochemical features derived from the amino acid sequence.

No PDB available · Large-scale proteome scans · Fast screening
Download .zip: EWCL-Sequence_v1.0.0.zip

EWCL-Disorder · Positional-context · v1.0.0 · 239 features · LightGBM

Positional-context disorder predictor. Integrates local neighborhood features alongside sequence composition for improved disorder boundary detection.

Disorder boundary mapping · IDR annotation · Segment-level analysis
Download .zip: EWCL-Disorder_v1.0.0.zip

EWCL-Structure · Structure-aware · v1.0.0 · 230 features · LightGBM

Structure-aware predictor incorporating per-residue pLDDT from AlphaFold. Enables EWCL–pLDDT disagreement analysis via the EDI diagnostic.

AlphaFold integration · EDI / conflict analysis · Structure-conditioned scoring
Download .zip: EWCL-Structure_v1.0.0.zip

Quickstart

1 — Install the package

bash
# Base install (sequence + disorder models)
pip install -e .

# With structure parsing support
pip install -e ".[structure]"

2 — Download and extract a model archive

bash
# Download from GitHub (or use the buttons above)
wget https://github.com/EWCL-x/ewcl-models.v1/raw/main/dist/EWCL-Sequence_v1.0.0.zip

# Extract to a local directory — always extract before use
unzip EWCL-Sequence_v1.0.0.zip -d ~/ewcl_models/EWCL-Sequence_v1.0.0

load_model() requires the extracted directory path. Passing the .zip path directly will raise an error.
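
Where `unzip` is not available, the extraction step can also be done from Python. The following is a minimal sketch using only the standard library; `extract_bundle` is a hypothetical helper name and is not part of `ewcl_models`. It calls `expanduser()` explicitly because this page does not state whether `load_model()` expands `~` on its own.

```python
import zipfile
from pathlib import Path


def extract_bundle(zip_path, dest_dir):
    """Extract a model archive and return the extracted directory path.

    Hypothetical helper for illustration; not part of ewcl_models.
    """
    zip_path = Path(zip_path).expanduser()
    dest_dir = Path(dest_dir).expanduser()
    # is_zipfile() returns False for missing files as well as non-zip files.
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a readable zip archive")
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return dest_dir
```

The returned directory path, converted with `str()`, is what would then be handed to `load_model()`.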

3 — Run inference (Python API)

python
from ewcl_models.loaders import load_model
from ewcl_models.feature_extractors import build_sequence_features
from ewcl_models.predictors import predict_from_features

# Load from the extracted directory
model = load_model("~/ewcl_models/EWCL-Sequence_v1.0.0")

# Build features from a sequence string
fb = build_sequence_features("MKFLILLFNILCLFPVLAADNHGVS...")

# Predict
result = predict_from_features(fb.all_df, model)
print(result[["residue_index", "aa", "p_raw", "p"]])
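
The Python API above takes a plain sequence string, whereas the CLI reads a FASTA file. To bridge the two, a small FASTA reader can supply the strings. This is a sketch under the assumption of a standard FASTA layout (header lines starting with `>`); `read_fasta` is a hypothetical helper, not a function of `ewcl_models`.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Hypothetical helper for illustration; not part of ewcl_models.
    Sequence lines are concatenated and upper-cased; blank lines are skipped.
    Any lines before the first header are discarded.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line.upper())
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)
```

Each yielded sequence can then be passed to `build_sequence_features()` in a loop, reusing one loaded model for all records.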

4 — Command-line interface

bash
# Sequence-only prediction
ewcl-predict \
  --model ~/ewcl_models/EWCL-Sequence_v1.0.0 \
  --fasta examples/example.fasta \
  --out results.csv

# Structure-aware prediction (requires AlphaFold PDB)
ewcl-predict \
  --model ~/ewcl_models/EWCL-Structure_v1.0.0 \
  --fasta examples/example.fasta \
  --pdb  examples/example.pdb \
  --out  results.csv

# Parquet output
ewcl-predict --model ~/ewcl_models/EWCL-Disorder_v1.0.0 \
  --fasta examples/example.fasta --out results.parquet --format parquet

Structure-Aware Prediction & EDI Diagnostic

EWCL-Structure accepts per-residue pLDDT values from an AlphaFold PDB file and computes the EWCL–pLDDT Disagreement Index (EDI) — a per-residue diagnostic that quantifies local discordance between predicted disorder and AlphaFold confidence. EDI segments flag structurally ambiguous regions not captured by pLDDT alone.

python
from ewcl_models.loaders import load_model
from ewcl_models.feature_extractors import build_sequence_features, compute_structure_features
from ewcl_models.predictors import predict_from_features
from ewcl_models.diagnostics import compute_edi, compute_cds, edi_segments
import pandas as pd

model = load_model("~/ewcl_models/EWCL-Structure_v1.0.0")

seq     = "MKFLILLFNILCLFPVLAADNHGVS..."
plddt   = [95.2, 93.1, 88.4, ...]   # per-residue pLDDT from AlphaFold PDB

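# Sequence-derived features, then structure features from the pLDDT values
fb          = build_sequence_features(seq)
struct_df   = compute_structure_features(seq, plddt_vals=plddt)

# Join the two feature tables column-wise (row i corresponds to residue i)
combined    = pd.concat([fb.base_df.reset_index(drop=True),
                          struct_df.reset_index(drop=True)], axis=1)

# Predict per-residue scores with the structure-aware model
result   = predict_from_features(combined, model)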

# EDI / conflict diagnostics
edi      = compute_edi(result["p"].values, plddt)
cds      = compute_cds(result["p"].values, plddt)
segments = edi_segments(edi, threshold=0.2, min_length=5)
print(segments)
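
The `plddt` list in the example above has to come from somewhere. AlphaFold model files store per-residue pLDDT in the B-factor column of each ATOM record, so the values can be read with a few lines of standard-library Python. The sketch below assumes a single-chain PDB file with chain ID `A` (as in AlphaFold DB downloads) and reads one value per C-alpha atom; `plddt_from_pdb` is a hypothetical helper, not the parser shipped with `ewcl_models`.

```python
def plddt_from_pdb(path, chain_id="A"):
    """Return one pLDDT value per residue, read from CA atoms of a PDB file.

    Hypothetical helper for illustration; not part of ewcl_models.
    Uses the fixed-column PDB layout: atom name in columns 13-16,
    chain ID in column 22, B-factor (pLDDT for AlphaFold) in columns 61-66.
    """
    values = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("ENDMDL"):
                break  # read only the first model
            if (line.startswith("ATOM")
                    and line[12:16].strip() == "CA"
                    and line[21:22] == chain_id):
                values.append(float(line[60:66]))
    return values
```

Before using the result, it is worth checking that `len(plddt)` equals `len(seq)`, since the sequence and the structure file must describe the same residues.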

Parsers & Feature Extractors

All feature extractors are frozen at training time and distributed as part of the repository. They are implemented in pure Python with no additional model dependencies.

Model Bundle Contents

Each .zip archive is self-contained. After extraction, the directory structure is:

text
EWCL-{Model}_v1.0.0/
├── model/
│   └── model.txt               # LightGBM booster (frozen weights)
├── contract/
│   ├── feature_list.json       # Exact ordered feature list
│   ├── inference_contract.json # Input/output schema
│   └── schema_rules.json       # Validation rules
├── calibration/
│   └── calibration.json        # Platt / isotonic calibration
├── provenance/
│   ├── versions.json           # Package & dependency versions at train time
│   ├── data_manifest.json      # Training data checksums
│   └── training_meta.json      # Hyperparameters and split info
└── docs/
    └── README_MODEL.md         # Model-specific documentation
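
A quick sanity check after extraction is to confirm that every file in the listing above is present. The sketch below does this with the standard library only; the file paths are copied from the listing, while `missing_bundle_files` is a hypothetical helper and not part of `ewcl_models`.

```python
from pathlib import Path

# Relative paths taken from the bundle directory listing.
EXPECTED_FILES = [
    "model/model.txt",
    "contract/feature_list.json",
    "contract/inference_contract.json",
    "contract/schema_rules.json",
    "calibration/calibration.json",
    "provenance/versions.json",
    "provenance/data_manifest.json",
    "provenance/training_meta.json",
    "docs/README_MODEL.md",
]


def missing_bundle_files(bundle_dir):
    """Return the expected files that are absent from an extracted bundle.

    Hypothetical helper for illustration; not part of ewcl_models.
    """
    root = Path(bundle_dir).expanduser()
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means the directory matches the documented layout; a list containing every entry usually means the path points at the wrong directory or at the unextracted archive.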

Requirements

Core

  • Python ≥ 3.9
  • numpy
  • pandas
  • lightgbm
  • scikit-learn
  • joblib

Optional — [structure]

  • gemmi (preferred) or biopython
  • pyarrow (Parquet output)

Install with pip install -e ".[structure]"
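
To see which optional dependencies are installed in the current environment, the import system can be queried without importing the packages. This is a sketch only: how `ewcl_models` itself selects a structure parser is not described here, and `pick_structure_parser` and `parquet_available` are hypothetical helper names. The preference order mirrors the "gemmi (preferred) or biopython" note above.

```python
from importlib.util import find_spec


def pick_structure_parser():
    """Return "gemmi", "biopython", or None, preferring gemmi.

    Hypothetical helper for illustration; not part of ewcl_models.
    """
    if find_spec("gemmi") is not None:
        return "gemmi"
    if find_spec("Bio") is not None:  # biopython is imported as "Bio"
        return "biopython"
    return None


def parquet_available():
    """Return True if pyarrow (needed for Parquet output) is installed."""
    return find_spec("pyarrow") is not None
```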

Bulk Predictions

Pre-computed predictions — coming soon

Pre-computed per-residue EWCL scores for the full Swiss-Prot reviewed proteome will be available here as downloadable Parquet archives, enabling direct integration without running local inference. Sign up for updates below.

In preparation · Contact us

Citation

If you use EWCL models in your research, please cite:

bibtex
@unpublished{CristinoUversky_EWCL_2026,
  author = {Cristino, Lucas and Uversky, Vladimir N.},
  title  = {Entropy-Weighted Collapse Likelihood (EWCL): sequence- and
            structure-conditioned predictors of intrinsic disorder
            and collapse propensity},
  note   = {Manuscript in preparation (preprint/journal submission forthcoming)},
  year   = {2026}
}

Machine-readable citation metadata is available in CITATION.cff in the repository.