Calibration API
The winnow.calibration module calibrates confidence scores for peptide-spectrum matches (PSMs). It extracts features from each match and trains a neural network classifier that maps those features to calibrated probabilities.
Classes
ProbabilityCalibrator
The main calibration model. It transforms raw confidence scores into calibrated probabilities using a multi-layer perceptron (MLP) classifier trained on peptide and spectral features.
```python
from pathlib import Path

from winnow.calibration import ProbabilityCalibrator
from winnow.calibration.calibration_features import (
    MassErrorDaFeature, FragmentMatchFeatures, BeamFeatures
)
from winnow.datasets.calibration_dataset import CalibrationDataset

# Monoisotopic residue masses in Da; include every residue that occurs in your data
residue_masses = {
    "G": 57.021464,
    "A": 71.037114,
    "P": 97.052764,
    "E": 129.042593,
    "T": 101.047670,
    "I": 113.084064,
    "D": 115.026943,
    "R": 156.101111,
    "O": 237.147727,
    "N": 114.042927,
    "S": 87.032028,
    "M": 131.040485,
    "L": 113.084064,
}

# training_dataset (labelled) and test_dataset (unlabelled) are assumed to be
# CalibrationDataset instances prepared beforehand

# Create and configure calibrator
calibrator = ProbabilityCalibrator(seed=42)

# Add features for calibration
calibrator.add_feature(MassErrorDaFeature(residue_masses=residue_masses))
calibrator.add_feature(FragmentMatchFeatures(mz_tolerance=0.02, mz_tolerance_unit="da"))
calibrator.add_feature(BeamFeatures())

# Train the calibrator on labelled data
calibrator.fit(training_dataset)

# Predict; calibrated scores are written to the dataset's "calibrated_confidence" column
calibrator.predict(test_dataset)

# Save the trained model to a local directory
ProbabilityCalibrator.save(calibrator, Path("calibrator_checkpoint"))

# Load a model from one of several sources:
# 1. The default pretrained model from Hugging Face
loaded_calibrator = ProbabilityCalibrator.load()
# 2. A custom Hugging Face model
loaded_calibrator = ProbabilityCalibrator.load("my-org/my-custom-model")
# 3. A local directory
loaded_calibrator = ProbabilityCalibrator.load("calibrator_checkpoint")
```
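MassErrorDaFeature is built from residue masses. This suggests it compares the theoretical mass of the predicted peptide with the observed precursor mass. The sketch below shows the arithmetic behind such a Da-scale mass error, assuming unmodified peptides and a protonated precursor ion. The helper functions are hypothetical and not part of winnow, and the real feature may define the error differently (for example, its sign convention or its handling of modifications).

```python
# Standard monoisotopic masses in Da
WATER_MASS = 18.010565
PROTON_MASS = 1.007276

# Subset of the residue masses from the example above
RESIDUE_MASSES = {"P": 97.052764, "E": 129.042593, "T": 101.047670,
                  "I": 113.084064, "D": 115.026943}


def theoretical_neutral_mass(peptide: str, residue_masses: dict[str, float]) -> float:
    """Sum of residue masses plus one water for the peptide termini."""
    return sum(residue_masses[aa] for aa in peptide) + WATER_MASS


def observed_neutral_mass(precursor_mz: float, charge: int) -> float:
    """Neutral mass of an [M + zH]z+ precursor ion."""
    return (precursor_mz - PROTON_MASS) * charge


def mass_error_da(peptide: str, precursor_mz: float, charge: int,
                  residue_masses: dict[str, float]) -> float:
    """Observed minus theoretical neutral mass, in Da (hypothetical helper)."""
    return observed_neutral_mass(precursor_mz, charge) - theoretical_neutral_mass(peptide, residue_masses)


# Usage: PEPTIDE (neutral mass about 799.36 Da) observed as a 2+ ion
error = mass_error_da("PEPTIDE", 400.687254, 2, RESIDUE_MASSES)
```

A correct identification gives an error close to zero. A wrong sequence, or a precursor picked at the wrong isotope peak, typically shows up as a larger offset.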
Key Features:
- Neural Network Classifier: Uses MLPClassifier with standardised feature scaling
- Feature Management: Add, remove and track multiple calibration features
- Dependency Handling: Automatic computation of feature dependencies
- Model Persistence: Save and load trained calibrators
- Feature Extraction: Computes features and handles both labelled and unlabelled data
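The Neural Network Classifier entry refers to standardised feature scaling. Each feature is shifted to zero mean and scaled to unit variance before it reaches the MLP, so features on very different scales (a mass error in Da next to a fragment-match count) contribute comparably during training. The sketch below reproduces that transform in plain Python. It assumes the scaling behaves like scikit-learn's StandardScaler, which uses the population standard deviation and leaves constant features unscaled. The function names are illustrative and not part of winnow.

```python
from statistics import fmean, pstdev


def fit_scaler(rows: list[list[float]]) -> list[tuple[float, float]]:
    """Learn a (mean, std) pair per feature column from the training rows."""
    params = []
    for column in zip(*rows):
        mean = fmean(column)
        std = pstdev(column)  # population standard deviation, as in StandardScaler
        params.append((mean, std if std > 0 else 1.0))  # leave constant features unscaled
    return params


def transform(rows: list[list[float]], params: list[tuple[float, float]]) -> list[list[float]]:
    """Apply z = (x - mean) / std using the parameters learned on the training data."""
    return [[(x - mean) / std for x, (mean, std) in zip(row, params)] for row in rows]


# Usage: two features, the second of which is constant
train_rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
params = fit_scaler(train_rows)
scaled = transform(train_rows, params)
```

At prediction time, the parameters learned during training are reused unchanged rather than re-estimated on the new data.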
Main Methods:
- add_feature(feature): Add a calibration feature
- fit(dataset): Train the calibrator on a labelled dataset
- predict(dataset): Generate calibrated confidence scores
- save(calibrator, path): Save a trained model to disk
- load(pretrained_model_name_or_path, cache_dir): Load a trained model from the Hugging Face Hub or a local directory
  - Default: Loads "InstaDeepAI/winnow-general-model" from Hugging Face
  - Hugging Face: Pass a repository ID string (e.g., "my-org/my-model")
  - Local: Pass a str or Path object pointing to a model directory
  - Models from Hugging Face are automatically cached in ~/.cache/huggingface/hub
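The three load() options can be read as a small dispatch on the argument. The sketch below is a hypothetical helper, not winnow's implementation. It only decides where a model would come from and does not download or deserialise anything. It assumes that a Path object, or a string naming an existing local directory, is treated as local, and that any other string is treated as a Hugging Face repository ID.

```python
from pathlib import Path
from typing import Optional, Union

# Default repository listed under load()
DEFAULT_REPO_ID = "InstaDeepAI/winnow-general-model"


def resolve_model_source(name_or_path: Optional[Union[str, Path]] = None) -> tuple[str, str]:
    """Return ("huggingface", repo_id) or ("local", directory) for a load() argument.

    Hypothetical helper: it mirrors the documented options but is not winnow code.
    """
    if name_or_path is None:
        # No argument: fall back to the default pretrained model
        return ("huggingface", DEFAULT_REPO_ID)
    if isinstance(name_or_path, Path) or Path(name_or_path).is_dir():
        # Path objects and existing directories are treated as local checkpoints
        return ("local", str(name_or_path))
    # Any other string is treated as a Hugging Face repository ID
    return ("huggingface", name_or_path)
```

Under this assumption, a string such as "calibrator_checkpoint" resolves to a local directory only if that directory exists. Otherwise it would be looked up as a repository ID.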
Calibration Features
The calibrator uses a feature-based approach where multiple feature extractors compute signals from the peptide-spectrum match data. See the Calibration Features documentation for:
- The CalibrationFeatures base class for creating custom features
- Built-in features: Mass Error Features, Beam Features, Fragment Match Features, Chimeric Features, Retention Time Feature, Sequence Features, Token Score Features
- Feature dependencies and how they work
- Handling missing features (learn vs filter strategies)
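Several features can rely on the same intermediate computation, such as matching observed fragment peaks to theoretical ones. Automatic dependency handling avoids repeating that work. The sketch below illustrates the idea with a cache keyed by dependency name. Dependency, Feature and compute_features are hypothetical stand-ins, not the CalibrationFeatures API, and they operate on a single PSM dictionary rather than a whole CalibrationDataset.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Dependency:
    """A shared intermediate result, e.g. matched fragment ions (hypothetical)."""
    name: str
    compute: Callable[[dict], Any]


@dataclass
class Feature:
    """A feature that reads its inputs from already-computed dependencies (hypothetical)."""
    name: str
    dependencies: list[Dependency]
    compute: Callable[[dict, dict], float]  # (psm, dependency_results) -> value


def compute_features(psm: dict, features: list[Feature]) -> dict[str, float]:
    """Compute each dependency once, then evaluate every feature against the cache."""
    cache: dict[str, Any] = {}
    for feature in features:
        for dep in feature.dependencies:
            if dep.name not in cache:  # shared dependencies are computed only once
                cache[dep.name] = dep.compute(psm)
    return {feature.name: feature.compute(psm, cache) for feature in features}


# Usage: two features share one fragment-matching dependency
call_count = {"fragment_match": 0}


def match_fragments(psm: dict) -> list[float]:
    call_count["fragment_match"] += 1
    # Exact comparison for brevity; real matching would apply an m/z tolerance
    return [mz for mz in psm["observed_mz"] if mz in psm["theoretical_mz"]]


fragment_dep = Dependency("fragment_match", match_fragments)
features = [
    Feature("n_matched", [fragment_dep], lambda psm, deps: float(len(deps["fragment_match"]))),
    Feature("frac_matched", [fragment_dep],
            lambda psm, deps: len(deps["fragment_match"]) / len(psm["theoretical_mz"])),
]
psm = {"observed_mz": [100.0, 200.0, 300.5], "theoretical_mz": [100.0, 200.0, 300.0, 400.0]}
values = compute_features(psm, features)
```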
Workflow
Training workflow
- Create Calibrator: Initialise a ProbabilityCalibrator
- Add Features: Use add_feature() to include the desired calibration features
- Fit Model: Call fit() with a labelled CalibrationDataset
- Save Model: Use save() to persist the trained calibrator
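The fit step trains a classifier on labelled PSMs. The self-contained sketch below substitutes logistic regression, which is a network with no hidden layer, trained by full-batch gradient descent in place of winnow's MLP. It assumes each label is 1 for a correct PSM and 0 for an incorrect one, and that the features are already standardised. All names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500, seed=42):
    """Minimise binary cross-entropy with full-batch gradient descent."""
    rng = random.Random(seed)  # fixed seed for reproducible initial weights
    n = len(X)
    w = [rng.uniform(-0.01, 0.01) for _ in X[0]]
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of cross-entropy with respect to the logit is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Probability that each PSM is correct under the fitted model."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Usage: one standardised feature where higher values indicate correct PSMs
X_train = [[-2.0], [-1.0], [1.0], [2.0]]
y_train = [0, 0, 1, 1]
w, b = fit_logistic(X_train, y_train)
probs = predict_proba([[-1.5], [1.5]], w, b)
```

An MLP inserts hidden layers between the features and the sigmoid output, which lets it model non-linear interactions between features. The training loop has the same overall shape.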
Prediction workflow
- Load Calibrator: Use load() to restore a trained model from a Hugging Face repository or a local directory
- Predict: Call predict() with an unlabelled CalibrationDataset
- Access Results: Calibrated scores are stored in the dataset's "calibrated_confidence" column
For detailed examples and usage patterns, refer to the examples notebook.