# Chimeric Features
Computes spectrum match quality features for the runner-up (second-best) peptide prediction, helping to detect chimeric spectra where multiple peptides co-elute.
## Purpose
Chimeric spectra occur when two or more peptides are fragmented together in the same MS/MS scan. In these cases:
- The top-1 prediction may explain only part of the observed peaks
- The runner-up prediction may also match unexplained peaks
- Both peptides contribute to the total spectrum intensity
By computing spectrum match features for the second-best prediction, the calibrator can identify cases where:
- The top prediction has high confidence but the runner-up also matches well (possible chimera)
- The top prediction poorly explains the spectrum but the runner-up provides a better match
- Close competition between the top two candidates signals uncertainty in the identification
## Implementation
This feature mirrors all computations from `FragmentMatchFeatures` but applies them to the runner-up (second-best) peptide sequence from beam search:
- Extract the second sequence from the beam predictions (`dataset.predictions[i][1]`)
- Call the Koina intensity model for the runner-up sequence
- Match the theoretical peaks to the observed spectrum
- Compute all spectrum match quality features
All column names are prefixed with `chimeric_` to distinguish them from the top-1 features.
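The runner-up selection and the column prefixing described above can be sketched as follows. This is a minimal illustration, not winnow's code: `runner_up_sequence` and `prefix_columns` are hypothetical helpers, and each beam is assumed to be a plain list of sequence strings ordered best-first.

```python
from typing import Dict, List, Optional, Sequence


def runner_up_sequence(beam: Sequence[str]) -> Optional[str]:
    """Return the runner-up (second-best) sequence of a beam, or None if absent.

    Hypothetical helper: beam[0] is assumed to be the top-1 prediction and
    beam[1] the runner-up, mirroring dataset.predictions[i][1].
    """
    return beam[1] if len(beam) >= 2 else None


def prefix_columns(features: Dict[str, float], prefix: str = "chimeric_") -> Dict[str, float]:
    """Prefix feature names so runner-up columns do not clash with top-1 columns."""
    return {f"{prefix}{name}": value for name, value in features.items()}


# Two spectra: the second has only one beam result, so it has no runner-up.
beams: List[List[str]] = [["PEPTIDEK", "PEPTLDEK"], ["SAMPLER"]]
print([runner_up_sequence(b) for b in beams])  # ['PEPTLDEK', None]
print(prefix_columns({"ion_matches": 0.4, "xcorr": 1.2}))
# {'chimeric_ion_matches': 0.4, 'chimeric_xcorr': 1.2}
```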
## Columns
All columns from `FragmentMatchFeatures`, with the `chimeric_` prefix:
### Basic Match Metrics
| Column | Unit | Description |
|---|---|---|
| `chimeric_ion_matches` | Fraction (0-1) | Fraction of runner-up theoretical ions matched |
| `chimeric_ion_match_intensity` | Fraction (0-1) | Fraction of observed intensity explained by the runner-up |
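As a rough illustration, the two basic metrics can be sketched as below. The sketch assumes that a theoretical ion counts as matched when any observed peak lies within the tolerance window (ppm measured relative to the theoretical m/z), and that each matched observed peak contributes its intensity once. `within_tolerance` and `basic_match_metrics` are hypothetical names; winnow's exact matching rules (for example, handling of isotopic peaks or multiple hits) may differ.

```python
from typing import List, Set, Tuple


def within_tolerance(observed: float, theoretical: float, tol: float, unit: str) -> bool:
    """Check whether an observed m/z lies within tolerance of a theoretical m/z."""
    unit = unit.lower()  # "ppm" / "da", case-insensitive like mz_tolerance_unit
    if unit == "ppm":
        return abs(observed - theoretical) <= theoretical * tol * 1e-6
    if unit == "da":
        return abs(observed - theoretical) <= tol
    raise ValueError(f"Unsupported tolerance unit: {unit!r}")


def basic_match_metrics(
    theoretical_mz: List[float],
    observed_mz: List[float],
    observed_intensity: List[float],
    tol: float = 20.0,
    unit: str = "ppm",
) -> Tuple[float, float]:
    """Return (ion_matches, ion_match_intensity) for one spectrum."""
    matched_theoretical = 0
    matched_peaks: Set[int] = set()  # indices of observed peaks explained by any ion
    for theo in theoretical_mz:
        hits = [j for j, obs in enumerate(observed_mz) if within_tolerance(obs, theo, tol, unit)]
        if hits:
            matched_theoretical += 1
            matched_peaks.update(hits)
    total_intensity = sum(observed_intensity)
    ion_matches = matched_theoretical / len(theoretical_mz) if theoretical_mz else 0.0
    ion_match_intensity = (
        sum(observed_intensity[j] for j in matched_peaks) / total_intensity
        if total_intensity > 0
        else 0.0
    )
    return ion_matches, ion_match_intensity


# Two of four theoretical ions match, explaining 70 of 100 intensity units.
print(basic_match_metrics([100.0, 200.0, 300.0, 400.0], [100.001, 250.0, 300.002], [10.0, 30.0, 60.0]))
# (0.5, 0.7)
```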
### Ion Coverage Features
| Column | Unit | Description |
|---|---|---|
| `chimeric_longest_b_series` | Count (integer) | Longest consecutive b-ion run for the runner-up |
| `chimeric_longest_y_series` | Count (integer) | Longest consecutive y-ion run for the runner-up |
| `chimeric_complementary_ion_count` | Count (integer) | Bond positions with both b and y ions matched for the runner-up |
| `chimeric_max_ion_gap` | Daltons (Da) | Largest gap between matched runner-up ions |
| `chimeric_b_y_intensity_ratio` | Ratio | Ratio of b-ion to y-ion intensity for the runner-up (including isotopic envelopes) |
| `chimeric_spectral_angle` | Score (0-1) | Normalised spectral angle similarity between the runner-up's theoretical and observed intensities |
| `chimeric_xcorr` | Score | SEQUEST fast cross-correlation score for the runner-up peptide. Measures overall agreement between the observed spectrum and the runner-up theoretical spectrum, with local background correction. See the Cross-correlation Score section of `FragmentMatchFeatures` for details of the algorithm. |
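The coverage columns and the spectral angle can be illustrated with the sketch below. It assumes matched ions are given as sets of ion numbers (b1, b2, ... as 1, 2, ...), that the bond after residue `i` of a length-`L` peptide yields the complementary pair b_i and y_(L-i), that the maximum gap is taken over sorted matched m/z values, and that the spectral angle uses the form commonly applied to Prosit predictions, `1 - 2 * arccos(cosine) / pi` on L2-normalised vectors. All function names are hypothetical, and winnow may apply extra preprocessing (such as masking or intensity transforms) not shown here. XCorr is omitted because its algorithm is documented on the `FragmentMatchFeatures` page.

```python
import math
from typing import Iterable, List, Set


def longest_consecutive_run(ion_numbers: Iterable[int]) -> int:
    """Length of the longest run of consecutive ion numbers, e.g. b2, b3, b4 -> 3."""
    numbers = set(ion_numbers)
    best = 0
    for n in numbers:
        if n - 1 not in numbers:  # n starts a run
            length = 1
            while n + length in numbers:
                length += 1
            best = max(best, length)
    return best


def complementary_ion_count(b_ions: Set[int], y_ions: Set[int], peptide_length: int) -> int:
    """Count bonds i (1..L-1) where both b_i and its complement y_(L-i) are matched."""
    return sum(1 for i in range(1, peptide_length) if i in b_ions and (peptide_length - i) in y_ions)


def max_ion_gap(matched_mz: List[float]) -> float:
    """Largest m/z difference between adjacent matched ions (0.0 if fewer than two)."""
    mz = sorted(matched_mz)
    return max((hi - lo for lo, hi in zip(mz, mz[1:])), default=0.0)


def normalised_spectral_angle(predicted: List[float], observed: List[float]) -> float:
    """1 - 2 * arccos(cosine) / pi, computed on L2-normalised intensity vectors."""
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0 or norm_o == 0:
        return 0.0
    cosine = sum(p * o for p, o in zip(predicted, observed)) / (norm_p * norm_o)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Runner-up of length 7 with matched b2, b3, b4, b6 and y1, y2, y5.
b, y = {2, 3, 4, 6}, {1, 2, 5}
print(longest_consecutive_run(b), longest_consecutive_run(y))  # 3 2
print(complementary_ion_count(b, y, 7))  # 2 (b2/y5 and b6/y1)
```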
Example usage:

```python
from winnow.calibration.features import ChimericFeatures

feature = ChimericFeatures(
    mz_tolerance=20,
    mz_tolerance_unit="ppm",
    # Residue tokens the Koina intensity model cannot handle
    unsupported_residues=["N[UNIMOD:7]", "Q[UNIMOD:7]"],
    max_precursor_charge=6,
    max_peptide_length=30,
    model_input_constants={"collision_energies": 25, "fragmentation_types": "HCD"},
    learn_from_missing=True,
)

# `calibrator` is an existing calibrator instance created elsewhere
calibrator.add_feature(feature)
```
## Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mz_tolerance` | `float` | (required) | Tolerance magnitude for matching fragment ions. |
| `mz_tolerance_unit` | `str` | (required) | Unit for `mz_tolerance`: `"ppm"` or `"da"` (case-insensitive). |
| `unsupported_residues` | `List[str]` | `[]` | Residue tokens not supported by the Koina model. |
| `intensity_model_name` | `str` | `"Prosit_2020_intensity_HCD"` | Name of the Koina intensity model. |
| `max_precursor_charge` | `int` | `6` | Maximum precursor charge state supported by the model. |
| `max_peptide_length` | `int` | `30` | Maximum peptide length, checked against the runner-up sequence. |
| `model_input_constants` | `Dict` | `{}` | Constant values for model inputs. |
| `model_input_columns` | `Dict` | `{}` | Column names for per-row model inputs. |
| `learn_from_missing` | `bool` | `True` | If `True`, invalid rows are imputed (zero feature values plus a missingness indicator); if `False`, they are filtered out. |
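The validation parameters apply to the runner-up rather than the top-1 prediction. A minimal sketch of that check, assuming each beam entry is already tokenised into residue strings such as `"N[UNIMOD:7]"` (the helper `is_valid_runner_up` is hypothetical, not part of winnow):

```python
from typing import Sequence


def is_valid_runner_up(
    beam: Sequence[Sequence[str]],
    precursor_charge: int,
    unsupported_residues: Sequence[str] = (),
    max_precursor_charge: int = 6,
    max_peptide_length: int = 30,
) -> bool:
    """Return True if the second-best sequence can be sent to the intensity model."""
    if len(beam) < 2:
        return False  # no runner-up: the spectrum is treated as invalid
    tokens = beam[1]  # runner-up, not the top-1 prediction
    if len(tokens) > max_peptide_length:
        return False
    if precursor_charge > max_precursor_charge:
        return False
    # Reject the runner-up if any residue token is unsupported by the model.
    return not any(token in unsupported_residues for token in tokens)


beam = [["P", "E", "P"], ["N[UNIMOD:7]", "K"]]
print(is_valid_runner_up(beam, 2, unsupported_residues=["N[UNIMOD:7]"]))  # False
```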
## Requirements
The dataset must have:
- Beam predictions with at least two sequences (`dataset.predictions[i]` must have length ≥ 2)
- `precursor_charge`: precursor charge state
- `mz_array`: observed m/z values
- `intensity_array`: observed intensities
For some Koina-hosted intensity prediction models, the dataset may also require:
- `collision_energies`: collision energy used to fragment the peptide
- `fragmentation_types`: fragmentation method used to break the precursor ions (for example, `"HCD"`)
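Extra model inputs such as these can be supplied either as constants or per row. The sketch below shows one way the two parameters could be combined, under the assumptions that `model_input_constants` maps a model input name to a fixed value (as in the usage example) and that `model_input_columns` maps a model input name to the dataset column holding its per-row value. `build_model_inputs` is a hypothetical helper, and in this sketch a per-row column overrides a constant with the same name.

```python
from typing import Any, Dict, List, Mapping


def build_model_inputs(
    rows: List[Mapping[str, Any]],
    model_input_constants: Mapping[str, Any],
    model_input_columns: Mapping[str, str],
) -> List[Dict[str, Any]]:
    """Assemble the extra intensity-model inputs for each spectrum."""
    inputs: List[Dict[str, Any]] = []
    for row in rows:
        # Constants are broadcast unchanged to every spectrum.
        per_row: Dict[str, Any] = dict(model_input_constants)
        # Per-row inputs are read from the named dataset column.
        for input_name, column_name in model_input_columns.items():
            if column_name not in row:
                raise KeyError(f"Missing column {column_name!r} for model input {input_name!r}")
            per_row[input_name] = row[column_name]
        inputs.append(per_row)
    return inputs


rows = [{"nce": 25}, {"nce": 30}]
print(build_model_inputs(rows, {"fragmentation_types": "HCD"}, {"collision_energies": "nce"}))
# [{'fragmentation_types': 'HCD', 'collision_energies': 25}, {'fragmentation_types': 'HCD', 'collision_energies': 30}]
```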
## Notes
- Requires beam predictions; raises `ValueError` if `dataset.predictions` is `None`
- Spectra with only one beam result (no runner-up) are treated as invalid
- When `learn_from_missing=True`, invalid rows get zero feature values and an `is_missing_chimeric_features` indicator column
- The runner-up validation constraints (length, charge, residues) are applied to the second-best sequence, not the top-1
- Consider using `FragmentMatchFeatures` and `ChimericFeatures` together to give the calibrator information about both the top-1 and runner-up matches
- `chimeric_b_y_intensity_ratio` is computed as `b_total / (y_total + epsilon)`, where `epsilon` is a small constant that provides numerical stability when no y-ions are matched
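The last two behaviours in these notes, the epsilon-stabilised ratio and the handling of invalid rows, can be sketched as follows. The value of `EPSILON`, the 0/1 float encoding of the indicator, and the helper names are assumptions for illustration only; invalid rows are represented here as `None`.

```python
from typing import Dict, List, Optional, Sequence

# Assumed value for illustration; the constant winnow actually uses is not documented here.
EPSILON = 1e-8
INDICATOR = "is_missing_chimeric_features"


def b_y_intensity_ratio(b_total: float, y_total: float, epsilon: float = EPSILON) -> float:
    """b_total / (y_total + epsilon): finite even when no y-ions are matched."""
    return b_total / (y_total + epsilon)


def handle_invalid_rows(
    rows: List[Optional[Dict[str, float]]],
    feature_columns: Sequence[str],
    learn_from_missing: bool = True,
) -> List[Dict[str, float]]:
    """Impute or drop rows whose runner-up features could not be computed (None)."""
    out: List[Dict[str, float]] = []
    for features in rows:
        if features is None:
            if not learn_from_missing:
                continue  # filter the invalid spectrum out
            filled = {column: 0.0 for column in feature_columns}
            filled[INDICATOR] = 1.0  # flag the row as imputed
        else:
            filled = dict(features)
            if learn_from_missing:
                filled[INDICATOR] = 0.0
        out.append(filled)
    return out


print(handle_invalid_rows([{"chimeric_xcorr": 1.5}, None], ["chimeric_xcorr"]))
# [{'chimeric_xcorr': 1.5, 'is_missing_chimeric_features': 0.0}, {'chimeric_xcorr': 0.0, 'is_missing_chimeric_features': 1.0}]
```

With no matched y-ions the ratio becomes very large (about `5e9` for `b_total = 50.0` under this epsilon) instead of raising a division-by-zero error.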