FDR Control API¶
The winnow.fdr module implements false discovery rate (FDR) estimation and control for de novo peptide sequencing, with both database-grounded and non-parametric approaches.
Base Interface¶
FDRControl¶
Abstract base class that defines the interface for all FDR control methods in winnow.
from winnow.fdr.base import FDRControl
# All FDR methods inherit from this base class
# and implement the required abstract methods
Core Methods:
- fit(dataset): Train the FDR model on a dataset
- get_confidence_cutoff(threshold): Get confidence cutoff for target FDR
- compute_fdr(score): Compute FDR estimate for a confidence score
- filter_entries(dataset, threshold): Filter PSMs at target FDR threshold
- add_psm_fdr(dataset, confidence_col): Add PSM-specific FDR values
Implementations¶
DatabaseGroundedFDRControl¶
Implements database-grounded FDR control using database search results as ground truth for FDR estimation.
from winnow.fdr import DatabaseGroundedFDRControl
from winnow.constants import RESIDUE_MASSES
# Create FDR controller
fdr_control = DatabaseGroundedFDRControl(confidence_feature="confidence")
# Fit using labelled dataset
fdr_control.fit(
    dataset=labelled_dataframe,
    residue_masses=RESIDUE_MASSES,
    isotope_error_range=(0, 1),
    drop=10  # Drop top N predictions for stability
)
# Get confidence cutoff for 1% FDR
confidence_cutoff = fdr_control.get_confidence_cutoff(threshold=0.01)
# Filter dataset at target FDR
filtered_data = dataset[dataset["confidence"] >= confidence_cutoff]
# Add PSM-specific FDR values
dataset_with_fdr = fdr_control.add_psm_fdr(dataset, "confidence")
# Add PSM-specific q-values
dataset_with_q_values = fdr_control.add_psm_q_value(dataset, "confidence")
Key Features:
- Ground Truth Validation: Uses database search results for validation
- Precision-Recall Analysis: Computes precision-recall curves from predictions
- Isotope Error Handling: Supports configurable isotope error ranges
- Stability Control: Drop parameter for robust threshold estimation
Required Data:
- Ground truth peptide sequences (sequence column)
- Predicted peptide sequences (prediction column)
- Confidence scores (configurable column name)
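The idea behind database-grounded estimation can be illustrated without winnow itself. The sketch below is not the library's implementation: it assumes each prediction has already been labelled correct or incorrect against the database result, and the names fdr_curve and confidence_cutoff are hypothetical helpers introduced for illustration. It ranks predictions by confidence, computes the fraction of incorrect predictions at or above each score, and picks the lowest score whose FDR stays within the target.

```python
from typing import List, Optional, Sequence, Tuple


def fdr_curve(
    confidences: Sequence[float], correct: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (confidence, FDR) pairs ordered by descending confidence.

    The FDR paired with a confidence is the fraction of incorrect
    predictions among all predictions scoring at least that confidence.
    """
    ranked = sorted(zip(confidences, correct), key=lambda pair: -pair[0])
    curve = []
    n_incorrect = 0
    for rank, (confidence, is_correct) in enumerate(ranked, start=1):
        n_incorrect += not is_correct
        # Emit one point per distinct score, after counting all of its ties.
        is_last_of_tie = rank == len(ranked) or ranked[rank][0] != confidence
        if is_last_of_tie:
            curve.append((confidence, n_incorrect / rank))
    return curve


def confidence_cutoff(
    curve: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Lowest confidence whose FDR is at or below the threshold, if any."""
    passing = [confidence for confidence, fdr in curve if fdr <= threshold]
    return min(passing) if passing else None


# Toy labels (hypothetical data): True means the prediction matched the database.
confidences = [0.95, 0.90, 0.80, 0.70, 0.60]
correct = [True, True, False, True, False]

curve = fdr_curve(confidences, correct)
print(confidence_cutoff(curve, threshold=0.25))  # 0.7
```

Because the empirical FDR is not monotonic in the score, the sketch takes the lowest passing score rather than stopping at the first violation. How winnow resolves this, and how the drop parameter enters, is not covered here.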
NonParametricFDRControl¶
Uses a label-free, non-parametric method for FDR estimation, specifically designed for scenarios where database ground truth is unavailable.
from winnow.fdr import NonParametricFDRControl
# Create non-parametric FDR controller
fdr_control = NonParametricFDRControl()
# Fit estimation method to confidence scores
fdr_control.fit(dataset=dataset["confidence"])
# Get confidence cutoff for 5% FDR
confidence_cutoff = fdr_control.get_confidence_cutoff(threshold=0.05)
# Compute FDR for specific score
fdr_estimate = fdr_control.compute_fdr(score=0.8)
# Compute posterior error probability (local FDR)
pep = fdr_control.compute_posterior_probability(score=0.8)
# Add PSM-specific FDR values
dataset_with_fdr = fdr_control.add_psm_fdr(dataset, "confidence")
# Add PSM-specific q-values
dataset_with_q_values = fdr_control.add_psm_q_value(dataset, "confidence")
Key Features:
- Non-parametric estimation: Estimates FDR directly by assuming PSM confidences are calibrated
- Multiple Metrics: Computes FDR, q-values, posterior error probability
  - FDR: compute_fdr(score) - False discovery rate at cutoff
  - PEP: compute_posterior_probability(score) - Posterior error probability
  - Q-value: compute_q_value(score) - Minimum FDR for significance
- No Ground Truth Required: Works with confidence scores alone
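One way to read the calibration assumption: if a confidence of 0.9 means the prediction is correct 90% of the time, then 1 - confidence is the posterior error probability of that PSM, and the FDR at a cutoff is the average of those error probabilities over the accepted PSMs. The following is a minimal sketch of that reasoning, not winnow's code. The function names are hypothetical and the scores are made up.

```python
from typing import Sequence


def posterior_error_probability(confidence: float) -> float:
    """Error probability of a single PSM, assuming calibrated confidence."""
    return 1.0 - confidence


def estimate_fdr(confidences: Sequence[float], cutoff: float) -> float:
    """Expected fraction of errors among PSMs scoring at least `cutoff`."""
    accepted = [c for c in confidences if c >= cutoff]
    if not accepted:
        # Nothing is accepted, so there are no discoveries to be false.
        return 0.0
    total_error = sum(posterior_error_probability(c) for c in accepted)
    return total_error / len(accepted)


# Hypothetical calibrated confidence scores.
confidences = [0.99, 0.95, 0.90, 0.60]

print(round(estimate_fdr(confidences, cutoff=0.90), 4))  # 0.0533
print(round(estimate_fdr(confidences, cutoff=0.50), 4))  # 0.14
```

The estimate is only as good as the calibration: systematically overconfident scores would make the reported FDR too optimistic.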
Additional Features¶
PSM-Specific FDR¶
Both methods support PSM-specific FDR estimation:
# Add FDR values for each PSM
dataset_with_fdr = fdr_control.add_psm_fdr(
    dataset_metadata=dataset,
    confidence_col="confidence"
)
# Access PSM-specific FDR values
psm_fdr_values = dataset_with_fdr["psm_fdr"]
Q-values¶
Both methods support q-value computation, the minimum FDR threshold at which a given PSM is significant.
# Add q-values for each PSM
dataset_with_q_values = fdr_control.add_psm_q_value(
    dataset_metadata=dataset,
    confidence_col="confidence"
)
# Access PSM-specific q-values
psm_q_values = dataset_with_q_values["psm_q_value"]
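The relationship between FDR and q-value can be made concrete with a small standalone example. It assumes FDR estimates are already available for PSMs ranked by descending confidence, and it uses the standard running-minimum construction. The helper name q_values_from_fdr is hypothetical, and this is a sketch of the definition rather than a description of what add_psm_q_value does internally.

```python
from typing import List, Sequence


def q_values_from_fdr(fdr_by_rank: Sequence[float]) -> List[float]:
    """Convert per-rank FDR estimates into q-values.

    fdr_by_rank[i] is the FDR when accepting the top i + 1 PSMs by
    confidence. The q-value of PSM i is the smallest FDR of any accepted
    set that still contains it, i.e. the minimum over ranks i and below.
    """
    q_values = [0.0] * len(fdr_by_rank)
    running_min = float("inf")
    # Walk from the lowest-confidence PSM upwards, tracking the minimum FDR.
    for i in range(len(fdr_by_rank) - 1, -1, -1):
        running_min = min(running_min, fdr_by_rank[i])
        q_values[i] = running_min
    return q_values


# Hypothetical FDR estimates for five PSMs, best score first.
print(q_values_from_fdr([0.0, 0.0, 1 / 3, 0.25, 0.4]))
# [0.0, 0.0, 0.25, 0.25, 0.4]
```

Unlike the raw FDR, the resulting q-values never decrease as confidence drops, so thresholding on them always selects a contiguous block of top-ranked PSMs.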
Confidence Curves¶
Generate FDR vs confidence curves for analysis:
# Get confidence curve
fdr_thresholds, confidence_cutoffs = fdr_control.get_confidence_curve(
    resolution=0.01,  # FDR resolution
    min_confidence=0.01,  # Minimum FDR threshold
    max_confidence=0.50  # Maximum FDR threshold
)
# Plot or analyse the curve
import matplotlib.pyplot as plt
plt.plot(fdr_thresholds, confidence_cutoffs)
plt.xlabel("FDR Threshold")
plt.ylabel("Confidence Cutoff")
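For intuition about what such a curve contains, here is a standalone sketch that builds one from raw scores. It is an illustration under stated assumptions, not the get_confidence_curve implementation: the function confidence_curve and its parameters min_fdr and max_fdr are hypothetical names, and FDR is estimated as the mean of 1 - confidence over accepted PSMs, which presumes calibrated scores.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def confidence_curve(
    confidences: Sequence[float],
    resolution: float,
    min_fdr: float,
    max_fdr: float,
) -> Tuple[List[float], List[Optional[float]]]:
    """Map a grid of FDR thresholds to the lowest passing confidence cutoff."""
    ranked = sorted(confidences, reverse=True)

    # Estimated FDR when accepting every PSM scoring at least a given value.
    # Tied scores overwrite each other, so the last (complete) tie wins.
    fdr_at: Dict[float, float] = {}
    error_sum = 0.0
    for k, confidence in enumerate(ranked, start=1):
        error_sum += 1.0 - confidence
        fdr_at[confidence] = error_sum / k

    n_steps = int(round((max_fdr - min_fdr) / resolution)) + 1
    thresholds = [min_fdr + i * resolution for i in range(n_steps)]

    cutoffs: List[Optional[float]] = []
    for threshold in thresholds:
        passing = [c for c, fdr in fdr_at.items() if fdr <= threshold]
        # None marks thresholds that no cutoff can satisfy.
        cutoffs.append(min(passing) if passing else None)
    return thresholds, cutoffs


# Hypothetical calibrated confidence scores.
thresholds, cutoffs = confidence_curve(
    [0.99, 0.95, 0.90, 0.60], resolution=0.05, min_fdr=0.05, max_fdr=0.15
)
print(cutoffs)  # [0.95, 0.9, 0.6]
```

Looser FDR thresholds admit lower cutoffs, so the curve falls from left to right when plotted as above.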
Dataset Filtering¶
Filter PSM datasets at target FDR levels:
from winnow.datasets.psm_dataset import PSMDataset
# Filter PSMDataset at 1% FDR
filtered_psms = fdr_control.filter_entries(
    dataset=psm_dataset,
    threshold=0.01
)
print(f"Retained {len(filtered_psms)} PSMs at 1% FDR")
FDR Estimation Method Selection¶
Use DatabaseGroundedFDRControl when:
- High-quality database search results available
- Not restricted to de novo sequencing outputs
Use NonParametricFDRControl when:
- No database ground truth available
- Working with de novo sequencing outputs
- Require additional PSM-specific evaluation metrics such as posterior error probabilities
For detailed examples and usage patterns, refer to the examples notebook.