Token Score Features
Extracts position-level confidence metrics from the top beam prediction's token log-probabilities, identifying "weak link" residues within otherwise confident predictions.
Purpose
The overall sequence confidence score averages over all token positions, which can mask problems:
- A single low-confidence residue in an otherwise confident prediction
- High variance suggesting the model is uncertain at multiple positions
- Systematic patterns in where uncertainty occurs
Token-level scores let the calibrator identify predictions that look confident overall but contain low-confidence positions, which often correspond to errors at specific residues.
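To see how an average can hide a weak link, the sketch below compares two hypothetical six-residue predictions. For illustration it assumes the sequence-level score is the mean per-position probability; the log-probabilities are made up and the helper names are hypothetical.

```python
import math

# Hypothetical per-residue log-probabilities for two six-residue predictions.
uniform = [math.log(0.90)] * 6                        # every residue at p = 0.90
weak_link = [math.log(0.99)] * 5 + [math.log(0.45)]  # one residue at p = 0.45


def mean_prob(log_probs):
    """Mean per-position probability (an illustrative sequence-level score)."""
    probs = [math.exp(lp) for lp in log_probs]
    return sum(probs) / len(probs)


def min_prob(log_probs):
    """Lowest per-position probability, i.e. the weakest link."""
    return min(math.exp(lp) for lp in log_probs)


for name, lps in [("uniform", uniform), ("weak_link", weak_link)]:
    print(f"{name}: mean={mean_prob(lps):.3f} min={min_prob(lps):.3f}")
```

Both predictions score 0.900 on the mean, but the minimum separates them (0.900 vs 0.450).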
Implementation
For the top beam prediction, we:
- Extract `token_log_probabilities` from `dataset.predictions[i][0]`
- Convert them to probabilities via `exp(log_prob)`
- Compute summary statistics across positions
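The three steps can be sketched in plain Python for a single top prediction. This is a minimal sketch, not winnow's implementation: the function name `token_score_features` is hypothetical, returning 0.0 for the standard deviation of an empty sequence is an assumption, and the use of the population standard deviation is an assumption chosen because it yields 0.0 for single-token sequences, as the notes below state.

```python
import math
import statistics


def token_score_features(token_log_probabilities):
    """Return (min_token_probability, std_token_probability) for one top prediction.

    Hypothetical helper sketching the steps above; not winnow's actual code.
    """
    # Convert stored log-probabilities to probabilities.
    probs = [math.exp(lp) for lp in token_log_probabilities]
    if not probs:
        # Empty sequence: min is documented as 0.0; std = 0.0 here is an assumption.
        return 0.0, 0.0
    # Summary statistics. pstdev (population std) gives exactly 0.0
    # for a single token.
    return min(probs), statistics.pstdev(probs)


print(token_score_features([math.log(0.9), math.log(0.5)]))  # min close to 0.5, std close to 0.2
print(token_score_features([math.log(0.8)]))                 # std is 0.0
print(token_score_features([]))                              # (0.0, 0.0)
```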
Columns
| Column | Unit | Description |
|---|---|---|
| `min_token_probability` | Probability (0-1) | Minimum token probability across all positions in the top prediction. Identifies the "weakest link" residue; if this is very low, the prediction may have an error at that position. |
| `std_token_probability` | Probability (0-1) | Standard deviation of token probabilities. High variance may indicate uncertain positions within an otherwise confident prediction. |
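Written out for a top prediction of n residues with stored log-probabilities ℓ_1 … ℓ_n, the two columns correspond to the definitions below. The 1/n (population) normalisation of the standard deviation is an assumption, chosen because it gives 0.0 for a single token as the notes state; the last line is a worked example with probabilities 0.9 and 0.5.

```latex
p_t = \exp(\ell_t), \qquad t = 1, \dots, n

\mathrm{min\_token\_probability} = \min_{1 \le t \le n} p_t

\bar{p} = \frac{1}{n} \sum_{t=1}^{n} p_t, \qquad
\mathrm{std\_token\_probability} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(p_t - \bar{p}\right)^2}

p = (0.9,\ 0.5): \quad \min_t p_t = 0.5, \quad \bar{p} = 0.7, \quad
\sqrt{\tfrac{(0.9 - 0.7)^2 + (0.5 - 0.7)^2}{2}} = 0.2
```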
Usage
```python
from winnow.calibration.features import TokenScoreFeatures

feature = TokenScoreFeatures()
# `calibrator` is an existing calibrator instance created elsewhere.
calibrator.add_feature(feature)
```
Parameters
TokenScoreFeatures has no configuration parameters.
Requirements
The dataset must have beam predictions where:
- `dataset.predictions` is not `None`
- Each prediction has `token_log_probabilities` available (a list of log-probabilities, one per residue position)

This feature raises `ValueError` if any top prediction is missing `token_log_probabilities`.
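The requirement can be illustrated by a sketch that reads the top beam of every prediction and fails fast on a missing field. `BeamPrediction` and `compute_token_score_columns` are hypothetical stand-ins for winnow's own types, and raising `ValueError` when `dataset.predictions` is `None` is an assumption; this section only documents the error for a missing `token_log_probabilities`.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BeamPrediction:
    # Hypothetical stand-in for one beam entry; only the field this feature reads.
    token_log_probabilities: Optional[List[float]]


def compute_token_score_columns(predictions):
    """Build both columns from predictions[i][0], the top beam of each prediction."""
    if predictions is None:
        # Assumed behaviour: the section only says predictions must not be None.
        raise ValueError("dataset.predictions is None; beam predictions are required")
    min_col, std_col = [], []
    for i, beams in enumerate(predictions):
        log_probs = beams[0].token_log_probabilities
        if log_probs is None:
            raise ValueError(f"Top prediction {i} is missing token_log_probabilities")
        probs = [math.exp(lp) for lp in log_probs]
        min_col.append(min(probs) if probs else 0.0)
        std_col.append(statistics.pstdev(probs) if probs else 0.0)
    return {"min_token_probability": min_col, "std_token_probability": std_col}


preds = [
    [BeamPrediction([math.log(0.9), math.log(0.5)])],
    [BeamPrediction([math.log(0.8)])],
]
print(compute_token_score_columns(preds))
```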
Notes
- All computations use probabilities (converted from stored log-probabilities via `exp()`)
- Single-token sequences have `std_token_probability = 0.0` (no variance possible)
- Empty sequences return `min_token_probability = 0.0`
- Low `min_token_probability` combined with high overall confidence is a strong signal of potential error
- This feature complements `BeamFeatures`, which looks at sequence-level confidence; `TokenScoreFeatures` looks at position-level confidence
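The note about low `min_token_probability` alongside high overall confidence describes a pattern, not a rule: `TokenScoreFeatures` only produces the columns, and how they are used is up to the calibrator. Purely to illustrate the pattern, a hand-written check might look like the sketch below; the thresholds, the function name, and the `sequence_confidence` input are all hypothetical.

```python
import math

# Hypothetical thresholds, chosen only for illustration.
HIGH_CONFIDENCE = 0.9
WEAK_LINK = 0.5


def looks_like_hidden_error(sequence_confidence, token_log_probabilities):
    """Return True when a confident-looking prediction contains a weak-link residue."""
    probs = [math.exp(lp) for lp in token_log_probabilities]
    # Empty sequences use min_token_probability = 0.0, as in the notes above.
    min_p = min(probs) if probs else 0.0
    return sequence_confidence >= HIGH_CONFIDENCE and min_p < WEAK_LINK


confident = [math.log(0.97)] * 8
weak = [math.log(0.99)] * 7 + [math.log(0.3)]
print(looks_like_hidden_error(0.95, confident))  # False
print(looks_like_hidden_error(0.95, weak))       # True
```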