Token Score Features

Extracts position-level confidence metrics from the top beam prediction's token log-probabilities, identifying "weak link" residues within otherwise confident predictions.

Purpose

The overall sequence confidence score averages over all token positions, which can mask problems:

A single low-confidence residue in an otherwise confident prediction
High variance suggesting the model is uncertain at multiple positions
Systematic patterns in where uncertainty occurs

By examining token-level scores, the calibrator can identify predictions that look confident overall but contain low-confidence positions. Such positions often correspond to errors at specific residues.
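To make the masking effect concrete, here is a small illustration with made-up numbers. The probabilities are invented, and the plain arithmetic mean is only a stand-in for "averaging over positions"; it is not necessarily how the sequence-level score is computed in practice.

```python
# Hypothetical per-residue probabilities for one ten-residue prediction:
# nine confident positions and one weak one (illustrative values only).
token_probs = [0.99, 0.99, 0.99, 0.99, 0.20, 0.99, 0.99, 0.99, 0.99, 0.99]

# Stand-in for an averaged sequence-level score (assumption: arithmetic mean).
sequence_score = sum(token_probs) / len(token_probs)

# The token-level view exposes the weak position directly.
weakest = min(token_probs)
weakest_position = token_probs.index(weakest)

print(f"averaged score: {sequence_score:.3f}")
print(f"weakest token:  {weakest:.2f} at position {weakest_position}")
```

The averaged score stays above 0.9 even though one residue has a probability of only 0.2, which is the kind of case a minimum-based feature is meant to surface.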

Implementation

For the top beam prediction, we:

Extract token_log_probabilities from dataset.predictions[i][0]
Convert to probabilities via exp(log_prob)
Compute summary statistics across positions
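The steps above can be sketched roughly as follows. This is a minimal standalone sketch, not the library's code: the helper name `token_score_summary` is hypothetical, the use of the population standard deviation is an assumption (the actual feature may use the sample form), and returning a standard deviation of 0.0 for an empty sequence is also an assumption.

```python
import math
import statistics
from typing import Sequence


def token_score_summary(token_log_probs: Sequence[float]) -> dict[str, float]:
    """Summarise per-token confidence for one prediction (hypothetical helper)."""
    # Convert stored log-probabilities to probabilities.
    probs = [math.exp(lp) for lp in token_log_probs]

    # Empty sequence: minimum is reported as 0.0; std of 0.0 is an assumption.
    if not probs:
        return {"min_token_probability": 0.0, "std_token_probability": 0.0}

    # Single token: no spread is possible, so std is 0.0.
    # Assumption: population standard deviation for longer sequences.
    std = statistics.pstdev(probs) if len(probs) > 1 else 0.0
    return {"min_token_probability": min(probs), "std_token_probability": std}


# Example with three positions, one of them noticeably weaker.
summary = token_score_summary([math.log(0.9), math.log(0.5), math.log(0.9)])
print(summary)
```

In this example the minimum is 0.5, reflecting the weak middle position, while the standard deviation captures how unevenly confidence is spread across the three positions.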

Columns

Column	Unit	Description
`min_token_probability`	Probability (0-1)	Minimum token probability across all positions in the top prediction. Identifies the "weakest link" residue; if this is very low, the prediction may have an error at that position.
`std_token_probability`	Probability (0-1)	Standard deviation of token probabilities across all positions in the top prediction. A high value may indicate uncertain positions within an otherwise confident prediction.

Usage

from winnow.calibration.features import TokenScoreFeatures

feature = TokenScoreFeatures()
calibrator.add_feature(feature)

Parameters

TokenScoreFeatures has no configuration parameters.

Requirements

The dataset must have beam predictions where:

dataset.predictions is not None
Each prediction has token_log_probabilities available (a list with one log-probability per residue position)

This feature raises ValueError if any top prediction is missing token_log_probabilities.
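A check along these lines could look like the sketch below. It is illustrative only: the function name `top_beam_log_probs` is hypothetical, `SimpleNamespace` objects stand in for real prediction objects, and raising ValueError when predictions is None is an assumption (the text above only specifies ValueError for missing token_log_probabilities). It also assumes every entry contains at least one beam.

```python
from types import SimpleNamespace


def top_beam_log_probs(predictions):
    """Collect token log-probs from each top beam, validating as we go (sketch)."""
    if predictions is None:
        # Assumption: the error type for missing predictions is not documented.
        raise ValueError("dataset.predictions is None; beam predictions are required")

    collected = []
    for i, beams in enumerate(predictions):
        top = beams[0]  # top beam prediction for entry i
        log_probs = getattr(top, "token_log_probabilities", None)
        if log_probs is None:
            raise ValueError(f"top prediction {i} is missing token_log_probabilities")
        collected.append(list(log_probs))
    return collected


# Stand-in data: one entry with a valid top beam, one without log-probs.
valid = [[SimpleNamespace(token_log_probabilities=[-0.1, -0.2])]]
invalid = [[SimpleNamespace(token_log_probabilities=None)]]

print(top_beam_log_probs(valid))
try:
    top_beam_log_probs(invalid)
except ValueError as err:
    print(f"rejected: {err}")
```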

Notes

All computations use probabilities (converted from stored log-probabilities via exp())
Single-token sequences have std_token_probability = 0.0 (no variance possible)
Empty sequences return min_token_probability = 0.0
Low min_token_probability combined with high overall confidence is a strong signal of potential error
This feature complements BeamFeatures: BeamFeatures looks at sequence-level confidence, while TokenScoreFeatures looks at position-level confidence