
Knapsack beam search

logger = ColorLog(console, __name__).logger module-attribute

KnapsackBeamSearchDecoder(model: Decodable, knapsack: Knapsack, suppressed_residues: list[str] | None = None, disable_terminal_residues_anywhere: bool = True, keep_invalid_mass_sequences: bool = True, float_dtype: torch.dtype = torch.float64)

Bases: Decoder

A class for decoding from de novo sequence models using beam search.

This class conforms to the Decoder interface and decodes from models that conform to the Decodable interface.
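As a rough illustration of what beam search does at each step, the sketch below expands every hypothesis by one residue and keeps only the highest-scoring ones. It is a generic, pure-Python sketch and not this class's implementation: `beam_step` and `toy_model` are hypothetical names, and the fixed distribution returned by `toy_model` merely stands in for a real model's output.

```python
import heapq
import math


def beam_step(beam, next_log_probs, beam_size):
    """Expand each hypothesis by one token and keep the best `beam_size`.

    beam: list of (tokens, log_prob) pairs, where tokens is a tuple.
    next_log_probs: callable mapping a token tuple to {token: log_prob}.
    """
    candidates = []
    for tokens, score in beam:
        for token, log_p in next_log_probs(tokens).items():
            # Log probabilities add along a sequence.
            candidates.append((tokens + (token,), score + log_p))
    # Keep only the highest-scoring hypotheses.
    return heapq.nlargest(beam_size, candidates, key=lambda c: c[1])


def toy_model(tokens):
    # Hypothetical stand-in for a model: a fixed distribution over three residues.
    return {"A": math.log(0.5), "G": math.log(0.3), "S": math.log(0.2)}


beam = [((), 0.0)]  # start from a single empty hypothesis
for _ in range(2):
    beam = beam_step(beam, toy_model, beam_size=2)
```

After two steps the beam holds two hypotheses of length two, with the most probable one first.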

knapsack = knapsack instance-attribute

chart = torch.tensor(self.knapsack.chart) instance-attribute

mass_scale = knapsack.mass_scale instance-attribute

disable_terminal_residues_anywhere = disable_terminal_residues_anywhere instance-attribute

keep_invalid_mass_sequences = keep_invalid_mass_sequences instance-attribute

float_dtype = float_dtype instance-attribute

residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype)) instance-attribute

terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long)) instance-attribute

suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long)) instance-attribute

residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype)) instance-attribute

vocab_size = len(self.model.residue_set) instance-attribute
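The attributes above suggest that the decoder consults a precomputed chart of reachable masses. The following pure-Python sketch shows one way such a chart could work, assuming it is a boolean table indexed by integer-scaled mass and residue; the actual layout of `knapsack.chart` may differ. `build_chart`, `allowed_residues` and the `MASS_SCALE` value are hypothetical and chosen only for illustration.

```python
MASS_SCALE = 100  # assumed: masses are multiplied by this and rounded to integers

# Monoisotopic residue masses in Da (glycine, alanine, serine).
RESIDUE_MASSES = {"G": 57.02146, "A": 71.03711, "S": 87.03203}


def build_chart(residue_masses, max_mass, mass_scale):
    """Return (residues, chart) where chart[m][i] is True if scaled mass m
    can be built from residues with residue i as the last one added."""
    residues = list(residue_masses)
    scaled = [round(residue_masses[r] * mass_scale) for r in residues]
    size = round(max_mass * mass_scale) + 1
    reachable = [False] * size
    reachable[0] = True  # the empty sequence has mass zero
    chart = [[False] * len(residues) for _ in range(size)]
    for m in range(size):
        for i, step in enumerate(scaled):
            if m >= step and reachable[m - step]:
                chart[m][i] = True
                reachable[m] = True
    return residues, chart


def allowed_residues(residues, chart, remaining_mass, mass_scale):
    """List residues that can still lead to exactly `remaining_mass`."""
    m = round(remaining_mass * mass_scale)
    if not 0 <= m < len(chart):
        return []
    return [r for r, ok in zip(residues, chart[m]) if ok]


residues, chart = build_chart(RESIDUE_MASSES, max_mass=200.0, mass_scale=MASS_SCALE)
```

With a table like this, a decoder can discard any residue that cannot lead to a sequence of the required total mass before scoring it.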

from_file(model: Decodable, path: str, float_dtype: torch.dtype = torch.float64) -> KnapsackBeamSearchDecoder classmethod

Initialize a decoder by loading a saved knapsack.

PARAMETER DESCRIPTION
model

The model to be decoded from.

TYPE: Decodable

path

The path to the directory where the knapsack was saved.

TYPE: str

float_dtype

The floating point dtype to use.

TYPE: dtype DEFAULT: float64

RETURNS DESCRIPTION
KnapsackBeamSearchDecoder

The decoder.

TYPE: KnapsackBeamSearchDecoder

decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], beam_size: int, max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', return_beam: bool = False, **kwargs) -> dict[str, Any]

Decode predicted residue sequence for a batch of spectra using beam search.

PARAMETER DESCRIPTION
spectra

The spectra to be sequenced.

TYPE: FloatTensor

precursors

The precursor mass, charge and mass-to-charge ratio.

TYPE: torch.FloatTensor[batch size, 3]

beam_size

The maximum size of the beam, i.e. the number of candidate sequences kept at each decoding step.

TYPE: int

max_length

The maximum length of a residue sequence.

TYPE: int

mass_tolerance

The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass.

TYPE: float DEFAULT: 5e-05

max_isotope

The maximum number of additional neutrons to consider when comparing a predicted sequence's mass to the precursor mass.

All additional nucleon numbers from 1 to max_isotope inclusive are considered.

TYPE: int DEFAULT: 1

min_log_prob

Minimum log probability below which decoding of a sequence stops early. If a sequence's log probability falls below this value, the sequence is marked as complete. Defaults to -inf.

TYPE: float DEFAULT: -float('inf')

return_beam

Optionally return beam-search results. Ignored in greedy search.

TYPE: bool DEFAULT: False
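To make the interplay of mass_tolerance and max_isotope concrete, here is a minimal sketch of a precursor-mass check. It makes several assumptions that the parameter descriptions do not confirm: that the unshifted (monoisotopic) mass is also tested, that each extra neutron adds the 13C/12C mass difference, and that the error is taken relative to the precursor mass. `matches_precursor` and `CARBON_ISOTOPE_SHIFT` are hypothetical names.

```python
CARBON_ISOTOPE_SHIFT = 1.00335  # Da, approximate mass difference between 13C and 12C


def matches_precursor(sequence_mass, precursor_mass, mass_tolerance=5e-5, max_isotope=1):
    """Return True if the sequence mass fits the precursor mass within a
    relative tolerance, allowing 0..max_isotope additional neutrons."""
    for isotope in range(max_isotope + 1):
        # Assumption: each additional neutron shifts the mass by a fixed amount.
        shifted = sequence_mass + isotope * CARBON_ISOTOPE_SHIFT
        relative_error = abs(shifted - precursor_mass) / precursor_mass
        if relative_error <= mass_tolerance:
            return True
    return False
```

For example, `matches_precursor(1000.0, 1001.00335)` succeeds only because one additional neutron is allowed; with `max_isotope=0` the same pair is rejected.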

RETURNS DESCRIPTION
dict[str, Any]

list[list[str]]: The predicted sequences, each as a list of residue tokens. An empty list is returned for each spectrum in the batch where decoding fails, i.e. where no sequence fits the precursor mass to within the tolerance.