inference

__all__ = ['ScoredSequence', 'Decodable', 'Decoder', 'BeamSearchDecoder', 'GreedyDecoder', 'KnapsackBeamSearchDecoder', 'Knapsack'] module-attribute

BeamSearchDecoder(model: Decodable, suppressed_residues: list[str] | None = None, mass_scale: int = MASS_SCALE, disable_terminal_residues_anywhere: bool = True, keep_invalid_mass_sequences: bool = True, float_dtype: torch.dtype = torch.float64)

Bases: Decoder

A class for decoding from de novo sequence models using beam search.

This class conforms to the Decoder interface and decodes from models that conform to the Decodable interface.

mass_scale = mass_scale instance-attribute

disable_terminal_residues_anywhere = disable_terminal_residues_anywhere instance-attribute

keep_invalid_mass_sequences = keep_invalid_mass_sequences instance-attribute

float_dtype = float_dtype instance-attribute

residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype)) instance-attribute

terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long)) instance-attribute

suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long)) instance-attribute

residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype)) instance-attribute

vocab_size = len(self.model.residue_set) instance-attribute

decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], beam_size: int, max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', return_beam: bool = False, **kwargs) -> dict[str, Any]

Decode predicted residue sequence for a batch of spectra using beam search.

PARAMETER DESCRIPTION
spectra

The spectra to be sequenced.

TYPE: FloatTensor

precursors

The precursor mass, charge and mass-to-charge ratio.

TYPE: torch.FloatTensor[batch size, 3]

beam_size

The maximum size of the beam, i.e. the number of candidate sequences kept per spectrum at each decoding step.

TYPE: int

max_length

The maximum length of a residue sequence.

TYPE: int

mass_tolerance

The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass.

TYPE: float DEFAULT: 5e-05

max_isotope

The maximum number of additional neutrons for isotopes whose masses are considered when comparing a predicted sequence's mass to the precursor mass.

All additional nucleon numbers from 1 to max_isotope inclusive are considered.

TYPE: int DEFAULT: 1

min_log_prob

Minimum log probability used to stop decoding early. If a sequence's log probability falls below this value, the sequence is marked as complete. Defaults to -inf.

TYPE: float DEFAULT: -float('inf')

return_beam

Whether to return beam-search results.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict[str, Any]

list[list[str]]: The predicted sequence as a list of residue tokens. An empty list is returned for each spectrum in the batch where decoding fails, i.e. where no sequence that fits the precursor mass to within the tolerance is found.
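The interaction between mass_tolerance and max_isotope can be illustrated with a small standalone sketch. It is not the library's implementation: the function name matches_precursor and the constant CARBON_ISOTOPE_SHIFT are hypothetical, and it assumes the relative error is measured against the precursor mass after subtracting the mass of the extra neutrons.

```python
# Approximate mass difference between 13C and 12C in Daltons.
CARBON_ISOTOPE_SHIFT = 1.00335


def matches_precursor(sequence_mass, precursor_mass, mass_tolerance=5e-05, max_isotope=1):
    """Return True if the sequence mass fits the precursor mass for some isotope count.

    Hypothetical helper: isotope counts 0 (monoisotopic) to max_isotope are tried.
    """
    for isotope in range(max_isotope + 1):
        # Remove the mass of the extra neutrons from the observed precursor mass.
        expected = precursor_mass - isotope * CARBON_ISOTOPE_SHIFT
        relative_error = abs(sequence_mass - expected) / expected
        if relative_error <= mass_tolerance:
            return True
    return False
```

With the default tolerance of 5e-05, a 1000 Da sequence matches a 1000.04 Da precursor but not a 1000.1 Da one, unless an isotope shift happens to account for the difference.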

GreedyDecoder(model: Decodable, suppressed_residues: list[str] | None = None, mass_scale: int = MASS_SCALE, disable_terminal_residues_anywhere: bool = True, float_dtype: torch.dtype = torch.float64)

Bases: Decoder

A class for decoding from de novo sequence models using greedy search.

This class conforms to the Decoder interface and decodes from models that conform to the Decodable interface.

mass_scale = mass_scale instance-attribute

disable_terminal_residues_anywhere = disable_terminal_residues_anywhere instance-attribute

float_dtype = float_dtype instance-attribute

residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype)) instance-attribute

terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long)) instance-attribute

suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long)) instance-attribute

residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype)) instance-attribute

vocab_size = len(self.model.residue_set) instance-attribute

decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', **kwargs) -> dict[str, Any]

Decode predicted residue sequence for a batch of spectra using greedy search.

PARAMETER DESCRIPTION
spectra

The spectra to be sequenced.

TYPE: FloatTensor

precursors

The precursor mass, charge and mass-to-charge ratio.

TYPE: torch.FloatTensor[batch size, 3]

max_length

The maximum length of a residue sequence.

TYPE: int

mass_tolerance

The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass.

TYPE: float DEFAULT: 5e-05

max_isotope

The maximum number of additional neutrons for isotopes whose masses are considered when comparing a predicted sequence's mass to the precursor mass.

All additional nucleon numbers from 1 to max_isotope inclusive are considered.

TYPE: int DEFAULT: 1

min_log_prob

Minimum log probability used to stop decoding early. If a sequence's log probability falls below this value, the sequence is marked as complete. Defaults to -inf.

TYPE: float DEFAULT: -float('inf')

return_encoder_output

Whether to return the encoder output.

TYPE: bool DEFAULT: False

encoder_output_reduction

The reduction to apply to the encoder output. Valid values are "mean", "max", "sum", "full". Defaults to "mean".

TYPE: Literal['mean', 'max', 'sum', 'full'] DEFAULT: 'mean'

RETURNS DESCRIPTION
dict[str, Any]

dict[str, Any]: Required keys:

- "predictions": list[list[str]]
- "mass_error": list[float]
- "prediction_log_probability": list[float]
- "prediction_token_log_probabilities": list[list[float]]
- "encoder_output": list[float] (optional)

Example additional keys:

- "prediction_beam_0": list[str]
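The idea behind greedy search with a min_log_prob cut-off can be sketched for a single sequence in plain Python. This is an illustration under stated assumptions, not the GreedyDecoder code: greedy_decode, toy_scorer and the "<eos>" token are hypothetical, the scorer stands in for a model, and the returned keys only mirror the names listed above.

```python
import math


def greedy_decode(score_next, eos, max_length, min_log_prob=-math.inf):
    """Extend a sequence one token at a time, always taking the best-scoring token.

    score_next(prefix) is a hypothetical callable returning {token: log_prob}.
    """
    sequence, token_log_probs, total = [], [], 0.0
    for _ in range(max_length):
        scores = score_next(sequence)
        token = max(scores, key=scores.get)
        if token == eos:
            break  # the model chose to end the sequence
        sequence.append(token)
        token_log_probs.append(scores[token])
        total += scores[token]
        if total < min_log_prob:
            break  # sequence is too unlikely; mark it complete early
    return {
        "predictions": sequence,
        "prediction_log_probability": total,
        "prediction_token_log_probabilities": token_log_probs,
    }


def toy_scorer(prefix):
    """A fixed table of next-token log probabilities, indexed by prefix length."""
    table = [
        {"P": math.log(0.7), "E": math.log(0.2), "<eos>": math.log(0.1)},
        {"E": math.log(0.6), "P": math.log(0.3), "<eos>": math.log(0.1)},
        {"<eos>": math.log(0.8), "P": math.log(0.1), "E": math.log(0.1)},
    ]
    return table[min(len(prefix), len(table) - 1)]


result = greedy_decode(toy_scorer, eos="<eos>", max_length=10)
early = greedy_decode(toy_scorer, eos="<eos>", max_length=10, min_log_prob=math.log(0.8))
```

In this sketch the end-of-sequence token's log probability is left out of the total; a real decoder may account for it differently.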

Decodable

An interface for models that can be decoded.

Algorithms should conform to the search interface.

residue_set: ResidueSet abstractmethod property

Every model must have a residue_set attribute.

init(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], *args, **kwargs) -> Any abstractmethod

Initialize the search state.

PARAMETER DESCRIPTION
spectra

The spectra to be sequenced.

TYPE: FloatTensor

precursors

The precursor mass, charge and mass-to-charge ratio.

TYPE: torch.FloatTensor[batch size, 3]

score_candidates(sequences: Integer[Peptide, '...'], precursor_mass_charge: Float[PrecursorFeatures, '...'], *args, **kwargs) -> torch.FloatTensor abstractmethod

Generate and score the next set of candidates.

PARAMETER DESCRIPTION
sequences

Partial residue sequences generated in the course of decoding.

TYPE: LongTensor

precursor_mass_charge

The precursor mass, charge and mass-to-charge ratio.

TYPE: torch.FloatTensor[batch size, 3]

get_residue_masses(mass_scale: int) -> torch.LongTensor abstractmethod

Get residue masses for the model's residue vocabulary.

PARAMETER DESCRIPTION
mass_scale

The factor by which masses in Daltons are multiplied before being rounded to integers. For example, a scale of 10000 represents masses at a resolution of 1e-4 Da.

TYPE: int
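As a rough illustration of integer mass scaling, the sketch below multiplies monoisotopic residue masses by a scale factor and rounds to the nearest integer. It assumes that this is how mass_scale is applied; the helper scale_mass and the value chosen for MASS_SCALE here are hypothetical and may differ from the library's own constant.

```python
# Hypothetical scale for this sketch; the library's MASS_SCALE constant may differ.
MASS_SCALE = 10000

# Monoisotopic residue masses in Daltons for glycine, alanine and serine.
residue_masses = {"G": 57.02146, "A": 71.03711, "S": 87.03203}


def scale_mass(mass, mass_scale=MASS_SCALE):
    """Convert a mass in Daltons to an integer number of 1/mass_scale Da units."""
    return round(mass * mass_scale)


scaled = {residue: scale_mass(mass) for residue, mass in residue_masses.items()}

# Integer masses can be summed exactly, with no floating point drift.
peptide_mass = sum(scaled[residue] for residue in "GAS")
```

Working with integers makes mass comparisons and table lookups exact, at the cost of a fixed rounding error of at most half a unit per residue.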

decode(sequence: Integer[Peptide, '...']) -> list[str] abstractmethod

Map sequences of indices to residues using the model's residue vocabulary.

PARAMETER DESCRIPTION
sequence

The sequence of residue indices to be mapped to the corresponding residue strings.

TYPE: LongTensor

get_eos_index() -> int abstractmethod

Get the end of sequence token's index in the model's residue vocabulary.

get_empty_index() -> int abstractmethod

Get the empty token's index in the model's residue vocabulary.

Decoder(model: Decodable)

A class that implements some search algorithm for decoding.

Model should conform to the Decodable interface.

PARAMETER DESCRIPTION
model

The model to predict residue sequences from using the implemented search algorithm.

TYPE: Decodable

model = model instance-attribute

decode(spectra: Float[Spectrum, '...'], precursors: Float[PrecursorFeatures, '...'], *args, **kwargs) -> dict[str, Any] abstractmethod

Generate the predicted residue sequence using the decoder's search algorithm.

PARAMETER DESCRIPTION
spectra

The spectra to be sequenced.

TYPE: FloatTensor

precursors

The precursor mass, charge and mass-to-charge ratio.

TYPE: FloatTensor

RETURNS DESCRIPTION
dict[str, Any]

dict[str, Any]: Required keys:

- "sequence": list[str]
- "mass_error": float
- "sequence_log_probability": float
- "token_log_probabilities": list[float]
- "encoder_output": list[float] (optional)

Example additional keys:

- "sequence_beam_0": list[str]

ScoredSequence(sequence: list[str], mass_error: float, sequence_log_probability: float, token_log_probabilities: list[float]) dataclass

This class holds a residue sequence and its log probability.

sequence: list[str] instance-attribute

mass_error: float instance-attribute

sequence_log_probability: float instance-attribute

token_log_probabilities: list[float] instance-attribute
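A stand-in dataclass with the same four fields shows how such a record might be filled in. The class ScoredSequenceSketch and the helper from_token_scores are hypothetical, and the sketch assumes that the sequence log probability is the sum of the per-token log probabilities, which the documentation above does not state.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredSequenceSketch:
    """Stand-in mirroring the documented fields; not the library class itself."""

    sequence: list[str]
    mass_error: float
    sequence_log_probability: float
    token_log_probabilities: list[float]


def from_token_scores(sequence, token_log_probabilities, mass_error):
    """Build a record, assuming the sequence score is the sum of token scores."""
    return ScoredSequenceSketch(
        sequence=list(sequence),
        mass_error=mass_error,
        sequence_log_probability=sum(token_log_probabilities),
        token_log_probabilities=list(token_log_probabilities),
    )


scored = from_token_scores(["P", "E"], [math.log(0.7), math.log(0.6)], mass_error=1e-06)
```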

Knapsack(max_mass: float, mass_scale: int, max_isotope: int, residues: list[str], residue_indices: dict[str, int], masses: MassArray, chart: KnapsackChart) dataclass

A class that precomputes and stores a knapsack chart.

PARAMETER DESCRIPTION
max_mass

The maximum mass up to which the chart is calculated.

TYPE: float

mass_scale

The factor by which masses in Daltons are multiplied before being rounded to integers. For example, a scale of 10000 represents masses at a resolution of 1e-4 Da.

TYPE: int

residues

The list of residues that are considered in knapsack decoding. The order of this list is the inverse of residue_indices.

TYPE: list[str]

residue_indices

A mapping from residues as strings to indices in the knapsack chart. This is the inverse of residues.

TYPE: dict[str, int]

masses

The set of realisable masses in ascending order.

TYPE: numpy.ndarray[number of masses]

chart

The chart of realisable masses and residues that can lead to these masses. chart[mass, residue] is True if and only if a sequence of mass can be generated starting with the residue with index residue.

TYPE: numpy.ndarray[number of masses, number of residues]

max_mass: float instance-attribute

mass_scale: int instance-attribute

max_isotope: int instance-attribute

residues: list[str] instance-attribute

residue_indices: dict[str, int] instance-attribute

masses: MassArray instance-attribute

chart: KnapsackChart instance-attribute

save(path: str) -> None

Save the knapsack file to a directory.

PARAMETER DESCRIPTION
path

The path to the directory.

TYPE: str

RAISES DESCRIPTION
FileExistsError

Raised if the directory path already exists.

construct_knapsack(residue_masses: dict[str, float], residue_indices: dict[str, int], max_mass: float, mass_scale: int, max_isotope: int = 2) -> 'Knapsack' classmethod

Construct a knapsack chart using depth-first search.

Previous construction algorithms have used dynamic programming, but its space and time complexity scale linearly with mass resolution since every possible mass is iterated over rather than only the feasible masses.

Graph search algorithms only iterate over feasible masses which become a smaller and smaller share of possible masses as the mass resolution increases. This leads to dramatic performance improvements.

This implementation uses depth-first search since its agenda is a stack, which can be implemented using Python lists whose operations have amortized constant time complexity.

PARAMETER DESCRIPTION
residue_masses

A mapping from considered residues to their masses.

TYPE: dict[str, float]

max_mass

The maximum mass up to which the chart is calculated.

TYPE: float

mass_scale

The factor by which masses in Daltons are multiplied before being rounded to integers. For example, a scale of 10000 represents masses at a resolution of 1e-4 Da.

TYPE: int
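The depth-first construction described above can be sketched on integer masses with a plain Python list as the stack. This is a simplified illustration rather than the library's algorithm: build_chart is a hypothetical name, the chart is a dictionary of sets instead of a boolean array, and isotopes are ignored.

```python
def build_chart(scaled_masses, max_mass):
    """Map each reachable mass to the residues that a sequence of that mass can contain.

    scaled_masses maps residue -> integer mass; only feasible masses are visited.
    """
    chart = {}
    visited = {0}
    stack = [0]  # the empty sequence has mass 0
    while stack:
        mass = stack.pop()
        for residue, residue_mass in scaled_masses.items():
            next_mass = mass + residue_mass
            if next_mass > max_mass:
                continue
            # next_mass is reachable by adding this residue to a feasible mass.
            chart.setdefault(next_mass, set()).add(residue)
            if next_mass not in visited:
                visited.add(next_mass)
                stack.append(next_mass)
    return chart


toy_chart = build_chart({"a": 2, "b": 3}, max_mass=7)
```

Because mass does not depend on residue order, a residue that can appear anywhere in a sequence of a given mass can also appear first, which is the property the chart records. Masses that no combination of residues can reach, such as 1 in the toy example, never enter the chart at all.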

from_file(path: str) -> 'Knapsack' classmethod

Load a knapsack saved to a directory.

PARAMETER DESCRIPTION
path

The path to the directory.

TYPE: str

RETURNS DESCRIPTION
Knapsack

The knapsack loaded from the directory.

TYPE: 'Knapsack'

get_feasible_masses(target_mass: float, tolerance: float) -> list[int]

Find a set of feasible masses for a given target mass and tolerance using binary search.

PARAMETER DESCRIPTION
target_mass

The target mass in Daltons.

TYPE: float

tolerance

The mass tolerance in Daltons.

TYPE: float

RETURNS DESCRIPTION
list[int]

list[int]: A list of feasible masses.
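A binary-search lookup over a sorted list of integer masses might look like the following sketch. The function feasible_masses is hypothetical and takes the sorted masses and the scale as explicit arguments; it assumes the stored masses are integers obtained by multiplying Daltons by mass_scale.

```python
from bisect import bisect_left, bisect_right


def feasible_masses(sorted_masses, target_mass, tolerance, mass_scale):
    """Return the stored masses within target_mass +/- tolerance (both in Daltons)."""
    low = round((target_mass - tolerance) * mass_scale)
    high = round((target_mass + tolerance) * mass_scale)
    # Binary search for both ends of the window in the ascending mass list.
    start = bisect_left(sorted_masses, low)
    end = bisect_right(sorted_masses, high)
    return sorted_masses[start:end]


# Scaled masses of G, A, GG and GA at a scale of 10000.
masses = [570215, 710371, 1140430, 1280586]
hits = feasible_masses(masses, target_mass=114.043, tolerance=0.01, mass_scale=10000)
misses = feasible_masses(masses, target_mass=100.0, tolerance=0.01, mass_scale=10000)
```

Each lookup costs two binary searches plus the size of the returned window, so it stays cheap even when the list of realisable masses is large.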

KnapsackBeamSearchDecoder(model: Decodable, knapsack: Knapsack, suppressed_residues: list[str] | None = None, disable_terminal_residues_anywhere: bool = True, keep_invalid_mass_sequences: bool = True, float_dtype: torch.dtype = torch.float64)

Bases: Decoder

A class for decoding from de novo sequence models using beam search.

This class conforms to the Decoder interface and decodes from models that conform to the Decodable interface.

knapsack = knapsack instance-attribute

chart = torch.tensor(self.knapsack.chart) instance-attribute

mass_scale = knapsack.mass_scale instance-attribute

disable_terminal_residues_anywhere = disable_terminal_residues_anywhere instance-attribute

keep_invalid_mass_sequences = keep_invalid_mass_sequences instance-attribute

float_dtype = float_dtype instance-attribute

residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype)) instance-attribute

terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long)) instance-attribute

suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long)) instance-attribute

residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype)) instance-attribute

vocab_size = len(self.model.residue_set) instance-attribute

from_file(model: Decodable, path: str, float_dtype: torch.dtype = torch.float64) -> KnapsackBeamSearchDecoder classmethod

Initialize a decoder by loading a saved knapsack.

PARAMETER DESCRIPTION
model

The model to be decoded from.

TYPE: Decodable

path

The path to the directory where the knapsack was saved to.

TYPE: str

float_dtype

The floating point dtype to use.

TYPE: dtype DEFAULT: float64

RETURNS DESCRIPTION
KnapsackBeamSearchDecoder

The decoder.

TYPE: KnapsackBeamSearchDecoder

decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], beam_size: int, max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', return_beam: bool = False, **kwargs) -> dict[str, Any]

Decode predicted residue sequence for a batch of spectra using beam search.

PARAMETER DESCRIPTION
spectra

The spectra to be sequenced.

TYPE: FloatTensor

precursors

The precursor mass, charge and mass-to-charge ratio.

TYPE: torch.FloatTensor[batch size, 3]

beam_size

The maximum size of the beam, i.e. the number of candidate sequences kept per spectrum at each decoding step.

TYPE: int

max_length

The maximum length of a residue sequence.

TYPE: int

mass_tolerance

The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass.

TYPE: float DEFAULT: 5e-05

max_isotope

The maximum number of additional neutrons for isotopes whose masses are considered when comparing a predicted sequence's mass to the precursor mass.

All additional nucleon numbers from 1 to max_isotope inclusive are considered.

TYPE: int DEFAULT: 1

min_log_prob

Minimum log probability used to stop decoding early. If a sequence's log probability falls below this value, the sequence is marked as complete. Defaults to -inf.

TYPE: float DEFAULT: -float('inf')

return_beam

Whether to return beam-search results.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
dict[str, Any]

list[list[str]]: The predicted sequence as a list of residue tokens. An empty list is returned for each spectrum in the batch where decoding fails, i.e. where no sequence that fits the precursor mass to within the tolerance is found.
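One way a knapsack chart can constrain beam search is by masking out residues that cannot lead to a sequence with the required remaining mass. The sketch below shows that idea on a toy chart; prune_candidates is a hypothetical helper, the chart is a dictionary of sets rather than a tensor, and how the decoder actually applies its chart is an assumption here.

```python
import math


def prune_candidates(log_probs, chart, remaining_mass, tolerance):
    """Set the score of residues that cannot complete the remaining mass to -inf.

    chart maps an integer mass to the residues that can start a sequence of that mass;
    remaining_mass and tolerance are integers in the same scaled units.
    """
    allowed = set()
    for mass in range(remaining_mass - tolerance, remaining_mass + tolerance + 1):
        allowed |= chart.get(mass, set())
    return {
        residue: (log_prob if residue in allowed else -math.inf)
        for residue, log_prob in log_probs.items()
    }


chart = {2: {"a"}, 3: {"b"}, 4: {"a"}, 5: {"a", "b"}}
log_probs = {"a": math.log(0.4), "b": math.log(0.6)}
pruned = prune_candidates(log_probs, chart, remaining_mass=4, tolerance=0)
```

Here residue "b" has the higher model score but is masked out, because no sequence of mass 4 can start with it.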