Index
inference
__all__ = ['ScoredSequence', 'Decodable', 'Decoder', 'BeamSearchDecoder', 'GreedyDecoder', 'KnapsackBeamSearchDecoder', 'Knapsack']
module-attribute
BeamSearchDecoder(model: Decodable, suppressed_residues: list[str] | None = None, mass_scale: int = MASS_SCALE, disable_terminal_residues_anywhere: bool = True, keep_invalid_mass_sequences: bool = True, float_dtype: torch.dtype = torch.float64)
Bases: Decoder
A class for decoding from de novo sequence models using beam search.
This class conforms to the Decoder interface and decodes from
models that conform to the Decodable interface.
mass_scale = mass_scale
instance-attribute
disable_terminal_residues_anywhere = disable_terminal_residues_anywhere
instance-attribute
keep_invalid_mass_sequences = keep_invalid_mass_sequences
instance-attribute
float_dtype = float_dtype
instance-attribute
residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype))
instance-attribute
terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long))
instance-attribute
suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long))
instance-attribute
residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype))
instance-attribute
vocab_size = len(self.model.residue_set)
instance-attribute
decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], beam_size: int, max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', return_beam: bool = False, **kwargs) -> dict[str, Any]
Decode predicted residue sequence for a batch of spectra using beam search.
| PARAMETER | DESCRIPTION |
|---|---|
| spectra | The spectra to be sequenced. |
| precursors | The precursor mass, charge and mass-to-charge ratio. |
| beam_size | The maximum size of the beam. Ignored in greedy search. |
| max_length | The maximum length of a residue sequence. |
| mass_tolerance | The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass. |
| max_isotope | The maximum number of additional neutrons for isotopes whose mass a predicted sequence's mass is considered when comparing to the precursor mass. All additional nucleon numbers from 1 to |
| min_log_prob | Minimum log probability to stop decoding early. If a sequence probability is less than this value it is marked as complete. Defaults to -inf. |
| return_beam | Optionally return beam-search results. Ignored in greedy search. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | list[list[str]]: The predicted sequence as a list of residue tokens. This method will return an empty list for each spectrum in the batch where decoding fails, i.e. no sequence that fits the precursor mass to within a tolerance is found. |
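To make the roles of `beam_size` and `max_length` concrete, the following is a minimal, library-independent sketch of beam search over a toy scoring function. It is not the `BeamSearchDecoder` implementation: `beam_search`, `toy_scores`, `VOCAB` and the `"$"` end-of-sequence token are hypothetical names introduced only for illustration, and mass filtering is left out entirely.

```python
import math

VOCAB = ["A", "G", "$"]  # hypothetical vocabulary; "$" plays the role of end-of-sequence


def toy_scores(prefix):
    """Hypothetical model: prefers "A", then "G", then end-of-sequence."""
    preferred = ["A", "G", "$"][min(len(prefix), 2)]
    return [math.log(0.8) if token == preferred else math.log(0.1) for token in VOCAB]


def beam_search(score_next, vocab, eos, beam_size, max_length):
    """Return finished hypotheses as (tokens, log_prob) pairs, best first."""
    beam = [([], 0.0)]  # live hypotheses: (tokens so far, cumulative log probability)
    finished = []
    for _ in range(max_length):
        # Expand every live hypothesis by every token in the vocabulary.
        candidates = []
        for tokens, log_prob in beam:
            for token, token_log_prob in zip(vocab, score_next(tokens)):
                candidates.append((tokens + [token], log_prob + token_log_prob))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for tokens, log_prob in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens[:-1], log_prob))  # drop the EOS token
            else:
                beam.append((tokens, log_prob))
        if not beam:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


best_tokens, best_log_prob = beam_search(toy_scores, VOCAB, "$", beam_size=2, max_length=4)[0]
print(best_tokens, best_log_prob)
```

With `beam_size=1` the same loop reduces to greedy search, which is one way to read the relationship between the two decoders documented here.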
GreedyDecoder(model: Decodable, suppressed_residues: list[str] | None = None, mass_scale: int = MASS_SCALE, disable_terminal_residues_anywhere: bool = True, float_dtype: torch.dtype = torch.float64)
Bases: Decoder
A class for decoding from de novo sequence models using greedy search.
This class conforms to the Decoder interface and decodes from
models that conform to the Decodable interface.
mass_scale = mass_scale
instance-attribute
disable_terminal_residues_anywhere = disable_terminal_residues_anywhere
instance-attribute
float_dtype = float_dtype
instance-attribute
residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype))
instance-attribute
terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long))
instance-attribute
suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long))
instance-attribute
residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype))
instance-attribute
vocab_size = len(self.model.residue_set)
instance-attribute
decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', **kwargs) -> dict[str, Any]
Decode predicted residue sequence for a batch of spectra using greedy search.
| PARAMETER | DESCRIPTION |
|---|---|
| spectra | The spectra to be sequenced. |
| precursors | The precursor mass, charge and mass-to-charge ratio. |
| max_length | The maximum length of a residue sequence. |
| mass_tolerance | The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass. |
| max_isotope | The maximum number of additional neutrons for isotopes whose mass a predicted sequence's mass is considered when comparing to the precursor mass. All additional nucleon numbers from 1 to |
| min_log_prob | Minimum log probability to stop decoding early. If a sequence probability is less than this value it is marked as complete. Defaults to -inf. |
| return_encoder_output | Whether to return the encoder output. |
| encoder_output_reduction | The reduction to apply to the encoder output. Valid values are "mean", "max", "sum", "full". Defaults to "mean". |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | dict[str, Any]: Required keys: "predictions": list[list[str]]; "mass_error": list[float]; "prediction_log_probability": list[float]; "prediction_token_log_probabilities": list[list[float]]; "encoder_output": list[float] (optional). Example additional keys: "prediction_beam_0": list[str]. |
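The interplay of `mass_tolerance` (a relative error) and `max_isotope` described above can be sketched as a simple check. This is an illustration under stated assumptions, not the decoder's code: `matches_precursor` is a hypothetical helper, and the isotope spacing is assumed to be the 13C/12C mass difference of about 1.00335 Da.

```python
CARBON_ISOTOPE_SHIFT = 1.00335  # Da; assumed spacing between isotope peaks (13C - 12C)


def matches_precursor(sequence_mass, precursor_mass, mass_tolerance=5e-5, max_isotope=1):
    """Return True if the sequence mass fits the precursor mass for any allowed isotope."""
    for isotope in range(max_isotope + 1):  # 0 means the monoisotopic mass itself
        shifted = sequence_mass + isotope * CARBON_ISOTOPE_SHIFT
        relative_error = abs(shifted - precursor_mass) / precursor_mass
        if relative_error <= mass_tolerance:
            return True
    return False


print(matches_precursor(1000.04, 1000.0))                  # within 50 ppm
print(matches_precursor(1000.06, 1000.0))                  # outside 50 ppm
print(matches_precursor(998.9967, 1000.0, max_isotope=1))  # fits only after one isotope shift
```

The default `mass_tolerance` of `5e-05` corresponds to 50 parts per million when read as a relative error.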
Decodable
An interface for models that can be decoded.
Algorithms should conform to the search interface.
residue_set: ResidueSet
abstractmethod
property
Every model must have a residue_set attribute.
init(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], *args, **kwargs) -> Any
abstractmethod
Initialize the search state.
| PARAMETER | DESCRIPTION |
|---|---|
| spectra | The spectra to be sequenced. |
| precursors | The precursor mass, charge and mass-to-charge ratio. |
score_candidates(sequences: Integer[Peptide, '...'], precursor_mass_charge: Float[PrecursorFeatures, '...'], *args, **kwargs) -> torch.FloatTensor
abstractmethod
Generate and score the next set of candidates.
| PARAMETER | DESCRIPTION |
|---|---|
| sequences | Partial residue sequences generated in the course of decoding. |
| precursor_mass_charge | The precursor mass, charge and mass-to-charge ratio. |
get_residue_masses(mass_scale: int) -> torch.LongTensor
abstractmethod
Get residue masses for the model's residue vocabulary.
| PARAMETER | DESCRIPTION |
|---|---|
| mass_scale | The scale at which masses in Daltons are calculated and rounded off. For example, a scale of 10000 would represent masses at a resolution of 1e-4 Da. |
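As a worked example of what `mass_scale` means, the sketch below converts residue masses in Daltons to scaled integers. `scale_mass` is a hypothetical helper, and rounding to the nearest integer is an assumption about how the scaled masses are produced; the glycine and alanine residue masses are standard monoisotopic values.

```python
def scale_mass(mass, mass_scale):
    """Convert a mass in Daltons to an integer number of 1/mass_scale Da units."""
    return round(mass * mass_scale)


MASS_SCALE = 10000  # illustrative value: one integer unit is 1e-4 Da

glycine = scale_mass(57.021464, MASS_SCALE)  # 570214.64 rounds to 570215
alanine = scale_mass(71.037114, MASS_SCALE)  # 710371.14 rounds to 710371
print(glycine, alanine)
```

Working with integers avoids floating-point comparison issues when masses are used as lookup keys.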
decode(sequence: Integer[Peptide, '...']) -> list[str]
abstractmethod
Map sequences of indices to residues using the model's residue vocabulary.
| PARAMETER | DESCRIPTION |
|---|---|
| sequence | The sequence of residue indices to be mapped to the corresponding residue strings. |
get_eos_index() -> int
abstractmethod
Get the end of sequence token's index in the model's residue vocabulary.
get_empty_index() -> int
abstractmethod
Get the empty token's index in the model's residue vocabulary.
Decoder(model: Decodable)
A class that implements some search algorithm for decoding.
Model should conform to the Decodable interface.
| PARAMETER | DESCRIPTION |
|---|---|
| model | The model to predict residue sequences from using the implemented search algorithm. |
model = model
instance-attribute
decode(spectra: Float[Spectrum, '...'], precursors: Float[PrecursorFeatures, '...'], *args, **kwargs) -> dict[str, Any]
abstractmethod
Generate the predicted residue sequence using the decoder's search algorithm.
| PARAMETER | DESCRIPTION |
|---|---|
| spectra | The spectra to be sequenced. |
| precursors | The precursor mass, charge and mass-to-charge ratio. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | dict[str, Any]: Required keys: "sequence": list[str]; "mass_error": float; "sequence_log_probability": float; "token_log_probabilities": list[float]; "encoder_output": list[float] (optional). Example additional keys: "sequence_beam_0": list[str]. |
ScoredSequence(sequence: list[str], mass_error: float, sequence_log_probability: float, token_log_probabilities: list[float])
dataclass
This class holds a residue sequence and its log probability.
sequence: list[str]
instance-attribute
mass_error: float
instance-attribute
sequence_log_probability: float
instance-attribute
token_log_probabilities: list[float]
instance-attribute
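A short sketch of how such a record might be used to rank candidates. The class below is a local stand-in that mirrors the documented fields rather than an import from the library, and the sequences and scores are made-up illustrative values.

```python
from dataclasses import dataclass


@dataclass
class ScoredSequence:
    """Local stand-in with the same fields as the documented dataclass."""

    sequence: list[str]
    mass_error: float
    sequence_log_probability: float
    token_log_probabilities: list[float]


# Illustrative candidates; the numbers are invented for the example.
candidates = [
    ScoredSequence(["P", "E", "P"], 2e-6, -1.2, [-0.2, -0.5, -0.5]),
    ScoredSequence(["P", "E", "K"], 4e-5, -0.7, [-0.2, -0.3, -0.2]),
]

# Pick the candidate with the highest sequence log probability.
best = max(candidates, key=lambda c: c.sequence_log_probability)
print(best.sequence)
```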
Knapsack(max_mass: float, mass_scale: int, max_isotope: int, residues: list[str], residue_indices: dict[str, int], masses: MassArray, chart: KnapsackChart)
dataclass
A class that precomputes and stores a knapsack chart.
| PARAMETER | DESCRIPTION |
|---|---|
| max_mass | The maximum mass up to which the chart is calculated. |
| mass_scale | The scale at which masses in Daltons are calculated and rounded off. For example, a scale of 10000 would represent masses at a resolution of 1e-4 Da. |
| residues | The list of residues that are considered in knapsack decoding. The order of this list is the inverse of |
| residue_indices | A mapping from residues as strings to indices in the knapsack chart. This is the inverse of |
| masses | The set of realisable masses in ascending order. |
| chart | The chart of realisable masses and residues that can lead to these masses. |
max_mass: float
instance-attribute
mass_scale: int
instance-attribute
max_isotope: int
instance-attribute
residues: list[str]
instance-attribute
residue_indices: dict[str, int]
instance-attribute
masses: MassArray
instance-attribute
chart: KnapsackChart
instance-attribute
save(path: str) -> None
Save the knapsack file to a directory.
| PARAMETER | DESCRIPTION |
|---|---|
| path | The path to the directory. |

| RAISES | DESCRIPTION |
|---|---|
| FileExistsError | If the directory already exists. |
construct_knapsack(residue_masses: dict[str, float], residue_indices: dict[str, int], max_mass: float, mass_scale: int, max_isotope: int = 2) -> 'Knapsack'
classmethod
Construct a knapsack chart using depth-first search.
Previous construction algorithms have used dynamic programming, but its space and time complexity scale linearly with mass resolution, since every possible mass is iterated over rather than only the feasible masses.
Graph search algorithms only iterate over feasible masses, which become a smaller and smaller share of possible masses as the mass resolution increases. This leads to dramatic performance improvements.
This implementation uses depth-first search since its agenda is a stack, which can be implemented using Python lists whose operations have amortized constant time complexity.
| PARAMETER | DESCRIPTION |
|---|---|
| residue_masses | A mapping from considered residues to their masses. |
| max_mass | The maximum mass up to which the chart is calculated. |
| mass_scale | The scale at which masses in Daltons are calculated and rounded off. For example, a scale of 10000 would represent masses at a resolution of 1e-4 Da. |
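The depth-first construction described above can be sketched in a few lines. This is a simplified illustration, not the library's `construct_knapsack`: the function name `reachable_masses` is hypothetical, the chart is a plain dictionary rather than the `KnapsackChart` type, and isotopes are ignored.

```python
def reachable_masses(residue_masses, max_mass, mass_scale):
    """Map every realisable scaled mass to the residues that can be its last step."""
    scaled = {residue: round(mass * mass_scale) for residue, mass in residue_masses.items()}
    limit = round(max_mass * mass_scale)
    chart = {0: set()}  # scaled mass -> residues whose addition reaches that mass
    stack = [0]         # DFS agenda: masses discovered but not yet expanded
    while stack:
        mass = stack.pop()
        for residue, residue_mass in scaled.items():
            next_mass = mass + residue_mass
            if next_mass > limit:
                continue
            if next_mass not in chart:
                # First time this mass is seen: record it and schedule its expansion.
                chart[next_mass] = set()
                stack.append(next_mass)
            chart[next_mass].add(residue)
    return chart


# Toy residue masses chosen for readability, not real amino acid masses.
chart = reachable_masses({"G": 57.0, "A": 71.0}, max_mass=150.0, mass_scale=1)
print(sorted(chart))
print(chart[128])
```

Only masses that some residue combination actually reaches are ever visited, which is the property the description above contrasts with dynamic programming over every possible mass.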
from_file(path: str) -> 'Knapsack'
classmethod
Load a knapsack saved to a directory.
| PARAMETER | DESCRIPTION |
|---|---|
| path | The path to the directory. |

| RETURNS | DESCRIPTION |
|---|---|
| Knapsack | The knapsack loaded from the directory. |
get_feasible_masses(target_mass: float, tolerance: float) -> list[int]
Find a set of feasible masses for a given target mass and tolerance using binary search.
| PARAMETER | DESCRIPTION |
|---|---|
| target_mass | The mass to be decoded, in Daltons. |
| tolerance | The mass tolerance in Daltons. |

| RETURNS | DESCRIPTION |
|---|---|
| list[int] | list[int]: A list of feasible masses. |
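A binary search over the sorted list of realisable masses can be sketched with the standard-library `bisect` module. This is an assumption-laden illustration rather than the method's source: `feasible_masses` is a hypothetical free function, the masses are taken to be scaled integers in ascending order, and the tolerance window is assumed to be inclusive at both ends.

```python
from bisect import bisect_left, bisect_right


def feasible_masses(masses, target_mass, tolerance, mass_scale):
    """Return the scaled masses lying within target_mass +/- tolerance (in Daltons)."""
    low = round((target_mass - tolerance) * mass_scale)
    high = round((target_mass + tolerance) * mass_scale)
    start = bisect_left(masses, low)    # first index with masses[i] >= low
    stop = bisect_right(masses, high)   # first index with masses[i] > high
    return masses[start:stop]


# Illustrative sorted masses at a scale of 10000 (units of 1e-4 Da).
masses = [570215, 710371, 1140429, 1280586, 1420742]
print(feasible_masses(masses, target_mass=128.0586, tolerance=0.001, mass_scale=10000))
print(feasible_masses(masses, target_mass=100.0, tolerance=0.01, mass_scale=10000))
```

Each lookup costs two binary searches, so it stays logarithmic in the number of realisable masses.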
KnapsackBeamSearchDecoder(model: Decodable, knapsack: Knapsack, suppressed_residues: list[str] | None = None, disable_terminal_residues_anywhere: bool = True, keep_invalid_mass_sequences: bool = True, float_dtype: torch.dtype = torch.float64)
Bases: Decoder
A class for decoding from de novo sequence models using beam search.
This class conforms to the Decoder interface and decodes from
models that conform to the Decodable interface.
knapsack = knapsack
instance-attribute
chart = torch.tensor(self.knapsack.chart)
instance-attribute
mass_scale = knapsack.mass_scale
instance-attribute
disable_terminal_residues_anywhere = disable_terminal_residues_anywhere
instance-attribute
keep_invalid_mass_sequences = keep_invalid_mass_sequences
instance-attribute
float_dtype = float_dtype
instance-attribute
residue_masses = torch.zeros((len(self.model.residue_set),), dtype=(self.float_dtype))
instance-attribute
terminal_residue_indices = torch.tensor(terminal_residues_idx, dtype=(torch.long))
instance-attribute
suppressed_residue_indices = torch.tensor(suppressed_residues_idx, dtype=(torch.long))
instance-attribute
residue_target_offsets = torch.tensor(residue_target_offsets, dtype=(self.float_dtype))
instance-attribute
vocab_size = len(self.model.residue_set)
instance-attribute
from_file(model: Decodable, path: str, float_dtype: torch.dtype = torch.float64) -> KnapsackBeamSearchDecoder
classmethod
Initialize a decoder by loading a saved knapsack.
| PARAMETER | DESCRIPTION |
|---|---|
| model | The model to be decoded from. |
| path | The path to the directory where the knapsack was saved to. |
| float_dtype | The floating point dtype to use. |

| RETURNS | DESCRIPTION |
|---|---|
| KnapsackBeamSearchDecoder | The decoder. |
decode(spectra: Float[Spectrum, ' batch'], precursors: Float[PrecursorFeatures, ' batch'], beam_size: int, max_length: int, mass_tolerance: float = 5e-05, max_isotope: int = 1, min_log_prob: float = -float('inf'), return_encoder_output: bool = False, encoder_output_reduction: Literal['mean', 'max', 'sum', 'full'] = 'mean', return_beam: bool = False, **kwargs) -> dict[str, Any]
Decode predicted residue sequence for a batch of spectra using beam search.
| PARAMETER | DESCRIPTION |
|---|---|
| spectra | The spectra to be sequenced. |
| precursors | The precursor mass, charge and mass-to-charge ratio. |
| beam_size | The maximum size of the beam. Ignored in greedy search. |
| max_length | The maximum length of a residue sequence. |
| mass_tolerance | The maximum relative error for which a predicted sequence is still considered to have matched the precursor mass. |
| max_isotope | The maximum number of additional neutrons for isotopes whose mass a predicted sequence's mass is considered when comparing to the precursor mass. All additional nucleon numbers from 1 to |
| min_log_prob | Minimum log probability to stop decoding early. If a sequence probability is less than this value it is marked as complete. Defaults to -inf. |
| return_beam | Optionally return beam-search results. Ignored in greedy search. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | list[list[str]]: The predicted sequence as a list of residue tokens. This method will return an empty list for each spectrum in the batch where decoding fails, i.e. no sequence that fits the precursor mass to within a tolerance is found. |