Dataset
AnnotatedPolarsSpectrumDataset(data_frame, peptides)
Bases: PolarsSpectrumDataset
A dataset with a Polars index that includes peptides from an aligned list.
Source code in instanovo/diffusion/dataset.py
AnnotatedSpectrumBatch
Bases: NamedTuple
Represents a batch of annotated spectrum data.
Attributes:
Name | Type | Description |
---|---|---|
`spectra` | `FloatTensor` | The tensor containing the spectra data. |
`spectra_padding_mask` | `BoolTensor` | A boolean tensor indicating the padding positions in the spectra tensor. |
`precursors` | `FloatTensor` | The tensor containing precursor mass information. |
`peptides` | `LongTensor` | The tensor containing peptide sequence information. |
`peptide_padding_mask` | `BoolTensor` | A boolean tensor indicating the padding positions in the peptides tensor. |
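As a rough illustration of how a padded field pairs with its padding mask, the sketch below builds a small batch from variable-length peak lists. It uses plain Python lists instead of `torch` tensors to stay dependency-free. `ToyBatch` and `pad_spectra` are hypothetical names that are not part of instanovo, and the `(m/z, intensity)` peak layout is an assumption. The sketch also assumes that `True` in the mask marks a padded position, which the attribute descriptions suggest but do not state outright.

```python
from typing import NamedTuple


class ToyBatch(NamedTuple):
    """Hypothetical stand-in for a batch; lists replace torch tensors."""

    spectra: list[list[tuple[float, float]]]  # assumed (m/z, intensity) pairs
    spectra_padding_mask: list[list[bool]]  # assumed: True marks padding


def pad_spectra(spectra: list[list[tuple[float, float]]]) -> ToyBatch:
    """Pad every spectrum to the longest one and record which slots are padding."""
    longest = max(len(peaks) for peaks in spectra)
    padded, mask = [], []
    for peaks in spectra:
        n_pad = longest - len(peaks)
        padded.append(peaks + [(0.0, 0.0)] * n_pad)
        mask.append([False] * len(peaks) + [True] * n_pad)
    return ToyBatch(spectra=padded, spectra_padding_mask=mask)


batch = pad_spectra([[(100.0, 0.5), (200.0, 1.0)], [(150.0, 0.8)]])

# NamedTuple fields can be read by name or unpacked positionally.
spectra, spectra_padding_mask = batch
```

Because the batch is a `NamedTuple`, downstream code can use either `batch.spectra_padding_mask` or positional unpacking, whichever reads better at the call site.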
PolarsSpectrumDataset(data_frame)
SpectrumBatch
Bases: NamedTuple
Represents a batch of spectrum data without annotations.
Attributes:
Name | Type | Description |
---|---|---|
`spectra` | `FloatTensor` | The tensor containing the spectra data. |
`spectra_padding_mask` | `BoolTensor` | A boolean tensor indicating the padding positions in the spectra tensor. |
`precursors` | `FloatTensor` | The tensor containing precursor mass information. |
collate_batches(residues, max_length, time_steps, annotated)
Get the batch collation function for a given residue set, maximum length, and number of time steps.
The returned function combines the spectra and precursor information for a batch into `torch` tensors. It also maps the residues in each peptide to their indices in `residues`, pads or truncates every peptide to `max_length`, and returns the result as a `torch` tensor.
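The residue mapping and the pad-or-truncate step described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `encode_peptide` is a hypothetical helper, the `index_of` dictionary stands in for the index map held by `ResidueSet`, and using `0` as the padding index is an assumption. Plain lists are used in place of `torch` tensors.

```python
def encode_peptide(
    peptide: list[str],
    index_of: dict[str, int],
    max_length: int,
    pad_index: int = 0,  # assumed padding index, not taken from the library
) -> list[int]:
    """Map residues to indices, then truncate or pad to exactly max_length."""
    indices = [index_of[residue] for residue in peptide][:max_length]
    return indices + [pad_index] * (max_length - len(indices))


# Hypothetical three-residue vocabulary; a real ResidueSet would supply this map.
index_of = {"G": 1, "A": 2, "S": 3}

encoded_short = encode_peptide(["G", "A"], index_of, max_length=4)
encoded_long = encode_peptide(["G", "A", "S", "A", "G"], index_of, max_length=4)
```

Every encoded peptide comes out with the same length, which is what allows a batch of them to be stacked into a single tensor.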
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`residues` | `ResidueSet` | The residues in the vocabulary together with their masses and index map. | required |
`max_length` | `int` | The maximum peptide sequence length. All sequences are padded or truncated to this length. | required |
`time_steps` | `int` | The number of diffusion time steps. | required |
Returns:
Type | Description |
---|---|
`Callable[[list[tuple[FloatTensor, float, int, str]]], SpectrumBatch \| AnnotatedSpectrumBatch]` | The function that combines examples into a batch given the parameters above. |
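The return type describes a factory pattern: the outer call fixes the configuration and hands back a function that collates one list of examples at a time. The sketch below shows that pattern in isolation. All names in it (`ToyPlain`, `ToyAnnotated`, `make_collate`) are hypothetical, the example tuples are simplified to `(precursor mass, peptide)`, and the idea that an `annotated` flag selects between the two batch types is an assumption drawn from the signature and the return type, not confirmed by the documentation.

```python
from typing import Callable, NamedTuple, Union


class ToyPlain(NamedTuple):
    """Hypothetical unannotated batch."""

    precursors: list[float]


class ToyAnnotated(NamedTuple):
    """Hypothetical annotated batch that also carries peptides."""

    precursors: list[float]
    peptides: list[str]


# Simplified example layout (precursor mass, peptide); an assumption for this sketch.
Example = tuple[float, str]


def make_collate(
    annotated: bool,
) -> Callable[[list[Example]], Union[ToyPlain, ToyAnnotated]]:
    """Return a collate function with the configuration captured in a closure."""

    def collate(examples: list[Example]) -> Union[ToyPlain, ToyAnnotated]:
        precursors = [mass for mass, _ in examples]
        if annotated:
            return ToyAnnotated(precursors, [peptide for _, peptide in examples])
        return ToyPlain(precursors)

    return collate


examples = [(500.25, "GA"), (612.5, "SAG")]
plain_batch = make_collate(annotated=False)(examples)
annotated_batch = make_collate(annotated=True)(examples)
```

A function built this way takes a single list of examples, which is the shape that the `collate_fn` argument of `torch.utils.data.DataLoader` expects.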