common
__all__ = ['DataProcessor', 'AccelerateDeNovoTrainer', 'AccelerateDeNovoPredictor', 'FinetuneScheduler', 'WarmupScheduler', 'CosineWarmupScheduler', 'NeptuneSummaryWriter', 'TrainingState', 'Timer']
module-attribute
DataProcessor(metadata_columns: list[str] | set[str] | None = None)
Data processor abstract class.
This class is used to process the data before it is used in the model.
It is designed to be used with the Dataset class from the HuggingFace datasets library.
It includes two main methods:
- process_row: Processes a row of data.
- collate_fn: Collates a batch of data; to be passed to the DataLoader class.
Additionally, it includes a way to pass metadata columns that will be kept after processing a dataset.
These metadata columns will also bypass the collate_fn.
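A minimal sketch of a concrete subclass, assuming `DataProcessor` is imported from this module and using hypothetical column names (`mz`, `intensity`); the real processors in this package will differ:

```python
from typing import Any

import torch


class SpectrumProcessor(DataProcessor):  # hypothetical subclass for illustration
    def process_row(self, row: dict[str, Any]) -> dict[str, Any]:
        # Convert raw columns into model-ready tensors (column names assumed).
        return {
            "mz": torch.tensor(row["mz"], dtype=torch.float32),
            "intensity": torch.tensor(row["intensity"], dtype=torch.float32),
        }

    def collate_fn(self, batch: list[dict[str, Any]]) -> dict[str, Any]:
        # Pad variable-length spectra into batch tensors.
        return {
            key: torch.nn.utils.rnn.pad_sequence(
                [item[key] for item in batch], batch_first=True
            )
            for key in ("mz", "intensity")
        }
```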
Initialize the data processor.
| PARAMETER | DESCRIPTION |
|---|---|
| `metadata_columns` | The metadata columns to add to the expected columns. TYPE: `list[str] \| set[str] \| None` |
metadata_columns: set[str]
property
Get the metadata columns.
These columns are kept after processing a dataset.
| RETURNS | DESCRIPTION |
|---|---|
| `set[str]` | The metadata columns. |
process_row(row: dict[str, Any]) -> dict[str, Any]
abstractmethod
Process a single row of data.
| PARAMETER | DESCRIPTION |
|---|---|
| `row` | The row of data to process in dict format. TYPE: `dict[str, Any]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any]` | The processed row with resulting columns. |
process_dataset(dataset: Dataset, return_format: str | None = 'torch') -> Dataset
Process a dataset by mapping the process_row method.
The resulting dataset has the columns expected by the collate_fn method.
| PARAMETER | DESCRIPTION |
|---|---|
| `dataset` | The dataset to process. TYPE: `Dataset` |
| `return_format` | The format to return the dataset in. Default is `"torch"`. TYPE: `str \| None` |

| RETURNS | DESCRIPTION |
|---|---|
| `Dataset` | The processed dataset. |
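A usage sketch tying `process_dataset` and `collate_fn` together, assuming the hypothetical `SpectrumProcessor` above and a HuggingFace `Dataset` named `dataset`:

```python
from torch.utils.data import DataLoader

processor = SpectrumProcessor(metadata_columns=["spectrum_id"])  # metadata survives processing
processed = processor.process_dataset(dataset, return_format="torch")

# Metadata columns bypass collate_fn and are re-attached to each batch.
loader = DataLoader(processed, batch_size=32, collate_fn=processor.collate_fn)
```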
collate_fn(batch: list[dict[str, Any]]) -> dict[str, Any]
Collate a batch.
Metadata columns are added after collation.
| PARAMETER | DESCRIPTION |
|---|---|
| `batch` | The batch to collate. TYPE: `list[dict[str, Any]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any]` | The collated batch with metadata. |
get_expected_columns() -> list[str]
Get the expected columns to be kept in the dataset after processing.
These columns are expected by the collate_fn method and include
both data and metadata columns.
| RETURNS | DESCRIPTION |
|---|---|
| `list[str]` | The expected columns. |
add_metadata_columns(columns: list[str] | set[str]) -> None
Add expected metadata columns.
| PARAMETER | DESCRIPTION |
|---|---|
| `columns` | The columns to add. TYPE: `list[str] \| set[str]` |
remove_modifications(peptide: str, replace_isoleucine_with_leucine: bool = True) -> str
staticmethod
Remove modifications and optionally replace Isoleucine with Leucine.
| PARAMETER | DESCRIPTION |
|---|---|
| `peptide` | The peptide to remove modifications from. TYPE: `str` |
| `replace_isoleucine_with_leucine` | Whether to replace isoleucine with leucine. TYPE: `bool` |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | The peptide with modifications removed. |
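For illustration only; the modification notation this method actually handles is not specified here, so the input below assumes ProForma-style bracketed modifications:

```python
# Hypothetical input; the expected effect is stripping the bracketed
# modification and mapping I -> L when replace_isoleucine_with_leucine is True.
plain = DataProcessor.remove_modifications(
    "EM[UNIMOD:35]AIR", replace_isoleucine_with_leucine=True
)
# e.g. plain == "EMALR"
```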
AccelerateDeNovoPredictor(config: DictConfig)
Predictor class that uses the Accelerate library.
s3: S3FileHandler
property
Get the S3 file handler.
| RETURNS | DESCRIPTION |
|---|---|
| `S3FileHandler` | The S3 file handler. |
config = config
instance-attribute
targets: list | None = None
instance-attribute
output_path = self.config.get('output_path', None)
instance-attribute
pred_df: pd.DataFrame | None = None
instance-attribute
results_dict: dict | None = None
instance-attribute
prediction_tokenised_col = self.config.get('prediction_tokenised_col', 'predictions_tokenised')
instance-attribute
prediction_col = self.config.get('prediction_col', 'predictions')
instance-attribute
log_probs_col = self.config.get('log_probs_col', 'log_probs')
instance-attribute
token_log_probs_col = self.config.get('token_log_probs_col', 'token_log_probs')
instance-attribute
save_encoder_outputs = config.get('save_encoder_outputs', False)
instance-attribute
encoder_output_path = config.get('encoder_output_path', None)
instance-attribute
encoder_output_reduction = config.get('encoder_output_reduction', 'mean')
instance-attribute
accelerator = self.setup_accelerator()
instance-attribute
denovo = self.config.get('denovo', False)
instance-attribute
model = self.model.eval()
instance-attribute
residue_set = self.model.residue_set
instance-attribute
test_dataset = self.load_dataset()
instance-attribute
test_dataloader = self.build_dataloader(self.test_dataset)
instance-attribute
decoder = self.setup_decoder()
instance-attribute
metrics = self.setup_metrics()
instance-attribute
running_loss = None
instance-attribute
steps_per_inference = len(self.test_dataloader)
instance-attribute
load_model() -> Tuple[nn.Module, DictConfig]
abstractmethod
Load the model.
setup_decoder() -> Decoder
abstractmethod
Set up the decoder.
setup_data_processor() -> DataProcessor
abstractmethod
Set up the data processor.
get_predictions(batch: Any) -> dict[str, Any]
abstractmethod
Get the predictions for a batch.
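A concrete predictor implements only the four abstract hooks above; dataloading, decoding, metrics, and saving are inherited. A usage sketch, where `MyPredictor` is a hypothetical concrete subclass and the config keys are drawn from the attributes listed above:

```python
from omegaconf import OmegaConf

# MyPredictor is a hypothetical subclass implementing load_model,
# setup_decoder, setup_data_processor and get_predictions.
config = OmegaConf.create(
    {
        "denovo": True,              # assumed to mean no ground-truth targets
        "output_path": "preds.csv",  # hypothetical local path
    }
)
predictor = MyPredictor(config)
pred_df = predictor.predict()  # pandas DataFrame of predictions
```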
postprocess_dataset(dataset: Dataset) -> Dataset
Postprocess the dataset.
load_dataset() -> Dataset
Load the test dataset.
| RETURNS | DESCRIPTION |
|---|---|
| `Dataset` | The test dataset. |
print_sample_batch() -> None
Print a sample batch of the test data.
setup_metrics() -> Metrics
Set up the metrics.
setup_accelerator() -> Accelerator
Set up the accelerator.
build_dataloader(test_dataset: Dataset) -> torch.utils.data.DataLoader
Set up the test dataloader.
predict() -> pd.DataFrame
Predict the test dataset.
predictions_to_df(predictions: dict[str, list]) -> pd.DataFrame
Convert the predictions to a pandas DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
| `predictions` | The predictions dictionary. TYPE: `dict[str, list]` |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | The predictions dataframe. |
postprocess_predictions(pred_df: pd.DataFrame) -> pd.DataFrame
Postprocess the predictions.
Optionally, this can be used to modify the predictions, e.g. for ensembling. By default, this does nothing.
| PARAMETER | DESCRIPTION |
|---|---|
| `pred_df` | The predictions dataframe. TYPE: `pd.DataFrame` |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | The postprocessed predictions dataframe. |
calculate_metrics(pred_df: pd.DataFrame) -> dict[str, Any] | None
Calculate the metrics.
| PARAMETER | DESCRIPTION |
|---|---|
| `pred_df` | The predictions dataframe. TYPE: `pd.DataFrame` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | The results dictionary containing the metrics. |
save_predictions(pred_df: pd.DataFrame, results_dict: dict[str, list] | None = None) -> None
Save the predictions to a file.
| PARAMETER | DESCRIPTION |
|---|---|
| `pred_df` | The predictions dataframe. TYPE: `pd.DataFrame` |
| `results_dict` | The results dictionary containing the metrics. TYPE: `dict[str, list] \| None` |
save_encoder_outputs_to_parquet(spectrum_ids: list[str], encoder_outputs: list[np.ndarray]) -> None
Save the encoder outputs to a file.
| PARAMETER | DESCRIPTION |
|---|---|
| `spectrum_ids` | The spectrum IDs. TYPE: `list[str]` |
| `encoder_outputs` | The encoder outputs. TYPE: `list[np.ndarray]` |
CosineWarmupScheduler(optimizer: torch.optim.Optimizer, warmup: int, max_iters: int)
Bases: _LRScheduler
Learning rate scheduler with linear warm up followed by cosine shaped decay.
| PARAMETER | DESCRIPTION |
|---|---|
| `optimizer` | Optimizer object. TYPE: `torch.optim.Optimizer` |
| `warmup` | The number of warm-up iterations. TYPE: `int` |
| `max_iters` | The total number of iterations. TYPE: `int` |
get_lr() -> list[float]
Get the learning rate at the current step.
get_lr_factor(epoch: int) -> float
Get the LR factor at the current step.
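The conventional shape of this schedule, as a sketch (the class's exact implementation may differ in edge cases such as `warmup == 0`):

```python
import math


def lr_factor(step: int, warmup: int, max_iters: int) -> float:
    # Cosine decay from 1 to 0 over max_iters, scaled by a linear ramp
    # during the first `warmup` steps.
    factor = 0.5 * (1 + math.cos(math.pi * step / max_iters))
    if step < warmup:
        factor *= step / warmup
    return factor
```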
FinetuneScheduler(model_state_dict: dict, config: DictConfig, steps_per_epoch: int | None = None)
Scheduler for unfreezing parameters of a model.
| PARAMETER | DESCRIPTION |
|---|---|
| `model_state_dict` | The state dictionary of the model. TYPE: `dict` |
| `config` | The configuration for the scheduler. TYPE: `DictConfig` |
| `steps_per_epoch` | The number of steps per epoch. TYPE: `int \| None` |
model_state_dict = model_state_dict
instance-attribute
config = config
instance-attribute
steps_per_epoch = steps_per_epoch
instance-attribute
is_verbose = self.config.get('verbose', False)
instance-attribute
schedule = self._get_schedule()
instance-attribute
next_phase: dict[str, Any] | None = self.schedule.pop(0)
instance-attribute
step(global_step: int) -> None
Step the unfreezing scheduler.
| PARAMETER | DESCRIPTION |
|---|---|
| `global_step` | The global step of the model. TYPE: `int` |
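A usage sketch; only the `verbose` key is documented above, so the shape of the `finetune` config (phase boundaries, which parameter groups to unfreeze) is an assumption of this example:

```python
import torch.nn as nn
from omegaconf import OmegaConf

model = nn.Linear(8, 8)  # stand-in model
finetune_cfg = OmegaConf.create({"verbose": True})  # real configs also define the unfreezing phases

scheduler = FinetuneScheduler(model.state_dict(), finetune_cfg, steps_per_epoch=1000)

# Called once per training step; when global_step reaches the next phase
# boundary, that phase's parameters are unfrozen and the following phase is queued.
for global_step in range(10_000):
    scheduler.step(global_step)
```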
WarmupScheduler(optimizer: torch.optim.Optimizer, warmup: int)
Bases: _LRScheduler
Linear warmup scheduler.
warmup = warmup
instance-attribute
get_lr() -> list[float]
Get the learning rate at the current step.
get_lr_factor(epoch: int) -> float
Get the LR factor at the current step.
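Its LR factor is the standard linear ramp, sketched below (implementation details may differ):

```python
def lr_factor(epoch: int, warmup: int) -> float:
    # Ramp linearly from 0 to 1 over `warmup` steps, then hold at 1.
    return min(1.0, epoch / warmup)
```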
AccelerateDeNovoTrainer(config: DictConfig)
Trainer class that uses the Accelerate library.
run_id: str
property
Get the run ID.
| RETURNS | DESCRIPTION |
|---|---|
| `str` | The run ID. |
s3: S3FileHandler
property
Get the S3 file handler.
| RETURNS | DESCRIPTION |
|---|---|
| `S3FileHandler` | The S3 file handler. |
global_step: int
property
Get the current global training step.
This represents the total number of training steps across all epochs.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | The current global step number. |
epoch: int
property
Get the current training epoch.
This represents the current epoch number in the training process.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | The current epoch number. |
training_state: TrainingState
property
Get the training state.
config = config
instance-attribute
enable_verbose_logging = self.config.get('enable_verbose_logging', True)
instance-attribute
accelerator = self.setup_accelerator()
instance-attribute
residue_set = ResidueSet(residue_masses=(self.config.residues.get('residues')), residue_remapping=(self.config.dataset.get('residue_remapping', None)))
instance-attribute
model = self.setup_model()
instance-attribute
optimizer = self.setup_optimizer()
instance-attribute
lr_scheduler = self.setup_scheduler()
instance-attribute
decoder = self.setup_decoder()
instance-attribute
metrics = self.setup_metrics()
instance-attribute
running_loss = None
instance-attribute
total_steps = self.config.get('training_steps', 2500000)
instance-attribute
finetune_scheduler: FinetuneScheduler | None = FinetuneScheduler(self.model.state_dict(), self.config.get('finetune'))
instance-attribute
steps_per_validation = self.config.get('validation_interval', 100000)
instance-attribute
steps_per_checkpoint = self.config.get('checkpoint_interval', 100000)
instance-attribute
last_validation_metric = None
instance-attribute
best_checkpoint_metric = None
instance-attribute
setup_model() -> nn.Module
abstractmethod
Set up the model.
setup_optimizer() -> torch.optim.Optimizer
abstractmethod
Set up the optimizer.
setup_decoder() -> Decoder
abstractmethod
Set up the decoder.
setup_data_processors() -> tuple[DataProcessor, DataProcessor]
abstractmethod
Set up the training and validation data processors.
save_model(is_best_checkpoint: bool = False) -> None
abstractmethod
Save the model.
forward(batch: Any) -> tuple[torch.Tensor, dict[str, torch.Tensor]]
abstractmethod
Forward pass for the model to calculate loss.
get_predictions(batch: Any) -> tuple[list[str] | list[list[str]], list[str] | list[list[str]]]
abstractmethod
Get the predictions for a batch.
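A skeleton of a concrete trainer, for illustration only; it assumes `AccelerateDeNovoTrainer`, `DataProcessor`, and `Decoder` are imported from this module, and all method bodies are placeholders:

```python
from typing import Any

import torch
import torch.nn as nn


class MyTrainer(AccelerateDeNovoTrainer):  # hypothetical subclass
    def setup_model(self) -> nn.Module:
        ...  # build the model from self.config and self.residue_set

    def setup_optimizer(self) -> torch.optim.Optimizer:
        ...  # e.g. torch.optim.Adam(self.model.parameters(), lr=self.config.get("lr", 1e-4))

    def setup_decoder(self) -> Decoder:
        ...

    def setup_data_processors(self) -> tuple[DataProcessor, DataProcessor]:
        ...  # one processor for training data, one for validation data

    def save_model(self, is_best_checkpoint: bool = False) -> None:
        ...  # write a checkpoint, optionally tagged as the best so far

    def forward(self, batch: Any) -> tuple[torch.Tensor, dict[str, torch.Tensor]]:
        ...  # return (loss, auxiliary tensors to log)

    def get_predictions(self, batch: Any) -> tuple[list[str], list[str]]:
        ...  # return (predicted sequences, target sequences)


# MyTrainer(config).train()  # config: a DictConfig with the keys listed above
```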
convert_interval_to_steps(interval: float | int, steps_per_epoch: int) -> int
staticmethod
Convert an interval to steps.
| PARAMETER | DESCRIPTION |
|---|---|
| `interval` | The interval to convert. TYPE: `float \| int` |
| `steps_per_epoch` | The number of steps per epoch. TYPE: `int` |

| RETURNS | DESCRIPTION |
|---|---|
| `int` | The number of steps. |
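A worked example under one plausible reading (an assumption; the docstring does not state the interval's unit), namely that an interval expressed in epochs is multiplied by `steps_per_epoch`:

```python
# 0.5 epochs at 2,000 steps per epoch -> 1,000 steps (assumed semantics).
steps = AccelerateDeNovoTrainer.convert_interval_to_steps(0.5, steps_per_epoch=2000)
```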
log_if_verbose(message: str, level: str = 'info') -> None
Log a message if verbose logging is enabled.
setup_metrics() -> Metrics
Set up the metrics.
setup_accelerator() -> Accelerator
Set up the accelerator.
build_dataloaders(train_dataset: Dataset, valid_dataset: Dataset) -> tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader]
Set up the training and validation dataloaders.
setup_scheduler() -> torch.optim.lr_scheduler.LRScheduler
Set up the learning rate scheduler.
| RETURNS | DESCRIPTION |
|---|---|
| `torch.optim.lr_scheduler.LRScheduler` | The learning rate scheduler. |
setup_neptune() -> None
Set up Neptune logging.
setup_tensorboard() -> None
Set up TensorBoard logging.
load_datasets() -> tuple[Dataset, Dataset, int, int]
Load the training and validation datasets.
| RETURNS | DESCRIPTION |
|---|---|
| `tuple[Dataset, Dataset, int, int]` | The training and validation datasets. |
print_sample_batch() -> None
Print a sample batch of the training data.
save_accelerator_state(is_best_checkpoint: bool = False) -> None
Save the accelerator state.
check_if_best_checkpoint() -> bool
Check if the last validation metric is the best metric.
load_accelerator_state() -> None
Load the accelerator state.
load_model_state() -> None
Load the model state.
update_model_state(model_state: dict[str, torch.Tensor], model_config: DictConfig) -> dict[str, torch.Tensor]
Update the model state.
update_vocab(model_state: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]
Update the vocabulary of the model.
train() -> None
Train the model.
prepare_batch(batch: Iterable[Any]) -> Any
Prepare a batch for training.
Manually move tensors to accelerator.device since we do not prepare our dataloaders with the accelerator.
| PARAMETER | DESCRIPTION |
|---|---|
| `batch` | The batch to prepare. TYPE: `Iterable[Any]` |

| RETURNS | DESCRIPTION |
|---|---|
| `Any` | The prepared batch. |
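A sketch of the conventional implementation, assuming the batch is a dict of tensors (non-tensor metadata values pass through untouched); the actual method may differ:

```python
from typing import Any

import torch


def prepare_batch(self, batch: dict[str, Any]) -> dict[str, Any]:
    # Move tensors to the accelerator's device; leave metadata as-is.
    return {
        key: value.to(self.accelerator.device) if isinstance(value, torch.Tensor) else value
        for key, value in batch.items()
    }
```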
train_epoch() -> None
Train the model for one epoch.
validate_epoch(num_sanity_steps: int | None = None, calculate_metrics: bool = True) -> None
Validate for one epoch.
NeptuneSummaryWriter(log_dir: str, run: neptune.Run)
Bases: SummaryWriter
Combine a TensorBoard SummaryWriter with a Neptune Run.
run = run
instance-attribute
add_scalar(tag: str, scalar_value: float, global_step: int | float | None = None) -> None
Record a scalar to TensorBoard and Neptune.
add_text(tag: str, text_string: str, global_step: Optional[int] = None, walltime: Optional[float] = None) -> None
Record text to TensorBoard and Neptune.
add_hparams(hparam_dict: dict, metric_dict: dict, hparam_domain_discrete: Optional[Dict[str, List[Any]]] = None, run_name: Optional[str] = None, global_step: Optional[int] = None) -> None
Add a set of hyperparameters to be compared in Neptune, as in TensorBoard.
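A usage sketch; the Neptune project name and log directory are hypothetical:

```python
import neptune

run = neptune.init_run(project="my-workspace/my-project")
writer = NeptuneSummaryWriter(log_dir="runs/exp1", run=run)

# Each call is mirrored to both TensorBoard and Neptune.
writer.add_scalar("train/loss", 0.42, global_step=100)
writer.add_text("notes", "warmup finished", global_step=100)
```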
Timer(total_steps: int | None = None)
Timer for training and validation.
start_time = time.time()
instance-attribute
total_steps = total_steps
instance-attribute
current_step = 0
instance-attribute
start() -> None
Restart the timer.
step() -> None
Step the timer.
get_delta() -> float
Get the time delta since the timer was started.
get_eta(current_step: int | None = None) -> float
Get the estimated time to completion.
get_total_time() -> float
Get the total time expected to complete all steps.
get_rate(current_step: int | None = None) -> float
Get the rate of steps per second.
get_step_time(current_step: int | None = None) -> float
Get the time per step.
get_time_str() -> str
Get the time delta since the timer was started, formatted as a string.
get_eta_str(current_step: int | None = None) -> str
Get the estimated time to completion, formatted as a string.
get_total_time_str() -> str
Get the total time expected to complete all steps, formatted as a string.
get_rate_str(current_step: int | None = None) -> str
Get the rate of steps per second, formatted as a string.
get_step_time_rate_str(current_step: int | None = None) -> str
Get the time per step and step rate, formatted as a string.
get_step_time_str(current_step: int | None = None) -> str
Get the time per step, formatted as a string.
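A usage sketch of the timer in a training loop:

```python
import time

timer = Timer(total_steps=1_000)
timer.start()  # restart the clock
for _ in range(1_000):
    time.sleep(0.001)  # stand-in for one training step
    timer.step()

print(timer.get_rate_str())  # steps per second so far
print(timer.get_eta_str())   # estimated time to completion
```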
TrainingState()
Training state for tracking training progress.
This class is used by Accelerate to save and load training state during checkpointing and resuming training runs. It tracks the current epoch and global step of training.
Initialize training state with zeroed counters.
global_step: int
property
Get the current global step.
epoch: int
property
Get the current epoch.
state_dict() -> dict[str, Any]
Get the state dictionary for saving.
| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any]` | Dictionary containing the current training state. |
load_state_dict(state_dict: dict[str, Any]) -> None
Load state from a dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
| `state_dict` | Dictionary containing the training state to load. TYPE: `dict[str, Any]` |
step() -> None
Increment the global step.
step_epoch() -> None
Increment the epoch.
unstep_epoch() -> None
Decrement the epoch.
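Because it exposes `state_dict`/`load_state_dict`, the state can be registered with Accelerate's checkpointing. A sketch, with a hypothetical checkpoint directory:

```python
from accelerate import Accelerator

accelerator = Accelerator()
state = TrainingState()
accelerator.register_for_checkpointing(state)  # saved/restored via save_state/load_state

state.step()        # after each optimizer step
state.step_epoch()  # at the end of each epoch

accelerator.save_state("checkpoints/latest")  # hypothetical path
```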