Reference: Prediction Output
When you run predictions with InstaNovo and specify an output path, a CSV file is generated. This document describes the columns in that file.
Standard Columns
| Column | Description | Data Type | Notes |
|---|---|---|---|
| experiment_name | Experiment name derived from input filename | String | Based on the input file name (mgf, mzml, or mzxml) |
| scan_number | Scan number of the MS/MS spectrum | Integer | Unique identifier from the input file |
| spectrum_id | Unique spectrum identifier | String | Combination of experiment name and scan number (e.g., yeast:17738) |
| precursor_mz | Precursor m/z (mass-to-charge ratio) | Float | The observed m/z of the precursor ion |
| precursor_charge | Precursor charge state | Integer | Charge state of the precursor ion |
| prediction_id | Unique prediction identifier | String | Internal identifier for the prediction |
| group | Data group identifier | String | Used when running predictions on grouped data |
| targets | Target peptide sequence | String | Ground truth peptide sequence (only present if running in evaluation mode) |
| predictions | Best predicted peptide sequence | String | The final predicted peptide sequence (from InstaNovo+ when using refinement) |
| predictions_tokenised | Best predicted peptide sequence (tokenised) | String | The predicted sequence as comma-separated tokens |
| log_probs | Log probability of the best predicted sequence | Float | Natural logarithm of the sequence confidence. Higher is better. |
| token_log_probs | Log probability of each token in the best prediction | List[Float] | Natural logarithm of the confidence for each amino acid in the sequence |
| delta_mass_ppm | Mass difference between precursor and predicted peptide in ppm | Float | The mass deviation in parts per million. Values closer to zero are better. |
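
As a minimal sketch of how the standard columns might be read, the snippet below parses a small, made-up excerpt with Python's standard library. The sample rows are hypothetical, and the snippet assumes that list-valued columns such as token_log_probs are written as Python-style list literals (for example "[-0.01, -0.02]"); check your own file before relying on that. Because log_probs is a natural logarithm, `math.exp` maps it back to a probability between 0 and 1.

```python
import ast
import csv
import io
import math

# Hypothetical excerpt of a predictions file; a real file contains more columns.
SAMPLE_CSV = """spectrum_id,predictions,log_probs,token_log_probs,delta_mass_ppm
yeast:17738,PEAK,-0.105,"[-0.01, -0.02, -0.03, -0.045]",1.8
yeast:17739,ACDK,-2.303,"[-1.0, -0.5, -0.5, -0.303]",-12.4
"""


def load_predictions(handle):
    """Read prediction rows and convert the numeric columns to Python types."""
    rows = []
    for row in csv.DictReader(handle):
        row["log_probs"] = float(row["log_probs"])
        row["delta_mass_ppm"] = float(row["delta_mass_ppm"])
        # Assumption: the list is serialised as a Python-style literal.
        row["token_log_probs"] = ast.literal_eval(row["token_log_probs"])
        # exp() turns a natural-log probability back into a probability.
        row["confidence"] = math.exp(row["log_probs"])
        rows.append(row)
    return rows


rows = load_predictions(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["spectrum_id"], row["predictions"], round(row["confidence"], 3))
```

To read a real output file, pass an open file object (for example `open(path, newline="")`) instead of the `io.StringIO` wrapper.
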
InstaNovo (Transformer) Model Columns
These columns are present when using InstaNovo+ (combined transformer + diffusion model).
| Column | Description | Data Type | Notes |
|---|---|---|---|
| instanovo_predictions | Predicted peptide sequence from InstaNovo | String | The initial peptide sequence from the transformer |
| instanovo_log_probabilities | Log probability from InstaNovo | Float | Natural logarithm of the sequence confidence |
| instanovo_token_log_probabilities | Token log probabilities from InstaNovo | List[Float] | Natural logarithm of the confidence for each token |
| instanovo_predictions_beam_0-4 | Predicted sequences from each beam | String | Beam search results when num_beams > 1 |
| instanovo_log_probabilities_beam_0-4 | Log probabilities from each beam | Float | Confidence scores for each beam |
| instanovo_token_log_probabilities_beam_0-4 | Token log probabilities from each beam | List[Float] | Per-token confidence for each beam |
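
The beam columns can be gathered into a ranked candidate list per spectrum. The sketch below assumes that "beam_0-4" expands to one column per beam with suffixes `_beam_0` through `_beam_4`, and that a row is available as a dictionary of strings (as `csv.DictReader` would produce). The helper `beam_candidates` and the sample row are hypothetical.

```python
def beam_candidates(row, prefix="instanovo", max_beams=5):
    """Return (sequence, log_probability) pairs for every beam present in a row."""
    candidates = []
    for i in range(max_beams):
        # Assumed column naming: <prefix>_predictions_beam_<i>, etc.
        seq = row.get(f"{prefix}_predictions_beam_{i}")
        log_prob = row.get(f"{prefix}_log_probabilities_beam_{i}")
        if seq in (None, "") or log_prob in (None, ""):
            continue  # beam column is absent or empty for this row
        candidates.append((seq, float(log_prob)))
    # Sort so the highest log probability comes first.
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Hypothetical row with two beams.
row = {
    "instanovo_predictions_beam_0": "PEAK",
    "instanovo_log_probabilities_beam_0": "-0.105",
    "instanovo_predictions_beam_1": "PEAQ",
    "instanovo_log_probabilities_beam_1": "-1.9",
}
candidates = beam_candidates(row)
```

Rows from a run without beam search simply yield an empty list, since none of the beam columns exist.
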
InstaNovo+ (Diffusion) Model Columns
These columns are present when using InstaNovo+ (combined transformer + diffusion model).
| Column | Description | Data Type | Notes |
|---|---|---|---|
| diffusion_predictions | Predicted peptide sequence from InstaNovo+ | String | The refined peptide sequence from the diffusion model |
| diffusion_log_probabilities | Log probability from InstaNovo+ | Float | Natural logarithm of the sequence confidence |
| diffusion_token_log_probabilities | Token log probabilities from InstaNovo+ | List[Float] | Natural logarithm of the confidence for each token |
| diffusion_unrefined_predictions | Unrefined predictions from InstaNovo+ | String | Predictions before refinement |
| diffusion_predictions_beam_0-4 | Predicted sequences from each beam | String | Beam search results when num_beams > 1 |
| diffusion_log_probabilities_beam_0-4 | Log probabilities from each beam | Float | Confidence scores for each beam |
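
One way to see what refinement did is to compare the transformer and diffusion columns row by row. The sketch below is illustrative only: the helper `refinement_changes` and the sample rows are hypothetical, and rows are assumed to be dictionaries keyed by column name.

```python
def refinement_changes(rows):
    """Return the rows whose diffusion prediction differs from the transformer's."""
    return [
        row
        for row in rows
        if row["instanovo_predictions"] != row["diffusion_predictions"]
    ]


# Hypothetical rows; real files contain many more columns.
sample_rows = [
    {
        "spectrum_id": "yeast:17738",
        "instanovo_predictions": "PEAK",
        "diffusion_predictions": "PEAK",
    },
    {
        "spectrum_id": "yeast:17739",
        "instanovo_predictions": "ACDK",
        "diffusion_predictions": "ACEK",
    },
]
changed = refinement_changes(sample_rows)
print([row["spectrum_id"] for row in changed])
```
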
Usage Notes
- When using InstaNovo+ with refinement, the `predictions` column contains the best prediction from the diffusion model.
- We recommend filtering the output based on the `diffusion_log_probabilities` and `delta_mass_ppm` columns to obtain a set of high-confidence predictions.
- Beam search columns (beam_0 through beam_4) are only present when running with `num_beams > 1`.
- The transformer model columns are prefixed with `instanovo_` and diffusion model columns are prefixed with `diffusion_`.
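
The recommended filtering can be sketched as follows. The thresholds are placeholders chosen for illustration, not values recommended by InstaNovo, and should be tuned for your data. The sketch also assumes that delta_mass_ppm is a signed deviation, so it filters on the absolute value. The helper `filter_high_confidence` and the sample rows are hypothetical.

```python
import math


def filter_high_confidence(rows, min_log_prob=math.log(0.9), max_abs_ppm=10.0):
    """Keep rows with a high diffusion log probability and a small mass error.

    Both thresholds are illustrative placeholders.
    """
    kept = []
    for row in rows:
        log_prob = float(row["diffusion_log_probabilities"])
        ppm = float(row["delta_mass_ppm"])
        # Assumption: ppm is signed, so compare its magnitude to the tolerance.
        if log_prob >= min_log_prob and abs(ppm) <= max_abs_ppm:
            kept.append(row)
    return kept


# Hypothetical rows as they might come from csv.DictReader (values are strings).
sample = [
    {"spectrum_id": "yeast:1", "diffusion_log_probabilities": "-0.05", "delta_mass_ppm": "2.1"},
    {"spectrum_id": "yeast:2", "diffusion_log_probabilities": "-0.05", "delta_mass_ppm": "-35.0"},
    {"spectrum_id": "yeast:3", "diffusion_log_probabilities": "-1.20", "delta_mass_ppm": "0.5"},
]
confident = filter_high_confidence(sample)
print([row["spectrum_id"] for row in confident])
```

Expressing the log-probability cutoff as `math.log(p)` keeps the threshold readable as a probability while comparing directly against the column's natural-log values.
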