Graph Dataset Builder
- class mlip.data.graph_dataset_builder.GraphDatasetBuilder(reader: ChemicalSystemsReader | CombinedReader, dataset_config: GraphDatasetBuilderConfig)
Main class handling the construction and preprocessing of the graph dataset.
The key idea is that a user provides a ChemicalSystemsReader subclass that loads a dataset from disk into ChemicalSystem dataclasses, and then GraphDatasetBuilder converts these further to jraph graphs and the dataset info dataclass.
- __init__(reader: ChemicalSystemsReader | CombinedReader, dataset_config: GraphDatasetBuilderConfig)
Constructor.
- Parameters:
reader – The data reader that loads a dataset into ChemicalSystem dataclasses.
dataset_config – The pydantic config.
- prepare_datasets() → None
Prepares the datasets.
This includes loading the data into ChemicalSystem objects via the chemical systems reader, and then producing the graph datasets and the dataset info object.
- get_splits(prefetch: bool = False, devices: list[Device] | None = None) → tuple[GraphDataset, GraphDataset, GraphDataset] | tuple[PrefetchIterator, PrefetchIterator, PrefetchIterator]
Returns the training, validation, and test dataset splits.
- Parameters:
prefetch – Whether to run the data prefetching and return PrefetchIterators.
devices – Devices for parallel prefetching. Must be given if prefetch=True.
- Returns:
A tuple of training, validation, and test datasets. If prefetch=False, these are of type GraphDataset, otherwise of type PrefetchIterator.
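The relationship between prefetch and devices described above can be illustrated with a small stand-in function. This is only a sketch: toy_get_splits and its string placeholders are hypothetical and not part of mlip. Raising ValueError when devices is missing is also an assumption, since the documentation only states that the argument must be given.

```python
from typing import Optional, Sequence


def toy_get_splits(
    prefetch: bool = False,
    devices: Optional[Sequence[str]] = None,
) -> tuple[str, str, str]:
    """Hypothetical stand-in mirroring the documented get_splits() arguments."""
    if prefetch and devices is None:
        # Assumed behaviour: the docs only state that devices must be given.
        raise ValueError("devices must be given if prefetch=True")
    kind = "PrefetchIterator" if prefetch else "GraphDataset"
    # Always three splits, in the order training, validation, test.
    return (f"train:{kind}", f"valid:{kind}", f"test:{kind}")


# Default call: plain datasets, no devices needed.
train, valid, test = toy_get_splits()

# Prefetching call: devices are passed explicitly.
prefetched = toy_get_splits(prefetch=True, devices=["device-0"])
```

In both cases the result unpacks into three objects in the same order, so downstream code only needs to know which of the two types it asked for.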
- property dataset_info: DatasetInfo
Getter for the dataset info.
Raises an exception if the dataset info is not available yet.
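The call order implied by this class (construct, call prepare_datasets(), then read dataset_info) can be sketched with a minimal stand-in. ToyBuilder is hypothetical and not the mlip implementation; it mirrors only the documented method and property names. The use of RuntimeError is an assumption, as the documentation does not name the exception type.

```python
from typing import Optional


class ToyBuilder:
    """Hypothetical stand-in that mirrors the documented names only."""

    def __init__(self, systems: list[str]) -> None:
        self._systems = systems
        self._dataset_info: Optional[dict] = None

    def prepare_datasets(self) -> None:
        # Stand-in for loading systems and producing graphs plus dataset info.
        self._dataset_info = {"num_systems": len(self._systems)}

    @property
    def dataset_info(self) -> dict:
        if self._dataset_info is None:
            # Assumed exception type; the docs do not specify one.
            raise RuntimeError("Dataset info not available yet.")
        return self._dataset_info


builder = ToyBuilder(["H2O", "CH4"])

# Reading the property before preparation fails in this sketch.
try:
    builder.dataset_info
    available_before = True
except RuntimeError:
    available_before = False

# After preparation, the dataset info can be read.
builder.prepare_datasets()
info = builder.dataset_info
```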