Graph Dataset Builder

class mlip.data.graph_dataset_builder.GraphDatasetBuilder(reader: ChemicalSystemsReader | CombinedReader, dataset_config: GraphDatasetBuilderConfig)

Main class handling the construction and preprocessing of the graph dataset.

The key idea is that the user provides a ChemicalSystemsReader subclass that loads a dataset from disk into ChemicalSystem dataclasses. GraphDatasetBuilder then converts these into jraph graphs and produces the dataset info dataclass.
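The two-stage flow described above can be illustrated with a toy example. This is only a sketch using stand-in classes: ToySystem, ToyGraph, ToyReader, and to_graph are hypothetical names invented for illustration and are not part of mlip or jraph, and the statistics collected at the end are illustrative rather than the actual fields of the dataset info dataclass.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical stand-in for a ChemicalSystem: species plus 1D positions."""
    atomic_numbers: list[int]
    positions: list[float]


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a jraph graph: nodes plus directed edges."""
    nodes: list[int]
    senders: list[int]
    receivers: list[int]


class ToyReader:
    """Hypothetical stand-in for a ChemicalSystemsReader subclass."""

    def load(self) -> list[ToySystem]:
        # A real reader would parse files from disk here.
        return [
            ToySystem([8, 1, 1], [0.0, 0.96, 1.5]),
            ToySystem([1, 1], [0.0, 0.74]),
        ]


def to_graph(system: ToySystem, cutoff: float) -> ToyGraph:
    """Connect every ordered pair of distinct atoms closer than `cutoff`."""
    senders, receivers = [], []
    n = len(system.positions)
    for i in range(n):
        for j in range(n):
            if i != j and abs(system.positions[i] - system.positions[j]) < cutoff:
                senders.append(i)
                receivers.append(j)
    return ToyGraph(system.atomic_numbers, senders, receivers)


# Stage 1: the reader loads systems. Stage 2: systems become graphs.
systems = ToyReader().load()
graphs = [to_graph(s, cutoff=1.0) for s in systems]

# Dataset-level statistics, analogous in spirit to a dataset info object.
dataset_info = {
    "species": sorted({z for s in systems for z in s.atomic_numbers}),
    "avg_num_neighbors": sum(len(g.senders) for g in graphs)
    / sum(len(g.nodes) for g in graphs),
}
```

Separating the reader from the graph conversion means a new file format only needs a new reader, while the graph construction step stays unchanged.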

__init__(reader: ChemicalSystemsReader | CombinedReader, dataset_config: GraphDatasetBuilderConfig)

Constructor.

Parameters:

reader – The data reader that loads a dataset into ChemicalSystem dataclasses.
dataset_config – The pydantic config.

prepare_datasets() → None

Prepares the datasets.

This includes loading the data into ChemicalSystem objects via the chemical systems reader, and then producing the graph datasets and the dataset info object.

get_splits(prefetch: bool = False, devices: list[Device] | None = None) → tuple[GraphDataset, GraphDataset, GraphDataset] | tuple[PrefetchIterator, PrefetchIterator, PrefetchIterator]

Returns the training, validation, and test dataset splits.

Parameters:

prefetch – Whether to run the data prefetching and return PrefetchIterators.
devices – Devices for parallel prefetching. Must be given if prefetch=True.

Returns:

A tuple of training, validation, and test datasets. If prefetch=False, these are of type GraphDataset, otherwise of type PrefetchIterator.
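The contract between prefetch and devices can be sketched with plain Python. The names toy_get_splits and toy_prefetch are hypothetical, the round-robin device assignment is an assumption made for illustration, and the ValueError is an assumed exception type. None of this reflects how mlip implements GraphDataset or PrefetchIterator.

```python
from typing import Iterator, Optional, Sequence


def toy_prefetch(dataset: list, devices: Sequence[str]) -> Iterator[tuple[str, object]]:
    """Hypothetical prefetcher: yield (device, item), assigning round-robin."""
    for index, item in enumerate(dataset):
        yield devices[index % len(devices)], item


def toy_get_splits(
    splits: tuple[list, list, list],
    prefetch: bool = False,
    devices: Optional[Sequence[str]] = None,
):
    """Return the splits as-is, or wrapped in iterators if prefetch=True."""
    if not prefetch:
        return splits
    if not devices:
        # Mirrors the documented rule that devices must be given
        # when prefetch=True (the exception type is an assumption).
        raise ValueError("devices must be given if prefetch=True")
    return tuple(toy_prefetch(split, devices) for split in splits)


splits = ([1, 2, 3], [4], [5])

# Without prefetching, the three datasets come back unchanged.
train, valid, test = toy_get_splits(splits)

# With prefetching, each split becomes an iterator bound to the devices.
train_it, valid_it, test_it = toy_get_splits(
    splits, prefetch=True, devices=["dev0", "dev1"]
)
first_batches = list(train_it)
```

In either mode the caller unpacks three objects in the same order, so downstream code only needs to know whether it is handling datasets or iterators.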

property dataset_info: DatasetInfo

Getter for the dataset info.

Raises an exception if the dataset info is not available yet. The dataset info is produced by prepare_datasets().
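A guarded getter of this kind is commonly written as a property that checks an internal attribute. The sketch below uses a hypothetical ToyBuilder class. The RuntimeError, the error message, and the dictionary contents are assumptions for illustration, since the documentation does not state which exception the library raises.

```python
class ToyBuilder:
    """Hypothetical builder showing a getter that fails before preparation."""

    def __init__(self) -> None:
        # Nothing has been computed yet.
        self._dataset_info = None

    def prepare_datasets(self) -> None:
        # A real builder would compute statistics from the data here.
        self._dataset_info = {"num_graphs": 0}

    @property
    def dataset_info(self) -> dict:
        if self._dataset_info is None:
            # The exception type is an assumption made for this sketch.
            raise RuntimeError(
                "Dataset info not available yet; call prepare_datasets() first."
            )
        return self._dataset_info


builder = ToyBuilder()
try:
    builder.dataset_info
    accessed_early = True
except RuntimeError:
    accessed_early = False

builder.prepare_datasets()
info = builder.dataset_info
```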