Graph Dataset Builder

class mlip.data.graph_dataset_builder.GraphDatasetBuilder(reader: ChemicalSystemsReader | CombinedReader, dataset_config: GraphDatasetBuilderConfig)

Main class handling the construction and preprocessing of the graph dataset.

The key idea is that the user provides a ChemicalSystemsReader subclass (or a CombinedReader) that loads a dataset from disk into ChemicalSystem dataclasses. GraphDatasetBuilder then converts these into jraph graphs and builds the dataset info dataclass.
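The conversion from chemical systems to graphs can be pictured with a minimal, dependency-free sketch. It assumes, as is common for MLIP graphs, that two atoms are joined by a directed edge whenever they are closer than a distance cutoff. `ToySystem` and `radius_graph` are hypothetical names. The real ChemicalSystem fields and builder config options may differ, periodic boundaries are ignored, and the actual builder emits jraph graphs rather than plain lists.

```python
import math
from dataclasses import dataclass


# Hypothetical stand-in for a ChemicalSystem; the real dataclass has its own fields.
@dataclass
class ToySystem:
    atomic_numbers: list[int]
    positions: list[tuple[float, float, float]]  # Cartesian coordinates, e.g. in Angstrom


def radius_graph(system: ToySystem, cutoff: float) -> tuple[list[int], list[int]]:
    """Return (senders, receivers) for every ordered pair of distinct atoms closer than cutoff.

    Assumption for this sketch: no periodic boundary conditions.
    """
    senders: list[int] = []
    receivers: list[int] = []
    n = len(system.positions)
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(system.positions[i], system.positions[j]) < cutoff:
                senders.append(i)
                receivers.append(j)
    return senders, receivers


# A water molecule: both O-H distances are about 0.96, the H-H distance about 1.52.
water = ToySystem(
    atomic_numbers=[8, 1, 1],
    positions=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)
senders, receivers = radius_graph(water, cutoff=1.2)
```

With a cutoff of 1.2, only the two O-H bonds survive, each stored once per direction. The `senders`/`receivers` lists mirror the fields of the same name on `jraph.GraphsTuple`; the quadratic double loop is for clarity only, and a real builder would typically use a neighbour-list algorithm.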

__init__(reader: ChemicalSystemsReader | CombinedReader, dataset_config: GraphDatasetBuilderConfig)

Constructor.

Parameters:
  • reader – The data reader that loads a dataset into ChemicalSystem dataclasses.

  • dataset_config – The pydantic config for the builder, of type GraphDatasetBuilderConfig.

prepare_datasets() → None

Prepares the datasets.

This includes loading the data into ChemicalSystem objects via the chemical systems reader, and then producing the graph datasets and the dataset info object.
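One kind of dataset-level summary that a dataset info object can carry is the average number of neighbours per atom, which many MLIP architectures use for normalisation. The sketch below shows how such a statistic could be aggregated over a set of graphs. `ToyDatasetInfo` and `summarize` are hypothetical; they are not the fields or code of mlip's DatasetInfo, and the input is assumed to be one (num_nodes, num_edges) pair per graph with every bond stored once per direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDatasetInfo:
    """Hypothetical summary object; mlip's DatasetInfo defines its own fields."""

    num_graphs: int
    avg_num_neighbors: float


def summarize(graphs: list[tuple[int, int]]) -> ToyDatasetInfo:
    """Aggregate (num_nodes, num_edges) pairs into dataset-level statistics.

    With directed edges, the in-degree of an atom is its neighbour count, so
    total edges / total nodes is the mean neighbour count over all atoms.
    """
    total_nodes = sum(num_nodes for num_nodes, _ in graphs)
    total_edges = sum(num_edges for _, num_edges in graphs)
    if total_nodes == 0:
        raise ValueError("cannot summarize an empty dataset")
    return ToyDatasetInfo(
        num_graphs=len(graphs),
        avg_num_neighbors=total_edges / total_nodes,
    )


# Two small graphs: 3 atoms with 4 directed edges, 5 atoms with 12 directed edges.
info = summarize([(3, 4), (5, 12)])
```

Dividing dataset totals, rather than averaging per-graph ratios, weights every atom equally, so large systems are not under-represented in the statistic.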

get_splits(prefetch: bool = False, devices: list[Device] | None = None) → tuple[GraphDataset, GraphDataset, GraphDataset] | tuple[PrefetchIterator, PrefetchIterator, PrefetchIterator]

Returns the training, validation, and test dataset splits.

Parameters:
  • prefetch – Whether to run data prefetching and return PrefetchIterator objects.

  • devices – Devices for parallel prefetching. Must be given if prefetch=True.

Returns:

A tuple of training, validation, and test datasets. If prefetch=False, they are of type GraphDataset; otherwise, they are of type PrefetchIterator.
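The general idea behind prefetching is that a background worker loads upcoming items while the consumer processes the current one. The sketch below illustrates that idea using only the standard library. `prefetch` is a hypothetical helper, not mlip's PrefetchIterator, and it does not model the documented `devices` argument used for parallel prefetching across devices.

```python
import queue
import threading
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def prefetch(items: Iterable[T], buffer_size: int = 2) -> Iterator[T]:
    """Yield items while a background thread loads up to `buffer_size` ahead.

    Being a generator, the worker thread starts on the first call to next().
    """
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for item in items:
                buf.put((True, item))  # blocks while the buffer is full
            buf.put((False, None))  # normal end of stream
        except Exception as exc:
            buf.put((False, exc))  # forward the error to the consumer

    # Daemon thread: if the consumer stops early, a blocked producer
    # does not keep the interpreter alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        ok, payload = buf.get()
        if ok:
            yield payload
        elif payload is None:
            return
        else:
            raise payload


batches = list(prefetch(range(5), buffer_size=2))
```

The bounded queue is the key design choice: it caps memory use at `buffer_size` pending items while still overlapping loading with consumption, and errors raised while loading surface in the consumer instead of being silently dropped.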

property dataset_info: DatasetInfo

Getter for the dataset info.

Raises an exception if the dataset info is not available yet, e.g., because prepare_datasets() has not been run.
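The "available only after preparation" behaviour is a common Python pattern: a read-only property that raises until the backing value has been computed. The toy class below illustrates that pattern. `LazyInfoHolder`, `Info` and `prepare` are hypothetical names, and RuntimeError is an assumption, since the documentation does not state which exception type is raised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Info:
    """Hypothetical placeholder for a dataset info object."""

    num_graphs: int


class LazyInfoHolder:
    """Toy class illustrating a getter that fails until data has been prepared."""

    def __init__(self) -> None:
        self._dataset_info: Optional[Info] = None

    def prepare(self, num_graphs: int) -> None:
        # Stands in for the expensive preparation step that produces the info.
        self._dataset_info = Info(num_graphs=num_graphs)

    @property
    def dataset_info(self) -> Info:
        # Assumed exception type; the documented class only says "raise exception".
        if self._dataset_info is None:
            raise RuntimeError("Dataset info not available yet; call prepare() first.")
        return self._dataset_info


holder = LazyInfoHolder()
try:
    _ = holder.dataset_info
    available_before = True
except RuntimeError:
    available_before = False
holder.prepare(num_graphs=10)
```

Raising instead of returning None makes a missing preparation step fail loudly at the point of access, rather than surfacing later as an attribute error on None.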