Single Graph Dataset Builder
- class mlip.data.single_graph_dataset_builder.SingleGraphDatasetBuilder(readers: ChemicalSystemsReader | list[ChemicalSystemsReader], builder_config: GraphDatasetBuilderConfig, dataset_info: DatasetInfo | bool, shuffle: bool = False)
Builds a single GraphDataset from one or more readers.
Handles loading chemical systems, converting them to graphs, auto-filling batch dimensions, and optionally computing DatasetInfo.
- __init__(readers: ChemicalSystemsReader | list[ChemicalSystemsReader], builder_config: GraphDatasetBuilderConfig, dataset_info: DatasetInfo | bool, shuffle: bool = False)
Constructor.
- Parameters:
readers – The data reader(s) that load a dataset into ChemicalSystem dataclasses.
builder_config – The pydantic config.
dataset_info – Pass True to compute dataset info from the graphs. Pass False to skip dataset info computation. Pass a DatasetInfo instance to use a pre-computed one (e.g. from a trained model).
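The readers argument accepts either one reader or a list of readers. The following is a minimal sketch of how such an argument could be normalized and pooled; it is not the library's implementation. ToyReader, its load() method, and load_all are hypothetical stand-ins introduced only for illustration, and the assumption that systems from several readers are concatenated in order is not confirmed by this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyReader:
    """Hypothetical stand-in for a reader; not part of mlip."""

    systems: list[str]

    def load(self) -> list[str]:
        # Return a copy so callers cannot mutate the reader's data.
        return list(self.systems)


def load_all(readers: ToyReader | list[ToyReader]) -> list[str]:
    """Accept one reader or a list of readers and pool their systems."""
    if not isinstance(readers, list):
        readers = [readers]  # wrap a single reader into a list
    pooled: list[str] = []
    for reader in readers:
        # Assumption: systems are concatenated in reader order.
        pooled.extend(reader.load())
    return pooled


single = load_all(ToyReader(["H2O"]))
several = load_all([ToyReader(["H2O"]), ToyReader(["CH4", "NH3"])])
```

Normalizing to a list up front keeps the rest of the loading code free of type checks.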
- get_dataset(prefetch: bool = False, mesh: Mesh | None = None, systems_preprocessing: list[Callable[[list[ChemicalSystem]], list[ChemicalSystem]]] | None = None, graph_postprocessing: list[Callable[[Graph], Graph]] | None = None) → GraphDataset | PrefetchIterator
Build and return the dataset.
Loads systems, converts them to graphs, builds a GraphDataset, optionally computes DatasetInfo, and wraps the result in a prefetch iterator if requested.
- Parameters:
prefetch – Whether to wrap the dataset in a PrefetchIterator. Defaults to False.
mesh – Device mesh for data parallelism. If None and prefetch is True, a default mesh is created.
systems_preprocessing – Optional list of functions applied sequentially to the loaded chemical systems.
graph_postprocessing – Optional list of batch-level post-processing functions passed to GraphDataset.
- Returns:
A GraphDataset or PrefetchIterator.
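Because systems_preprocessing is a list of functions applied sequentially, the order of the list can change the result. The sketch below illustrates that with plain dictionaries standing in for ChemicalSystem objects. The helper apply_sequentially and both filter functions are hypothetical and only mirror the documented list-in, list-out signature; they are not taken from the library.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for ChemicalSystem: a plain dict.
System = dict
Step = Callable[[list[System]], list[System]]


def drop_single_atoms(systems: list[System]) -> list[System]:
    """Keep only systems with more than one atom."""
    return [s for s in systems if s["num_atoms"] > 1]


def keep_first_two(systems: list[System]) -> list[System]:
    """Truncate the list to its first two entries."""
    return systems[:2]


def apply_sequentially(
    systems: list[System], steps: list[Step] | None
) -> list[System]:
    """Feed the output of each step into the next one."""
    for step in steps or []:
        systems = step(systems)
    return systems


systems = [
    {"name": "H", "num_atoms": 1},
    {"name": "H2O", "num_atoms": 3},
    {"name": "CH4", "num_atoms": 5},
    {"name": "NH3", "num_atoms": 4},
]

filter_then_cut = apply_sequentially(systems, [drop_single_atoms, keep_first_two])
cut_then_filter = apply_sequentially(systems, [keep_first_two, drop_single_atoms])

names_a = [s["name"] for s in filter_then_cut]  # ["H2O", "CH4"]
names_b = [s["name"] for s in cut_then_filter]  # ["H2O"]
```

Filtering before truncating keeps two systems, while truncating first leaves only one, so the list order should be chosen deliberately.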
- property dataset_info: DatasetInfo | None
The computed or preset DatasetInfo, or None if dataset info computation was disabled.
Returns immediately when dataset info was disabled (False) or was provided as a preset; in those cases get_dataset() does not need to have been called.
- Raises:
DatasetsHaveNotBeenProcessedError – If dataset info computation was requested (dataset_info=True) but get_dataset() has not been called yet.
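The three cases described for the dataset_info property can be summarized with a small stand-in class. This is a sketch of the documented behavior only, not the library's code: ToyBuilder, its internals, and the dictionary used as a fake DatasetInfo are hypothetical, and the exception class is redefined locally so the example runs without mlip installed.

```python
class DatasetsHaveNotBeenProcessedError(Exception):
    """Local stand-in for the exception named in the documentation."""


class ToyBuilder:
    """Hypothetical stand-in mimicking the documented property behavior."""

    def __init__(self, dataset_info):
        # True means "compute later"; False means "disabled";
        # anything else is treated as a preset info object.
        self._compute = dataset_info is True
        self._info = None if isinstance(dataset_info, bool) else dataset_info
        self._processed = False

    def get_dataset(self):
        if self._compute:
            # Placeholder for the real computation from the graphs.
            self._info = {"computed": True}
        self._processed = True
        return []

    @property
    def dataset_info(self):
        if self._compute and not self._processed:
            raise DatasetsHaveNotBeenProcessedError(
                "Call get_dataset() before reading dataset_info."
            )
        return self._info


disabled = ToyBuilder(False).dataset_info          # None, no call needed
preset = ToyBuilder({"preset": 1}).dataset_info    # preset returned as-is

builder = ToyBuilder(True)
try:
    builder.dataset_info
    raised = False
except DatasetsHaveNotBeenProcessedError:
    raised = True

builder.get_dataset()
computed = builder.dataset_info
```

Only the dataset_info=True case depends on get_dataset() having run; the other two return immediately.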