Dataset Configs
- class mlip.data.configs.ChemicalSystemsReaderConfig(*, train_dataset_paths: str | Path | list[str | Path], valid_dataset_paths: str | Path | list[str | Path] | None = None, test_dataset_paths: str | Path | list[str | Path] | None = None, train_num_to_load: Annotated[int, Gt(gt=0)] | None = None, valid_num_to_load: Annotated[int, Gt(gt=0)] | None = None, test_num_to_load: Annotated[int, Gt(gt=0)] | None = None)
Pydantic-based config related to data preprocessing and loading into `ChemicalSystem`s.
- train_dataset_paths
Path(s) to where the training set(s) are located. Cannot be empty. Will be converted to a list after validation.
- Type:
str | pathlib.Path | list[str | pathlib.Path]
- valid_dataset_paths
Path(s) to where the validation set(s) are located. This can be empty. Will be converted to a list after validation.
- Type:
str | pathlib.Path | list[str | pathlib.Path] | None
- test_dataset_paths
Path(s) to where the test set(s) are located. This can be empty. Will be converted to a list after validation.
- Type:
str | pathlib.Path | list[str | pathlib.Path] | None
- train_num_to_load
Number of training set data points to load from the given dataset. By default, this is `None`, which means all the data points are loaded. If multiple dataset paths are given, then this limit will apply to each path separately, not in total.
- Type:
int | None
- valid_num_to_load
Number of validation set data points to load from the given dataset. By default, this is `None`, which means all the data points are loaded. If multiple dataset paths are given, then this limit will apply to each path separately, not in total.
- Type:
int | None
- test_num_to_load
Number of test set data points to load from the given dataset. By default, this is `None`, which means all the data points are loaded. If multiple dataset paths are given, then this limit will apply to each path separately, not in total.
- Type:
int | None
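The two behaviours described above (a single path being converted to a list, and a `*_num_to_load` limit applying to each path separately) can be illustrated with a minimal standalone sketch. The helpers `as_path_list` and `take_per_path` are hypothetical and are not part of `mlip`. The file names and records are made up, and the sketch assumes that a limit keeps the first N records of each path, which the description above does not specify.

```python
from pathlib import Path


def as_path_list(paths):
    """Wrap a single str or Path in a list; copy an existing list as-is."""
    if isinstance(paths, (str, Path)):
        return [paths]
    return list(paths)


def take_per_path(records_by_path, num_to_load=None):
    """Apply num_to_load to each path separately (assumption: keep the first N)."""
    selected = []
    for path in records_by_path:
        records = records_by_path[path]
        # None means no limit, so every record for this path is kept.
        limit = len(records) if num_to_load is None else num_to_load
        selected.extend(records[:limit])
    return selected


# Hypothetical file names and records, for illustration only.
records_by_path = {
    "set_a.xyz": ["a0", "a1", "a2"],
    "set_b.xyz": ["b0", "b1", "b2", "b3"],
}
limited = take_per_path(records_by_path, num_to_load=2)
everything = take_per_path(records_by_path)
```

With a limit of 2 and two paths, `limited` holds four records (two per path) rather than two in total, while `everything` holds all seven.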
- class mlip.data.configs.GraphDatasetBuilderConfig(*, graph_cutoff_angstrom: Annotated[float, Gt(gt=0)] = 5.0, max_n_node: Annotated[int, Gt(gt=0)] | None = None, max_n_edge: Annotated[int, Gt(gt=0)] | None = None, batch_size: Annotated[int, Gt(gt=0)] = 16, num_batch_prefetch: Annotated[int, Gt(gt=0)] = 1, batch_prefetch_num_devices: Annotated[int, Gt(gt=0)] = 1, use_formation_energies: bool = False, avg_num_neighbors: float | None = None, avg_r_min_angstrom: float | None = None)
Pydantic-based config related to graph dataset building and preprocessing.
- graph_cutoff_angstrom
Graph cutoff distance in Angstrom to apply when creating the graphs. Default is 5.0.
- Type:
float
- max_n_node
This value is multiplied by the batch size to determine the maximum number of nodes allowed in a batch. Note that a batch will always contain max_n_node * batch_size nodes, as the remaining ones are filled up with dummy nodes. If set to `None`, a reasonable value will be automatically computed. Default is `None`.
- Type:
int | None
- max_n_edge
This value is multiplied by the batch size to determine the maximum number of edges allowed in a batch. Note that a batch will always contain max_n_edge * batch_size edges, as the remaining ones are filled up with dummy edges. If set to `None`, a reasonable value will be automatically computed. Default is `None`.
- Type:
int | None
- batch_size
The number of graphs in a batch. The batch is filled up with dummy graphs if the maximum number of nodes or edges is reached before this number of graphs is reached. Default is 16.
- Type:
int
- num_batch_prefetch
Number of batched graphs to prefetch while iterating over batches. Default is 1.
- Type:
int
- batch_prefetch_num_devices
Number of threads to use for prefetching. Default is 1.
- Type:
int
- use_formation_energies
Whether the energies in the dataset should already be transformed to subtract the average atomic energies. Default is `False`. If you set this to `True`, make sure that the models assume `"zero"` atomic energies, as can be set in the model hyperparameters.
- Type:
bool
- avg_num_neighbors
The pre-computed average number of neighbors. Default is `None`.
- Type:
float | None
- avg_r_min_angstrom
The pre-computed average minimum distance between nodes, in Angstrom. Default is `None`.
- Type:
float | None
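The arithmetic behind two of the fields above can be made concrete with a small standalone sketch. It covers the fixed batch capacity implied by `max_n_node`, `max_n_edge`, and `batch_size`, and the energy shift described for `use_formation_energies`. The functions `padding_budget` and `subtract_atomic_energies` are hypothetical helpers, not `mlip` APIs, and all numbers, including the per-element energies, are invented for illustration.

```python
def padding_budget(n_real_nodes, n_real_edges, max_n_node, max_n_edge, batch_size):
    """Return the fixed batch capacity and how much of it is dummy padding."""
    # Per the field descriptions, a batch always holds exactly this many
    # nodes and edges; whatever the real graphs do not use is padding.
    node_capacity = max_n_node * batch_size
    edge_capacity = max_n_edge * batch_size
    if n_real_nodes > node_capacity or n_real_edges > edge_capacity:
        raise ValueError("real graphs exceed the batch capacity")
    return {
        "node_capacity": node_capacity,
        "edge_capacity": edge_capacity,
        "dummy_nodes": node_capacity - n_real_nodes,
        "dummy_edges": edge_capacity - n_real_edges,
    }


def subtract_atomic_energies(total_energy, atomic_numbers, avg_atomic_energies):
    """Subtract the summed average atomic energies from a total energy."""
    return total_energy - sum(avg_atomic_energies[z] for z in atomic_numbers)


# Invented sizes: the real graphs in one batch use 100 nodes and 900 edges.
budget = padding_budget(
    n_real_nodes=100, n_real_edges=900, max_n_node=10, max_n_edge=80, batch_size=16
)

# Invented energies for a three-atom system (one atom of Z=8, two of Z=1).
shifted = subtract_atomic_energies(-10.0, [8, 1, 1], {1: -1.0, 8: -5.0})
```

Here the batch holds 160 node slots and 1280 edge slots, of which 60 nodes and 380 edges are padding, and the shifted energy is -10.0 - (-7.0) = -3.0.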