Dataset Info
- class mlip.data.dataset_info.DatasetInfo(*, atomic_energies_map: dict[int, float], cutoff_distance_angstrom: float, avg_num_neighbors: float = 1.0, avg_r_min_angstrom: float | None = None, scaling_mean: float = 0.0, scaling_stdev: float = 1.0)
Pydantic dataclass holding information computed from the dataset that is (potentially) required by the models.
- atomic_energies_map
A dictionary mapping the atomic numbers to the computed average atomic energies for that element.
- Type:
dict[int, float]
- cutoff_distance_angstrom
The graph cutoff distance, in Angstrom, that was used for the dataset.
- Type:
float
- avg_num_neighbors
The mean number of neighbors an atom has in the dataset.
- Type:
float
- avg_r_min_angstrom
The mean minimum edge distance for a structure in the dataset.
- Type:
float | None
- scaling_mean
The mean used for the rescaling of the dataset values, the default being 0.0.
- Type:
float
- scaling_stdev
The standard deviation used for the rescaling of the dataset values, the default being 1.0.
- Type:
float
- __init__(**data: Any) -> None
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
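To make the field layout concrete, the following is a minimal sketch that mirrors the documented fields and defaults using a standard-library dataclass. `DatasetInfoSketch` and `baseline_energy` are hypothetical names introduced here, not part of mlip, and the real class is a keyword-only Pydantic dataclass with validation. Summing the per-element entries of `atomic_energies_map` into a baseline energy is one plausible use, stated here as an assumption; the numeric values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetInfoSketch:
    """Stand-in mirroring the documented DatasetInfo fields (not the real class)."""

    atomic_energies_map: dict[int, float]
    cutoff_distance_angstrom: float
    avg_num_neighbors: float = 1.0
    avg_r_min_angstrom: Optional[float] = None
    scaling_mean: float = 0.0
    scaling_stdev: float = 1.0


def baseline_energy(info: DatasetInfoSketch, atomic_numbers: list[int]) -> float:
    """Sum the per-element average atomic energies for one structure.

    Assumption: this is one way a model could use the map; the source
    documentation does not specify it.
    """
    return sum(info.atomic_energies_map[z] for z in atomic_numbers)


# Placeholder values for illustration only; not taken from any real dataset.
info = DatasetInfoSketch(
    atomic_energies_map={1: -0.5, 8: -75.0},
    cutoff_distance_angstrom=5.0,
)

# Atomic numbers of a water molecule: O, H, H.
water_baseline = baseline_energy(info, [8, 1, 1])
```

Fields that are not passed fall back to the documented defaults, so `info.avg_num_neighbors` is 1.0 and `info.avg_r_min_angstrom` is None here.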
- mlip.data.dataset_info.compute_dataset_info_from_graphs(graphs: list[GraphsTuple], cutoff_distance_angstrom: float, z_table: AtomicNumberTable, avg_num_neighbors: float | None = None, avg_r_min_angstrom: float | None = None) -> DatasetInfo
Computes the dataset info from graphs, typically training set graphs.
- Parameters:
graphs – The graphs to compute the dataset info from, typically the training set graphs.
cutoff_distance_angstrom – The graph distance cutoff in Angstrom to store in the dataset info.
z_table – The atomic numbers table needed to produce the correct atomic energies map keys.
avg_num_neighbors – The optionally pre-computed average number of neighbors. If provided, we skip recomputing this.
avg_r_min_angstrom – The optionally pre-computed average minimum edge distance. If provided, we skip recomputing this.
- Returns:
The dataset info object populated with the computed data.
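The two optional statistics can be illustrated with a small standard-library sketch that follows the attribute descriptions above: the mean number of neighbors per atom, and the mean over structures of the smallest edge distance. This is not the library's implementation. `neighbor_distances` and `sketch_dataset_stats` are hypothetical helpers, structures are plain lists of positions rather than `GraphsTuple` objects, and treating the cutoff as inclusive is an assumption.

```python
import math
from itertools import permutations
from statistics import mean
from typing import Optional

# Hypothetical toy input: a structure is a list of (x, y, z) positions in Angstrom.
Structure = list[tuple[float, float, float]]


def neighbor_distances(structure: Structure, cutoff: float) -> list[float]:
    """Distances of all directed atom pairs (i, j), i != j, within the cutoff."""
    return [
        d
        for a, b in permutations(structure, 2)
        if (d := math.dist(a, b)) <= cutoff  # inclusive cutoff is an assumption
    ]


def sketch_dataset_stats(
    structures: list[Structure],
    cutoff: float,
    avg_num_neighbors: Optional[float] = None,
    avg_r_min_angstrom: Optional[float] = None,
) -> tuple[float, float]:
    """Compute each statistic only when no pre-computed value is supplied."""
    per_structure = [neighbor_distances(s, cutoff) for s in structures]
    if avg_num_neighbors is None:
        # Each directed edge contributes one neighbor to one atom.
        num_edges = sum(len(d) for d in per_structure)
        num_atoms = sum(len(s) for s in structures)
        avg_num_neighbors = num_edges / num_atoms
    if avg_r_min_angstrom is None:
        # Mean of the per-structure minimum edge distance; edgeless structures are skipped.
        avg_r_min_angstrom = mean(min(d) for d in per_structure if d)
    return avg_num_neighbors, avg_r_min_angstrom


# Placeholder geometry for illustration only.
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 10.0)],
]
avg_nn, avg_r_min = sketch_dataset_stats(structures, cutoff=3.0)

# Supplying a pre-computed value bypasses that part of the computation.
fixed_nn, _ = sketch_dataset_stats(structures, cutoff=3.0, avg_num_neighbors=12.0)
```

In this toy input, the third atom of the second structure lies outside the cutoff of both other atoms, so it contributes to the atom count but not to the edge count.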