HDF5 Reader
This reader expects data in HDF5 format, organized as follows. Each structure is stored as a group named after the structure. Scalar properties are stored as attributes of that group, and array properties are stored as datasets inside it. The example below reads one structure from a compliant HDF5 file to show how the data is organized:
import h5py

# hdf5_dataset_path is the path to a compliant HDF5 file
with h5py.File(hdf5_dataset_path, "r") as h5file:
    # Get the identifiers for all structures in the dataset
    struct_names = list(h5file.keys())
    # Just loading the first one for the sake of an example
    structure = h5file[struct_names[0]]
    positions = structure["positions"][:]
    element_numbers = structure["elements"][:]
    forces = structure["forces"][:]
    # Stress is optional if it is not needed during training
    if "stress" in structure:
        stress = structure["stress"][:]
    # Hessian is only required to train a `HessianPredictor`
    if "hessian" in structure:
        hessian = structure["hessian"][:]
    # Energy is a scalar, stored as an attribute of the group
    energy = structure.attrs["energy"]
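Going the other way, the sketch below writes a small file in the layout described above and reads it back, which may help when converting an existing dataset. It is only an illustration: the structure names, the array shapes (for instance a 3x3 stress array and per-atom rows of length 3) and all numeric values are placeholder assumptions, not requirements stated by the reader.

```python
import os
import tempfile

import h5py
import numpy as np


def write_example_dataset(path):
    """Write two small placeholder structures, one group per structure."""
    rng = np.random.default_rng(0)
    # Hypothetical structure names mapped to their atomic numbers.
    structures = {
        "water_0": np.array([8, 1, 1]),
        "methane_0": np.array([6, 1, 1, 1, 1]),
    }
    with h5py.File(path, "w") as h5file:
        for name, elements in structures.items():
            num_atoms = len(elements)
            group = h5file.create_group(name)
            # Array properties are stored as datasets inside the group.
            group.create_dataset("positions", data=rng.normal(size=(num_atoms, 3)))
            group.create_dataset("elements", data=elements)
            group.create_dataset("forces", data=np.zeros((num_atoms, 3)))
            # Assumed shape; stress can be left out if it is not needed.
            group.create_dataset("stress", data=np.zeros((3, 3)))
            # Scalar properties are stored as attributes of the group.
            group.attrs["energy"] = -1.0 * num_atoms


def summarize_dataset(path):
    """Return (positions shape, energy, has hessian) for each structure."""
    with h5py.File(path, "r") as h5file:
        return {
            name: (
                h5file[name]["positions"].shape,
                float(h5file[name].attrs["energy"]),
                "hessian" in h5file[name],
            )
            for name in h5file.keys()
        }


with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.h5")
    write_example_dataset(example_path)
    summary = summarize_dataset(example_path)
```

No "hessian" dataset is written here, so the membership check for it returns False for both structures, mirroring the optional handling in the reading example.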
See below for the API reference to the associated loader class.
- class mlip.data.chemical_systems_readers.hdf5_reader.Hdf5Reader(filepaths: str | PathLike | list[str | PathLike], data_download_fun: Callable[[str | PathLike, str | PathLike], None] | None = None, num_to_load: int | None = None, property_name_mapping: dict[str, str] | None = None)
Implementation of a chemical systems reader that loads data from hdf5 format.
- __init__(filepaths: str | PathLike | list[str | PathLike], data_download_fun: Callable[[str | PathLike, str | PathLike], None] | None = None, num_to_load: int | None = None, property_name_mapping: dict[str, str] | None = None)
Constructor.
- Parameters:
filepaths – Path or list of paths to the files from which ChemicalSystem objects will be read.
data_download_fun – Optional function to download the data from filepath (source) to a local target path.
num_to_load – Optional limit on the number of systems to load per file. If None, all systems are loaded.
property_name_mapping – Optional mapping from canonical names ("forces", "energy", "stress") to the keys used in the data files. By default, each canonical name maps to itself. Any entries provided will override the corresponding defaults.
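To make the two optional hooks more concrete, here is a minimal sketch of what could be passed for them. The helper names `resolve_property_names` and `copy_from_mirror` are hypothetical and not part of the library; the first only illustrates the documented "defaults overridden by provided entries" behaviour, and the second is a stand-in download function with the documented `(source, target)` signature that copies a local file instead of fetching anything remote.

```python
import shutil
import tempfile
from pathlib import Path

# Canonical property names listed in the parameter description.
CANONICAL_NAMES = ("forces", "energy", "stress")


def resolve_property_names(property_name_mapping=None):
    """Illustrate the override rule: defaults first, user entries on top."""
    resolved = {name: name for name in CANONICAL_NAMES}
    resolved.update(property_name_mapping or {})
    return resolved


def copy_from_mirror(source, target):
    """Hypothetical data_download_fun: copy `source` to the local `target`."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)


# A file that stores energies under "total_energy" (assumed key name).
names = resolve_property_names({"energy": "total_energy"})

# Exercise the download stand-in with a placeholder file.
with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / "remote" / "data.h5"
    source.parent.mkdir()
    source.write_bytes(b"placeholder")
    target = Path(tmp_dir) / "cache" / "data.h5"
    copy_from_mirror(source, target)
    copied = target.read_bytes()
```

In practice the download function would more likely wrap a cloud storage or HTTP client; any callable that accepts a source and a target path and returns None matches the documented type.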
- load() → list[ChemicalSystem]
Load chemical systems from all HDF5 filepaths.
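Putting the constructor and load() together, a call might look like the sketch below. It relies only on the signature documented above; the wrapper name `load_systems`, the file names, and the "total_energy" key in the usage comment are illustrative assumptions.

```python
def load_systems(filepaths, num_to_load=None, property_name_mapping=None):
    """Build a reader from the documented arguments and load all systems."""
    # Imported lazily so the sketch can be defined without mlip installed.
    from mlip.data.chemical_systems_readers.hdf5_reader import Hdf5Reader

    reader = Hdf5Reader(
        filepaths=filepaths,
        num_to_load=num_to_load,
        property_name_mapping=property_name_mapping,
    )
    # Returns a list of ChemicalSystem objects gathered from all files.
    return reader.load()


# Hypothetical usage (file names and key name are placeholders):
# systems = load_systems(
#     ["train.h5", "valid.h5"],
#     num_to_load=100,
#     property_name_mapping={"energy": "total_energy"},
# )
```

Because num_to_load applies per file, the hypothetical call above would return at most 200 systems across the two files.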