Extxyz Reader
In the extxyz format, the per-atom features are declared in the Properties field on the comment line of each structure in a concatenated XYZ file. Each structure starts with the number of atoms, followed by the comment line, and then one line per atom holding the element, position, and force values in the order given by Properties.
Multiple structures are concatenated together, hence the whole set of training structures can be in just one file, the validation structures in another, and the test structures in a third file.
Here’s a shortened example of the training data in the extxyz format:
21
Properties=species:S:1:pos:R:3:forces:R:3 energy=-17617.63598758549 pbc="F F F"
C 2.03112297 -1.10783801 -0.35158800 2.72979276 -1.55877755 -0.63202814
C 0.68817554 0.94896126 -1.72487641 -0.92555477 -0.52119051 2.26082812
C 2.47575017 -0.65064361 -1.63039847 1.67313734 -3.78441218 1.72467687
...
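As a rough illustration of the layout described above (atom count, comment line, then one row per atom), the following sketch parses such a file using only the Python standard library. It is not how the reader works internally. The function name parse_extxyz_frames is hypothetical, the sample consists of the first three atoms of the example above with the atom count adjusted to match, and only the S (string) and R (real) column types are handled.

```python
import shlex

# First three atoms of the example above; the count is changed from 21 to 3
# so that the truncated frame is self-consistent (the energy is kept as-is
# and no longer corresponds to this truncated structure).
SAMPLE = """3
Properties=species:S:1:pos:R:3:forces:R:3 energy=-17617.63598758549 pbc="F F F"
C 2.03112297 -1.10783801 -0.35158800 2.72979276 -1.55877755 -0.63202814
C 0.68817554 0.94896126 -1.72487641 -0.92555477 -0.52119051 2.26082812
C 2.47575017 -0.65064361 -1.63039847 1.67313734 -3.78441218 1.72467687
"""


def parse_extxyz_frames(text):
    """Parse concatenated extxyz frames into a list of dicts (illustrative only)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    frames, i = [], 0
    while i < len(lines):
        n_atoms = int(lines[i])
        # The comment line holds key=value pairs; shlex keeps quoted values together.
        info = dict(tok.split("=", 1) for tok in shlex.split(lines[i + 1]))
        # Properties is a flat sequence of name:type:column-count triples.
        spec = info.pop("Properties").split(":")
        columns = [
            (spec[j], spec[j + 1], int(spec[j + 2])) for j in range(0, len(spec), 3)
        ]
        atoms = []
        for row in lines[i + 2 : i + 2 + n_atoms]:
            fields, atom, k = row.split(), {}, 0
            for name, kind, width in columns:
                raw = fields[k : k + width]
                # Assumption: anything that is not a string column is a real number.
                vals = raw if kind == "S" else [float(v) for v in raw]
                atom[name] = vals[0] if width == 1 else vals
                k += width
            atoms.append(atom)
        frames.append({"info": info, "atoms": atoms})
        i += 2 + n_atoms  # jump to the next concatenated structure
    return frames


frames = parse_extxyz_frames(SAMPLE)
print(frames[0]["info"]["energy"], frames[0]["atoms"][0]["species"])
```

Because each frame carries its own atom count, the loop can step through any number of concatenated structures without a separator between them.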
For loading the extxyz file, we internally use ase.io.read from the ASE library.
See below for the API reference to the associated loader class.
- class mlip.data.chemical_systems_readers.extxyz_reader.ExtxyzReader(config: ChemicalSystemsReaderConfig, data_download_fun: Callable[[str | PathLike, str | PathLike], None] | None = None)
Implementation of a chemical systems reader that loads data from extxyz format via the ase library.
- __init__(config: ChemicalSystemsReaderConfig, data_download_fun: Callable[[str | PathLike, str | PathLike], None] | None = None)
Constructor.
- Parameters:
config – The configuration defining how and where to load the data from.
data_download_fun – A function to download data from an external remote system. If None (default), then this class assumes file paths are local. This function must take two paths as input, source and target, and download the data at source into the target location.
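A function passed as data_download_fun only needs to match the two-path signature described above. The sketch below is a stand-in that copies from a locally mounted location rather than contacting a real remote system; the name copy_from_mounted_storage is hypothetical, and a real implementation would replace the copy with a call to whatever storage client is in use.

```python
from __future__ import annotations

import shutil
from os import PathLike
from pathlib import Path


def copy_from_mounted_storage(source: str | PathLike, target: str | PathLike) -> None:
    """Hypothetical download function: fetch the data at source into target."""
    target = Path(target)
    # Create the destination directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stand-in for a real download: copy the file contents to the target path.
    shutil.copyfile(source, target)
```

Going by the constructor signature, such a function would be supplied as the data_download_fun argument of ExtxyzReader.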
- load(postprocess_fun: Callable[[list[ChemicalSystem], list[ChemicalSystem], list[ChemicalSystem]], tuple[list[ChemicalSystem], list[ChemicalSystem], list[ChemicalSystem]]] | None = filter_systems_with_unseen_atoms_and_assign_atomic_species) → tuple[list[ChemicalSystem], list[ChemicalSystem], list[ChemicalSystem]]
Loads the dataset into its internal format.
- Parameters:
postprocess_fun – Function to call to postprocess the loaded dataset before returning it. Accepts train, validation and test systems (list[ChemicalSystem]), runs some postprocessing (filtering, for example) and returns the postprocessed train, validation and test systems. If postprocess_fun is None, then no postprocessing will be done. By default, it will run filter_systems_with_unseen_atoms_and_assign_atomic_species(), which assigns atomic species on ChemicalSystem objects and filters out systems from the validation and test sets that contain chemical elements that are not present in the train systems.
- Returns:
A tuple of loaded training, validation and test datasets (in this order). The internal format is a list of ChemicalSystem objects.
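To make the postprocess_fun contract concrete, here is a minimal sketch of a custom function with the required shape: three lists in, a tuple of three lists out. It mimics only the filtering half of the default behaviour and does not assign atomic species. StandInSystem and its atomic_numbers attribute are assumptions made for illustration and are not the library's ChemicalSystem class.

```python
from dataclasses import dataclass


@dataclass
class StandInSystem:
    """Hypothetical stand-in for ChemicalSystem; the field name is an assumption."""

    atomic_numbers: list[int]


def drop_systems_with_unseen_elements(train, valid, test):
    """Keep only validation/test systems whose elements all occur in train."""
    seen = {z for system in train for z in system.atomic_numbers}

    def keep(systems):
        return [s for s in systems if set(s.atomic_numbers) <= seen]

    # The training set is returned unchanged; order is train, validation, test.
    return train, keep(valid), keep(test)


train = [StandInSystem([6, 1, 1])]
valid = [StandInSystem([6, 6]), StandInSystem([8, 1])]
test = [StandInSystem([1])]
train, valid, test = drop_systems_with_unseen_elements(train, valid, test)
```

A function of this shape could be passed as postprocess_fun to load(), while passing None skips postprocessing entirely.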