Helper functions
- mlip.graph.neighborhood.get_neighborhood(positions: ndarray, cutoff: float, pbc: tuple[bool, bool, bool] | None = None, cell: ndarray | None = None) → tuple[ndarray, ndarray, ndarray]
Computes the edge information for a given set of positions, including senders, receivers, and shift vectors.
If pbc is None or (False, False, False), then the shifts will be returned as zero. This is the default behavior. The cell is None by default, in which case matscipy computes the minimal cell size needed to fit the whole system. See matscipy’s documentation for more information.
- Parameters:
positions – The position matrix.
cutoff – The distance cutoff for the edges in Angstrom.
pbc – A tuple of bools representing if periodic boundary conditions exist in any of the spatial dimensions. Default is None, which means False in every direction.
cell – The unit cell of the system given as a 3x3 matrix or as None (default), which means that matscipy will compute the minimal cell size needed to fit the whole system.
- Returns:
A tuple of senders (starting indexes of atoms for each edge), receivers (ending indexes of atoms for each edge), and shifts (the shift vectors; see matscipy’s documentation for more information). If PBCs are false, the shifts are returned as zero.
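To make the return format concrete, here is a minimal sketch of a brute-force, non-periodic neighbor search in NumPy. It is not the library's implementation (which delegates to matscipy); the function name brute_force_neighborhood is hypothetical, and the strict "less than cutoff" comparison is an assumption.

```python
import numpy as np


def brute_force_neighborhood(positions: np.ndarray, cutoff: float):
    """Toy, non-periodic stand-in: all ordered pairs (i, j), i != j, within cutoff."""
    # Pairwise displacement vectors, shape (n, n, 3).
    deltas = positions[None, :, :] - positions[:, None, :]
    distances = np.linalg.norm(deltas, axis=-1)
    # Keep pairs closer than the cutoff, excluding self-pairs on the diagonal.
    mask = (distances < cutoff) & ~np.eye(len(positions), dtype=bool)
    senders, receivers = np.nonzero(mask)
    # Without periodic boundaries no edge crosses a cell boundary,
    # so every shift vector is zero.
    shifts = np.zeros((senders.size, 3))
    return senders, receivers, shifts


# Three atoms on a line; only the first two are within 2 Angstrom of each other.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
senders, receivers, shifts = brute_force_neighborhood(positions, cutoff=2.0)
```

Each edge appears once per direction, so the pair of atoms 0 and 1 contributes two entries to senders and receivers.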
- mlip.data.helpers.dynamically_batch.dynamically_batch(graphs_iterator: Iterable[Graph], n_node: int, n_edge: int, n_graph: int, n_edge_long_range: int | None = None, pad_fn: Callable[[Graph], Graph] = None, skip_last_batch: bool = False) → Generator[Graph, None, None]
Dynamically batches trees with Graphs up to specified sizes.
Elements of the graphs_iterator will be incrementally added to a batch until the limits defined by n_node, n_edge and n_graph are reached. This means each element yielded by this generator may have a differing number of graphs in its batch.
- Parameters:
graphs_iterator – An iterator of Graph.
n_node – The maximum number of nodes in a batch, at least the maximum sized graph + 1.
n_edge – The maximum number of edges in a batch, at least the maximum sized graph.
n_graph – The maximum number of graphs in a batch, at least 2.
n_edge_long_range – The maximum number of long-range edges in a batch. Set to None (default) when the graphs do not carry a long-range neighbor list; otherwise must be at least the maximum sized graph’s long-range edge count.
pad_fn – A function for padding. If None (default), then use the standard pad_with_graphs.
skip_last_batch – Whether to skip the last batch. The default is false.
- Yields:
A Graph batch of graphs.
- Raises:
ValueError – if the number of graphs is < 2.
RuntimeError – if the graphs_iterator contains elements which are not Graph objects.
RuntimeError – if a graph is found which is larger than the batch size.
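The greedy accumulation described above can be sketched with plain Python objects. This is an illustration only, not the library's code: ToyGraph and greedy_batches are hypothetical names, padding and long-range edges are omitted, and reserving one node and one graph slot for padding is an assumption inferred from the "+ 1" and "at least 2" constraints on n_node and n_graph.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ToyGraph:
    """Hypothetical stand-in for Graph that only tracks sizes."""
    n_node: int
    n_edge: int


def greedy_batches(
    graphs: Iterable[ToyGraph], n_node: int, n_edge: int, n_graph: int
) -> Iterator[list[ToyGraph]]:
    if n_graph < 2:
        raise ValueError("n_graph must be at least 2.")
    batch, nodes, edges = [], 0, 0
    for g in graphs:
        # Assumption: one node and one graph slot stay free for padding.
        if g.n_node > n_node - 1 or g.n_edge > n_edge:
            raise RuntimeError("Found a graph larger than the batch size.")
        over_budget = (
            nodes + g.n_node > n_node - 1
            or edges + g.n_edge > n_edge
            or len(batch) + 1 > n_graph - 1
        )
        if batch and over_budget:
            # Emit the current batch and start a new one with this graph.
            yield batch
            batch, nodes, edges = [], 0, 0
        batch.append(g)
        nodes += g.n_node
        edges += g.n_edge
    if batch:
        yield batch  # last, possibly smaller, batch


graphs = [ToyGraph(3, 4), ToyGraph(3, 4), ToyGraph(5, 8), ToyGraph(4, 6)]
sizes = [len(b) for b in greedy_batches(graphs, n_node=8, n_edge=12, n_graph=3)]
```

With these limits the first two graphs share a batch, while the third and fourth each end up alone, which illustrates why yielded batches can contain differing numbers of graphs.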
- mlip.data.helpers.dummy_init_graph.get_dummy_graph_for_model_init() → Graph
Generates a simple dummy graph that can be used for model initialization.
- Returns:
The dummy graph.
- mlip.data.helpers.atomic_energies.compute_average_e0s_from_graphs(graphs: list[Graph]) → dict[int, float]
Compute average energy contribution of each element by least squares.
- Parameters:
graphs – The graphs for which to compute the average energy contribution of each element.
- Returns:
A dictionary mapping atomic number to the average energy contribution of that element.
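As a rough illustration of the least-squares idea, the sketch below fits per-element energies to total energies with NumPy. It does not use the library's Graph type; the structures list and its energy values are made-up toy data, and representing each structure as a pair of atomic numbers and a total energy is an assumption for this example.

```python
import numpy as np

# Hypothetical toy data: (atomic numbers, total energy) per structure.
structures = [
    (np.array([1, 1, 8]), -3.0),  # H2O-like composition
    (np.array([1, 1]), -1.0),     # H2-like composition
    (np.array([8, 8]), -4.0),     # O2-like composition
]

elements = sorted({int(z) for numbers, _ in structures for z in numbers})

# counts[i, j] = number of atoms of element j in structure i.
counts = np.array(
    [[np.sum(numbers == z) for z in elements] for numbers, _ in structures],
    dtype=float,
)
energies = np.array([energy for _, energy in structures])

# Solve counts @ e0 ~= energies in the least-squares sense.
e0, *_ = np.linalg.lstsq(counts, energies, rcond=None)
average_e0s = {z: float(e) for z, e in zip(elements, e0)}
```

The toy energies here are exactly additive, so the fit recovers one energy per element with zero residual; real datasets would leave a residual that the model is then trained to capture.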
- mlip.data.helpers.neighbor_analysis.compute_avg_num_neighbors(graphs: list[Graph]) → float
Computes the average number of neighbors for a given list of graphs.
- Parameters:
graphs – The list of graphs to process.
- Returns:
The average (i.e., mean) number of neighbors.
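One plausible reading of this statistic is sketched below: count the incoming edges of every node and take the mean over all nodes of all graphs. Whether the library pools nodes this way, or how it treats nodes without neighbors, is an assumption; the toy_graphs data is hypothetical.

```python
import numpy as np

# Hypothetical toy graphs: (number of nodes, receivers array).
toy_graphs = [
    (3, np.array([0, 1, 1, 2])),
    (2, np.array([0, 1])),
]

# Number of incoming edges per node, pooled over all graphs.
neighbor_counts = np.concatenate(
    [np.bincount(receivers, minlength=n) for n, receivers in toy_graphs]
)
avg_num_neighbors = float(neighbor_counts.mean())
```

Here the per-node counts are 1, 2, 1 for the first graph and 1, 1 for the second, giving a mean of 6 / 5.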
- mlip.data.helpers.neighbor_analysis.compute_avg_min_neighbor_distance(graphs: list[Graph]) → float
Computes the average minimum neighbor distance for a given list of graphs.
- Parameters:
graphs – The list of graphs to process.
- Returns:
The average (i.e., mean) minimum neighbor distance.
- mlip.data.helpers.hessian_utils.get_hessian_processing_functions() → tuple[Callable[[list[ChemicalSystem]], list[ChemicalSystem]], Callable[[Graph], Graph]]
Return preprocessing and postprocessing functions for Hessian labels.
The first function, pad_systems_hessians(), operates on chemical systems prior to graph construction (e.g., padding Hessians), while the second function, process_graph_hessian(), processes Hessian labels after graph objects have been created.
- Returns:
A tuple containing a systems-level preprocessing function (applied before graph creation) and a graph-level postprocessing function (applied after graph creation).
- mlip.data.helpers.hessian_utils.pad_systems_hessians(systems: list[ChemicalSystem]) → list[ChemicalSystem]
Pad the Hessian of each system in the given systems list to (n, 3, N, 3), where n is the number of atoms in the system, and N is the number of atoms in the largest system in the list.
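The target shape can be illustrated with plain NumPy arrays. This sketch assumes that each unpadded Hessian has shape (n, 3, n, 3) and that padding is done with zeros along the third axis; both are assumptions, and pad_hessian is a hypothetical helper, not part of the library.

```python
import numpy as np


def pad_hessian(hessian: np.ndarray, max_atoms: int) -> np.ndarray:
    """Zero-pad a (n, 3, n, 3) Hessian to (n, 3, max_atoms, 3)."""
    n = hessian.shape[0]
    # Only the third axis (the second atom index) grows.
    pad_width = ((0, 0), (0, 0), (0, max_atoms - n), (0, 0))
    return np.pad(hessian, pad_width)


# Two toy systems with 2 and 4 atoms.
hessians = [np.ones((2, 3, 2, 3)), np.ones((4, 3, 4, 3))]
max_atoms = max(h.shape[0] for h in hessians)
padded = [pad_hessian(h, max_atoms) for h in hessians]
shapes = [p.shape for p in padded]
```

After padding, every Hessian shares the same trailing dimensions (N, 3), which is what allows them to be stacked along the atom axis later.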
- mlip.data.helpers.hessian_utils.process_graph_hessian(batched_graph: Graph, num_rows: int) → Graph
Process Hessian labels to match the format of Hessian predictions.
First, rows are subsampled from graph-wise Hessians into a batch of shape (R, G, N, 3), where R = num_rows, G = n_graph and N = max_system_size. The Hessians are then processed, cropped, and permuted into shape (n, R, 3), where n is the maximum possible number of graph nodes in a batch, ensuring the processed Hessian label shape is static.
- Parameters:
batched_graph – Batch of graphs with full, padded Hessian matrices.
num_rows – Number of Hessian rows to be subsampled.
- Returns:
A batched graph with processed subsampled Hessian labels.