Helper functions

mlip.graph.neighborhood.get_neighborhood(positions: ndarray, cutoff: float, pbc: tuple[bool, bool, bool] | None = None, cell: ndarray | None = None) → tuple[ndarray, ndarray, ndarray]

Computes the edge information for a given set of positions, including senders, receivers, and shift vectors.

If pbc is None or (False, False, False), the returned shifts are all zero; this is the default behavior. By default, cell is None, in which case matscipy computes the minimal cell size needed to fit the whole system. See matscipy’s documentation for more information.

Parameters:
  • positions – The position matrix.

  • cutoff – The distance cutoff for the edges in Angstrom.

  • pbc – A tuple of bools representing if periodic boundary conditions exist in any of the spatial dimensions. Default is None, which means False in every direction.

  • cell – The unit cell of the system given as a 3x3 matrix or as None (default), which means that matscipy will compute the minimal cell size needed to fit the whole system.

Returns:

A tuple of senders (the starting atom index of each edge), receivers (the ending atom index of each edge), and shifts (the shift vectors; see matscipy’s documentation for more information). If PBCs are disabled, the returned shifts are zero.
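To make the meaning of the three returned arrays concrete, here is a minimal NumPy sketch of a brute-force neighbor search for the non-periodic case. It is an illustration only and not the library's implementation (which delegates to matscipy); the function name brute_force_neighborhood and the strict "distance < cutoff" comparison are assumptions made for this example.

```python
import numpy as np


def brute_force_neighborhood(positions: np.ndarray, cutoff: float):
    """Non-periodic neighbor search by comparing all pairs of atoms (hypothetical helper)."""
    # Pairwise displacement vectors (N, N, 3) and distances (N, N).
    deltas = positions[None, :, :] - positions[:, None, :]
    distances = np.linalg.norm(deltas, axis=-1)

    # Keep pairs within the cutoff, excluding self-pairs on the diagonal.
    mask = (distances < cutoff) & ~np.eye(len(positions), dtype=bool)
    senders, receivers = np.nonzero(mask)

    # Without periodic boundary conditions, every shift vector is zero.
    shifts = np.zeros((len(senders), 3))
    return senders, receivers, shifts


# Three atoms on a line; only the first two are within 2 Angstrom of each other.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
senders, receivers, shifts = brute_force_neighborhood(positions, cutoff=2.0)
```

Each bonded pair appears twice, once per direction, so the edge list is directed. The all-pairs approach scales quadratically with the number of atoms, which is why a dedicated neighbor-list library is used in practice.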

mlip.data.helpers.dynamically_batch.dynamically_batch(graphs_iterator: Iterable[Graph], n_node: int, n_edge: int, n_graph: int, n_edge_long_range: int | None = None, pad_fn: Callable[[Graph], Graph] = None, skip_last_batch: bool = False) → Generator[Graph, None, None]

Dynamically batches trees with Graphs up to specified sizes.

Elements of the graphs_iterator will be incrementally added to a batch until the limits defined by n_node, n_edge and n_graph are reached. This means each element yielded by this generator may have a differing number of graphs in its batch.

Parameters:
  • graphs_iterator – An iterator of Graph.

  • n_node – The maximum number of nodes in a batch; must be at least the node count of the largest graph plus 1.

  • n_edge – The maximum number of edges in a batch; must be at least the edge count of the largest graph.

  • n_graph – The maximum number of graphs in a batch, at least 2.

  • n_edge_long_range – The maximum number of long-range edges in a batch. Set to None (default) when the graphs do not carry a long-range neighbor list; otherwise it must be at least the long-range edge count of the largest graph.

  • pad_fn – A function for padding. If None (default), then use the standard pad_with_graphs.

  • skip_last_batch – Whether to skip the last batch. Default is False.

Yields:

A Graph containing a batch of graphs.

Raises:
  • ValueError – if n_graph is less than 2.

  • RuntimeError – if the graphs_iterator contains elements which are not Graph objects.

  • RuntimeError – if a graph is found which is larger than the batch size.
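The accumulation rule described above can be sketched with plain Python, using (nodes, edges) tuples as hypothetical stand-ins for Graph objects. This is not the library's code: the function name greedy_batches is made up, and reserving one node slot and one graph slot for padding is an assumption inferred from the "+ 1" and "at least 2" requirements on n_node and n_graph.

```python
from typing import Iterable, Iterator

# Hypothetical stand-in for a Graph: (number of nodes, number of edges).
GraphSize = tuple[int, int]


def greedy_batches(
    sizes: Iterable[GraphSize], n_node: int, n_edge: int, n_graph: int
) -> Iterator[list[GraphSize]]:
    """Group graph sizes greedily, keeping one node and one graph slot free for padding."""
    if n_graph < 2:
        raise ValueError("n_graph must be at least 2.")
    batch: list[GraphSize] = []
    nodes = edges = 0
    for size in sizes:
        g_nodes, g_edges = size
        # A single graph that cannot fit in an empty batch is an error.
        if g_nodes > n_node - 1 or g_edges > n_edge:
            raise RuntimeError("Found a graph larger than the batch size.")
        over_budget = (
            nodes + g_nodes > n_node - 1
            or edges + g_edges > n_edge
            or len(batch) + 1 > n_graph - 1
        )
        if over_budget:
            # Emit the current batch and start a new one with this graph.
            yield batch
            batch, nodes, edges = [], 0, 0
        batch.append(size)
        nodes += g_nodes
        edges += g_edges
    if batch:
        yield batch


batches = list(
    greedy_batches([(3, 4), (2, 2), (4, 6), (1, 0)], n_node=7, n_edge=10, n_graph=3)
)
```

In this toy run the third graph would push the node count past the budget, so the first two graphs are emitted together and the remaining two form the second batch. In the real function, each emitted batch would additionally be padded to the fixed sizes so that array shapes stay static.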

mlip.data.helpers.dummy_init_graph.get_dummy_graph_for_model_init() → Graph

Generates a simple dummy graph that can be used for model initialization.

Returns:

The dummy graph.

mlip.data.helpers.atomic_energies.compute_average_e0s_from_graphs(graphs: list[Graph]) → dict[int, float]

Computes the average energy contribution of each element by least squares.

Parameters:

graphs – The graphs for which to compute the average energy contribution of each element.

Returns:

A dictionary mapping atomic number to the average energy contribution of that element.
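As an illustration of the least-squares idea, the following NumPy sketch fits one energy per element so that the per-element atom counts of each structure reproduce its total energy as closely as possible. The toy data and variable names are hypothetical, and the library's actual implementation may differ in its details.

```python
import numpy as np

# Hypothetical toy data: atomic numbers of each structure and its total energy.
structures = [[1, 1, 8], [1, 1], [8, 8]]
energies = np.array([-5.0, -2.0, -6.0])

# Build the count matrix: one row per structure, one column per element.
elements = sorted({z for zs in structures for z in zs})
counts = np.array(
    [[zs.count(z) for z in elements] for zs in structures], dtype=float
)

# Solve counts @ e0 ≈ energies in the least-squares sense.
solution, *_ = np.linalg.lstsq(counts, energies, rcond=None)
e0s = {z: float(e) for z, e in zip(elements, solution)}
```

The toy energies were constructed to be exactly consistent with -1.0 per hydrogen atom and -3.0 per oxygen atom, so the fit recovers those values up to floating-point error. With real data the system is overdetermined and the result is the best average fit.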

mlip.data.helpers.neighbor_analysis.compute_avg_num_neighbors(graphs: list[Graph]) → float

Computes the average number of neighbors for a given list of graphs.

Parameters:

graphs – The list of graphs to process.

Returns:

The average (i.e., mean) number of neighbors.
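One plausible reading of this quantity, sketched below with NumPy, is the mean number of incoming edges per node, pooled over all nodes of all graphs. Both this pooling convention and the toy (node count, receivers) representation are assumptions for illustration and are not taken from the library's source.

```python
import numpy as np

# Hypothetical toy graphs as (number of nodes, receivers array) pairs.
toy_graphs = [
    (3, np.array([0, 1, 1, 2])),  # chain 0-1-2 with directed edges both ways
    (2, np.array([0, 1])),  # a single bonded pair
]

# Count incoming edges per node in each graph, then pool all nodes together.
neighbor_counts = np.concatenate(
    [np.bincount(receivers, minlength=n) for n, receivers in toy_graphs]
)
avg_num_neighbors = float(neighbor_counts.mean())
```

Here the per-node counts are 1, 2, 1, 1, 1, giving a mean of 1.2.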

mlip.data.helpers.neighbor_analysis.compute_avg_min_neighbor_distance(graphs: list[Graph]) → float

Computes the average minimum neighbor distance for a given list of graphs.

Parameters:

graphs – The list of graphs to process.

Returns:

The average (i.e., mean) minimum neighbor distance.
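The sketch below shows one way such a statistic could be computed for a single non-periodic toy graph: take the shortest edge length at each node, then average over nodes. Whether the library averages per node or per graph is not stated here, so the per-node convention and all variable names are assumptions.

```python
import numpy as np

# Hypothetical toy graph: three atoms on a line with directed edges both ways.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
senders = np.array([0, 1, 1, 2])
receivers = np.array([1, 0, 2, 1])

# Length of every edge (no periodic shifts in this example).
edge_lengths = np.linalg.norm(positions[receivers] - positions[senders], axis=1)

# Shortest incoming edge per node; nodes without edges would stay at infinity.
min_dist = np.full(len(positions), np.inf)
np.minimum.at(min_dist, receivers, edge_lengths)

avg_min_neighbor_distance = float(min_dist.mean())
```

Isolated nodes would need to be filtered out before taking the mean, since their minimum distance remains infinite in this sketch.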

mlip.data.helpers.hessian_utils.get_hessian_processing_functions() → tuple[Callable[[list[ChemicalSystem]], list[ChemicalSystem]], Callable[[Graph], Graph]]

Return preprocessing and postprocessing functions for Hessian labels.

The first function, pad_systems_hessians(), operates on chemical systems prior to graph construction (e.g., padding Hessians), while the second function, process_graph_hessian(), processes Hessian labels after graph objects have been created.

Returns:

A tuple containing a systems-level preprocessing function (applied before graph creation) and a graph-level postprocessing function (applied after graph creation).

mlip.data.helpers.hessian_utils.pad_systems_hessians(systems: list[ChemicalSystem]) → list[ChemicalSystem]

Pads the Hessian of each system in the given list to shape (n, 3, N, 3), where n is the number of atoms in the system and N is the number of atoms in the largest system in the list.
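The target shape can be illustrated with a short NumPy sketch. It assumes each unpadded Hessian is stored with shape (n, 3, n, 3) and that padding uses zeros; neither detail is confirmed here, and pad_hessian is a hypothetical helper rather than a library function.

```python
import numpy as np


def pad_hessian(hessian: np.ndarray, max_atoms: int) -> np.ndarray:
    """Zero-pad an (n, 3, n, 3) Hessian to (n, 3, max_atoms, 3) (hypothetical helper)."""
    n = hessian.shape[0]
    # Only the third axis (the second atom index) is extended.
    pad_width = [(0, 0), (0, 0), (0, max_atoms - n), (0, 0)]
    return np.pad(hessian, pad_width)


# Two toy systems with 2 and 4 atoms; N is the size of the largest one.
hessians = [np.ones((2, 3, 2, 3)), np.ones((4, 3, 4, 3))]
max_atoms = max(h.shape[0] for h in hessians)
padded = [pad_hessian(h, max_atoms) for h in hessians]
```

After padding, all Hessians share the same trailing dimensions (N, 3), so rows from different systems can be stacked into a single array.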

mlip.data.helpers.hessian_utils.process_graph_hessian(batched_graph: Graph, num_rows: int) → Graph

Process Hessian labels to match the format of Hessian predictions.

First, rows are subsampled from the graph-wise Hessians into a batch of shape (R, G, N, 3), where R = num_rows, G = n_graph and N = max_system_size. The Hessians are then processed, cropped, and permuted into shape (n, R, 3), where n is the maximum possible number of graph nodes in a batch, ensuring that the processed Hessian label shape is static.

Parameters:
  • batched_graph – batch of graphs with full, padded Hessian matrices.

  • num_rows – number of Hessian rows to be subsampled.

Returns:

A batched graph with processed subsampled Hessian labels.