Helper functions

mlip.data.helpers.graph_creation.create_graph_from_chemical_system(chemical_system: ChemicalSystem, distance_cutoff_angstrom: float, batch_it_with_minimal_dummy: bool = False) → GraphsTuple

Creates a jraph.GraphsTuple object from a chemical system object.

This includes computing the senders/receivers/shifts for the system and otherwise just transferring data 1-to-1 to the graph.

Parameters:

chemical_system – The chemical system object.
distance_cutoff_angstrom – The graph distance cutoff in Angstrom.
batch_it_with_minimal_dummy – Whether to batch the graph together with a minimal dummy graph of 1 node and 1 edge. This is needed if you want to run model inference on just this single graph. Default is False.

Returns:

The jraph.GraphsTuple object for the given chemical system.

mlip.data.helpers.neighborhood.get_neighborhood(positions: ndarray, cutoff: float, pbc: tuple[bool, bool, bool] | None = None, cell: ndarray | None = None) → tuple[ndarray, ndarray, ndarray]

Computes the edge information for a given set of positions, including senders, receivers, and shift vectors.

If pbc is None or (False, False, False), the shifts are returned as zero. This is the default behavior. The cell defaults to None, in which case matscipy computes the minimal cell size needed to fit the whole system. See matscipy’s documentation for more information.

Parameters:

positions – The position matrix.
cutoff – The distance cutoff for the edges in Angstrom.
pbc – A tuple of bools representing if periodic boundary conditions exist in any of the spatial dimensions. Default is None, which means False in every direction.
cell – The unit cell of the system given as a 3x3 matrix or as None (default), which means that matscipy will compute the minimal cell size needed to fit the whole system.

Returns:

A tuple of senders (the index of the starting atom of each edge), receivers (the index of the ending atom of each edge), and shifts (the shift vectors; see matscipy’s documentation for more information). If PBCs are disabled, the returned shifts are all zero.
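For the non-periodic default, the shape and meaning of the three returned arrays can be illustrated with a brute-force reference. This is a minimal NumPy sketch and not the matscipy-based implementation: the name brute_force_neighborhood is hypothetical, and treating the cutoff as a strict inequality is an assumption.

```python
import numpy as np


def brute_force_neighborhood(positions, cutoff):
    """Hypothetical O(N^2) reference for the non-periodic case."""
    positions = np.asarray(positions, dtype=float)
    # Pairwise displacement vectors, shape [N, N, 3], and their lengths.
    deltas = positions[None, :, :] - positions[:, None, :]
    distances = np.linalg.norm(deltas, axis=-1)
    # Keep directed pairs within the cutoff (strict "<" is an assumption)
    # and drop the diagonal so that no atom is its own neighbour.
    mask = (distances < cutoff) & ~np.eye(len(positions), dtype=bool)
    senders, receivers = np.nonzero(mask)
    # Without PBCs no edge crosses a cell boundary, so every shift is zero.
    shifts = np.zeros((len(senders), 3))
    return senders, receivers, shifts


# Three atoms on a line; only the first two are within 2 Angstrom.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
senders, receivers, shifts = brute_force_neighborhood(positions, cutoff=2.0)
```

Each undirected pair appears twice, once per direction, which is why a single close pair yields two edges.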

mlip.data.helpers.edge_vectors.get_edge_relative_vectors(positions: ndarray, senders: ndarray, receivers: ndarray, shifts: ndarray, cell: ndarray | None, n_edge: ndarray) → ndarray

Compute the relative edge vectors from senders to receivers.

With PBCs, sender nodes need to be translated from the unit cell to the receiver’s nearest neighbouring cell. See get_edge_vectors() for more details.

# Returns vectors
vectors = positions[receivers] - positions[senders] + shifts @ cell
# From the ASE docs:
D = positions[j] - positions[i] + S.dot(cell)

Parameters:

positions – The positions of the system.
senders – The sender indices of the edges, labelled i by ASE.
receivers – The receiver indices of the edges, labelled j by ASE.
shifts – The shift vectors as returned by the matscipy neighbour lists functionality, and labelled S by ASE.
cell – The unit cell of each graph, an array of shape [n_graph, 3, 3].
n_edge – The number of edges for each graph, an array of shape [n_graph].

Returns:

The relative edge vectors, labelled D by ASE.
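The formula for D can be sketched in NumPy for a batch of graphs. This is an illustration under stated assumptions, not the library's implementation: the name relative_vectors_sketch is hypothetical, a non-None cell is assumed, and the way each edge is matched to its graph's cell (repeating cells by n_edge) is one plausible reading of the parameter shapes.

```python
import numpy as np


def relative_vectors_sketch(positions, senders, receivers, shifts, cell, n_edge):
    """Hypothetical sketch of D = positions[j] - positions[i] + S @ cell."""
    # Give every edge the cell of the graph it belongs to: [n_edges, 3, 3].
    cell_per_edge = np.repeat(cell, n_edge, axis=0)
    # Row-vector convention: each shift multiplies its cell from the left.
    offsets = np.einsum("ei,eij->ej", shifts, cell_per_edge)
    return positions[receivers] - positions[senders] + offsets


# One graph: two atoms in a cubic cell of side 4, close only across the boundary.
positions = np.array([[0.5, 0.0, 0.0], [3.5, 0.0, 0.0]])
senders = np.array([0, 1])
receivers = np.array([1, 0])
shifts = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
cell = 4.0 * np.eye(3)[None, :, :]  # shape [n_graph=1, 3, 3]
n_edge = np.array([2])

vectors = relative_vectors_sketch(positions, senders, receivers, shifts, cell, n_edge)
lengths = np.linalg.norm(vectors, axis=-1)
```

In this toy system the two atoms are 3 apart inside the cell, but the shifted formula gives edge vectors of length 1, the distance through the periodic boundary.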

mlip.data.helpers.edge_vectors.get_edge_vectors(positions: ndarray, senders: ndarray, receivers: ndarray, shifts: ndarray, cell: ndarray | None, n_edge: ndarray) → Tuple[ndarray, ndarray]

Compute positions of sender and receiver nodes of each edge.

With periodic boundary conditions (PBCs), the receiver position will remain unchanged and stay in the unit cell. The sender node’s representative is translated from the unit cell to the nearest neighbouring cell, by subtracting the shift (an integer-valued vector counting lattice steps) multiplied by the 3x3 cell matrix.

# Returns (vectors_senders, vectors_receivers)
vectors_senders   = positions[i] - shifts @ cell
vectors_receivers = positions[j]

The shift vectors therefore describe the number of boundary crossings of the directed edge going from the sender to the receiver.

Parameters:

positions – The positions of the nodes.
senders – The sender nodes of each edge. Output i of ase.neighborlist.primitive_neighbor_list.
receivers – The receiver nodes of each edge. Output j of ase.neighborlist.primitive_neighbor_list.
shifts – The shift vectors of each edge. Output S of ase.neighborlist.primitive_neighbor_list.
cell – The cell of each graph. Array of shape [n_graph, 3, 3].
n_edge – The number of edges of each graph. Array of shape [n_graph].

Returns:

The positions of the sender and receiver nodes of each edge.
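The translation of the sender described above can be checked on a small periodic example. The following NumPy sketch is illustrative only: the name edge_endpoints_sketch is hypothetical, a non-None cell is assumed, and repeating the cells by n_edge to obtain one cell per edge is an assumption about how the batched shapes fit together.

```python
import numpy as np


def edge_endpoints_sketch(positions, senders, receivers, shifts, cell, n_edge):
    """Hypothetical sketch: translated sender positions, unchanged receivers."""
    cell_per_edge = np.repeat(cell, n_edge, axis=0)  # [n_edges, 3, 3]
    offsets = np.einsum("ei,eij->ej", shifts, cell_per_edge)
    # The sender is moved into the neighbouring cell; the receiver stays put.
    return positions[senders] - offsets, positions[receivers]


# Two atoms in a cubic cell of side 4 that are neighbours across the boundary.
positions = np.array([[0.5, 0.0, 0.0], [3.5, 0.0, 0.0]])
senders = np.array([0, 1])
receivers = np.array([1, 0])
shifts = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
cell = 4.0 * np.eye(3)[None, :, :]
n_edge = np.array([2])

sender_pos, receiver_pos = edge_endpoints_sketch(
    positions, senders, receivers, shifts, cell, n_edge
)
# The difference reproduces positions[j] - positions[i] + S @ cell.
difference = receiver_pos - sender_pos
```

Here the first sender moves from x = 0.5 to x = 4.5, right next to its receiver at x = 3.5, so the difference of the two returned arrays has length 1.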

mlip.data.helpers.dynamically_batch.dynamically_batch(graphs_tuple_iterator: Iterator[GraphsTuple], n_node: int, n_edge: int, n_graph: int, pad_fn: Callable[[GraphsTuple], GraphsTuple] | None = None, skip_last_batch: bool = False) → Generator[GraphsTuple, None, None]

Dynamically batches trees with jraph.GraphsTuples up to specified sizes.

Elements of the graphs_tuple_iterator will be incrementally added to a batch until the limits defined by n_node, n_edge and n_graph are reached. This means each element yielded by this generator may have a differing number of graphs in its batch.

Parameters:

graphs_tuple_iterator – An iterator of jraph.GraphsTuples.
n_node – The maximum number of nodes in a batch. Must be at least the node count of the largest graph plus 1.
n_edge – The maximum number of edges in a batch. Must be at least the edge count of the largest graph.
n_graph – The maximum number of graphs in a batch, at least 2.
pad_fn – A function for padding. If None (default), then use jraph.pad_with_graphs.
skip_last_batch – Whether to skip the last batch. Default is False.

Yields:

A jraph.GraphsTuple batch of graphs.

Raises:

ValueError – if n_graph is less than 2.
RuntimeError – if the graphs_tuple_iterator contains elements which are not jraph.GraphsTuple objects.
RuntimeError – if a graph is found which is larger than the batch size.
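The accumulation logic can be sketched in plain Python using only graph sizes. This is a simplified illustration, not the library's code: the name greedy_batches is hypothetical, graphs are stood in for by (nodes, edges) tuples, padding and skip_last_batch are omitted, and reserving one node and one graph slot for padding is an assumption inferred from the documented minimums for n_node and n_graph.

```python
from collections.abc import Iterable, Iterator


def greedy_batches(
    sizes: Iterable[tuple[int, int]], n_node: int, n_edge: int, n_graph: int
) -> Iterator[list[tuple[int, int]]]:
    """Hypothetical sketch: group (nodes, edges) sizes under batch limits."""
    if n_graph < 2:
        raise ValueError("n_graph must be at least 2")
    # Assumption: one node and one graph slot are kept free for padding.
    max_nodes, max_edges, max_graphs = n_node - 1, n_edge, n_graph - 1
    batch: list[tuple[int, int]] = []
    nodes = edges = 0
    for g_nodes, g_edges in sizes:
        if g_nodes > max_nodes or g_edges > max_edges:
            raise RuntimeError("found a graph larger than the batch size")
        overflow = (
            nodes + g_nodes > max_nodes
            or edges + g_edges > max_edges
            or len(batch) + 1 > max_graphs
        )
        if batch and overflow:
            # The current batch is full: emit it and start a new one.
            yield batch
            batch, nodes, edges = [], 0, 0
        batch.append((g_nodes, g_edges))
        nodes += g_nodes
        edges += g_edges
    if batch:
        yield batch


sizes = [(3, 4), (2, 2), (4, 6), (1, 0)]
batches = list(greedy_batches(sizes, n_node=6, n_edge=8, n_graph=3))
```

With these limits the four graphs are split into two batches of two graphs each, since adding the third graph to the first batch would exceed the node budget.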

mlip.data.helpers.dummy_init_graph.get_dummy_graph_for_model_init() → GraphsTuple

Generates a simple dummy graph that can be used for model initialization.

Returns:

The dummy graph.

mlip.data.helpers.atomic_energies.compute_average_e0s_from_graphs(graphs: list[GraphsTuple]) → dict[int, float]

Compute average energy contribution of each element by least squares.

Parameters:

graphs – The graphs for which to compute the average energy contribution of each element.

Returns:

The atomic energies dictionary, which maps each atomic species to its average energy contribution.
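The least-squares idea can be illustrated on plain arrays. The sketch below is not the library's implementation: the name average_e0s_sketch is hypothetical, structures are represented by lists of atomic numbers rather than graphs, and the toy energies are made-up values chosen so that the exact answer is known.

```python
import numpy as np


def average_e0s_sketch(species_per_structure, energies):
    """Hypothetical sketch: fit E_total ~ sum over elements of count * e0."""
    elements = sorted({z for species in species_per_structure for z in species})
    column = {z: k for k, z in enumerate(elements)}
    # counts[s, k] = number of atoms of element k in structure s.
    counts = np.zeros((len(species_per_structure), len(elements)))
    for row, species in enumerate(species_per_structure):
        for z in species:
            counts[row, column[z]] += 1
    target = np.asarray(energies, dtype=float)
    solution, *_ = np.linalg.lstsq(counts, target, rcond=None)
    return {z: float(e0) for z, e0 in zip(elements, solution)}


# Made-up total energies consistent with e0(H) = -1 and e0(O) = -3.
species_per_structure = [[1, 1], [8, 8], [1, 1, 8]]
energies = [-2.0, -6.0, -5.0]
e0s = average_e0s_sketch(species_per_structure, energies)
```

Because the toy energies are exactly additive, the fit recovers the per-element values; on real data the result is the best additive approximation in the least-squares sense.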

mlip.data.helpers.neighbor_analysis.compute_avg_num_neighbors(graphs: list[GraphsTuple]) → float

Computes the average number of neighbors for a given list of graphs.

Parameters:

graphs – The list of graphs to process.

Returns:

The average (i.e., mean) number of neighbors.
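One plausible reading of this quantity is the mean, over all nodes of all graphs, of the number of edges arriving at each node. The sketch below follows that reading and is not taken from the library: the name avg_num_neighbors_sketch is hypothetical, graphs are represented only by their receiver indices and node counts, and the per-node averaging is an assumption.

```python
import numpy as np


def avg_num_neighbors_sketch(receivers_per_graph, n_node_per_graph):
    """Hypothetical sketch: mean incoming-edge count per node."""
    counts = [
        # Number of edges that end at each node of this graph.
        np.bincount(receivers, minlength=n_node)
        for receivers, n_node in zip(receivers_per_graph, n_node_per_graph)
    ]
    return float(np.concatenate(counts).mean())


# Graph A: 3 nodes, 4 edges. Graph B: 2 nodes, 2 edges.
receivers_per_graph = [np.array([0, 1, 1, 2]), np.array([0, 1])]
n_node_per_graph = [3, 2]
avg_neighbors = avg_num_neighbors_sketch(receivers_per_graph, n_node_per_graph)
```

Under this definition the result equals the total number of edges divided by the total number of nodes, here 6 / 5.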

mlip.data.helpers.neighbor_analysis.compute_avg_min_neighbor_distance(graphs: list[GraphsTuple]) → float

Computes the average minimum neighbor distance for a given list of graphs.

Parameters:

graphs – The list of graphs to process.

Returns:

The average (i.e., mean) minimum neighbor distance.
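As an illustration, the quantity can be read as follows: for each node, take the distance to its closest neighbor, then average over nodes. The sketch below implements that reading for a single graph and is not the library's code: the name avg_min_neighbor_distance_sketch is hypothetical, edge lengths are passed in directly, and both the per-node averaging and the exclusion of isolated nodes are assumptions.

```python
import numpy as np


def avg_min_neighbor_distance_sketch(edge_lengths, receivers, n_node):
    """Hypothetical sketch: mean over nodes of the shortest incoming edge."""
    min_dist = np.full(n_node, np.inf)
    # Unbuffered in-place minimum: handles repeated receiver indices correctly.
    np.minimum.at(min_dist, receivers, edge_lengths)
    # Assumption: nodes without any neighbour are left out of the average.
    has_neighbor = np.isfinite(min_dist)
    return float(min_dist[has_neighbor].mean())


# Three nodes; node 1 receives two edges of length 1.0 and 2.0.
receivers = np.array([0, 1, 1, 2])
edge_lengths = np.array([1.0, 1.0, 2.0, 2.0])
avg_min_distance = avg_min_neighbor_distance_sketch(edge_lengths, receivers, n_node=3)
```

The per-node minima in this example are 1.0, 1.0 and 2.0, so the average is 4/3.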