Graph Dataset

class mlip.data.helpers.graph_dataset.GraphDataset(graphs: list[GraphsTuple], batch_size: int, max_n_node: int, max_n_edge: int, min_n_node: int = 1, min_n_edge: int = 1, min_n_graph: int = 1, should_shuffle: bool = True, should_shuffle_between_epochs: bool = True, skip_last_batch: bool = False, raise_exc_if_graphs_discarded: bool = False)

Class that holds a dataset of graphs (jraph.GraphsTuple objects) and manages their batching.

__init__(graphs: list[GraphsTuple], batch_size: int, max_n_node: int, max_n_edge: int, min_n_node: int = 1, min_n_edge: int = 1, min_n_graph: int = 1, should_shuffle: bool = True, should_shuffle_between_epochs: bool = True, skip_last_batch: bool = False, raise_exc_if_graphs_discarded: bool = False)

Constructor.

Parameters:

graphs – The graphs to store and manage in this class.
batch_size – The batch size.
max_n_node – The maximum number of nodes contributed by one graph in a batch.
max_n_edge – The maximum number of edges contributed by one graph in a batch.
min_n_node – The minimum number of nodes in a batch, defaults to 1.
min_n_edge – The minimum number of edges in a batch, defaults to 1.
min_n_graph – The minimum number of graphs in a batch, defaults to 1.
should_shuffle – Whether to shuffle the graphs before iterating, defaults to True.
should_shuffle_between_epochs – Whether to reshuffle the graphs between epochs. Only takes effect if should_shuffle is also True. Defaults to True.
skip_last_batch – Whether to skip the last batch, defaults to False.
raise_exc_if_graphs_discarded – Whether to raise an exception if any graphs must be discarded due to size constraints. Defaults to False, in which case only a warning is logged.
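
The interaction between the size limits and raise_exc_if_graphs_discarded can be pictured with a small pure-Python sketch. This is not the mlip implementation: the helper name filter_oversized, the use of (n_node, n_edge) tuples in place of real graphs, the inclusive comparison, and the choice of ValueError are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def filter_oversized(graph_sizes, max_n_node, max_n_edge,
                     raise_exc_if_graphs_discarded=False):
    """Keep only (n_node, n_edge) pairs that fit the per-graph limits.

    Hypothetical helper: it mimics the documented parameters, not mlip's code.
    """
    kept = [
        (n_node, n_edge)
        for n_node, n_edge in graph_sizes
        # Assumption: a graph exactly at the limit is still accepted.
        if n_node <= max_n_node and n_edge <= max_n_edge
    ]
    n_discarded = len(graph_sizes) - len(kept)
    if n_discarded > 0:
        message = f"{n_discarded} graph(s) exceed the size limits and are discarded."
        if raise_exc_if_graphs_discarded:
            # Assumption: the real exception type may differ.
            raise ValueError(message)
        logger.warning(message)
    return kept


# Three toy graphs described only by their node and edge counts.
sizes = [(4, 12), (30, 200), (8, 40)]
kept_sizes = filter_oversized(sizes, max_n_node=10, max_n_edge=50)
```

With the default flag, the oversized middle entry is dropped and a warning is logged. Passing raise_exc_if_graphs_discarded=True turns the same situation into an error, which may be preferable when silently losing training data is unacceptable.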

__iter__(): Iterates over the dataset in batches, according to a batching strategy.

__len__(): Returns the number of batches. The batches are not recomputed on each call.
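
The sketch below illustrates how the shuffle flags, skip_last_batch, and a precomputed batch count could fit together. It is a simplified stand-in, not the library's code: ToyBatchedDataset and its seed argument are hypothetical, and cutting the data into fixed-size chunks is an assumption, since the actual batching strategy of GraphDataset also accounts for node and edge limits.

```python
import random


class ToyBatchedDataset:
    """Hypothetical stand-in that mirrors the documented shuffle and skip flags."""

    def __init__(self, items, batch_size, should_shuffle=True,
                 should_shuffle_between_epochs=True, skip_last_batch=False,
                 seed=0):
        self.items = list(items)
        self.batch_size = batch_size
        self.should_shuffle = should_shuffle
        self.should_shuffle_between_epochs = should_shuffle_between_epochs
        self._rng = random.Random(seed)  # seed is an illustrative extra
        self._epochs_started = 0
        # Count the batches once so that __len__ is a cheap lookup.
        n_batches = -(-len(self.items) // batch_size)  # ceiling division
        if skip_last_batch and n_batches > 0:
            n_batches -= 1
        self._n_batches = n_batches

    def __iter__(self):
        first_epoch = self._epochs_started == 0
        self._epochs_started += 1
        # Shuffle before the first epoch; reshuffle later epochs only if asked.
        if self.should_shuffle and (
            first_epoch or self.should_shuffle_between_epochs
        ):
            self._rng.shuffle(self.items)
        for b in range(self._n_batches):
            start = b * self.batch_size
            yield self.items[start:start + self.batch_size]

    def __len__(self):
        return self._n_batches


dataset = ToyBatchedDataset(range(10), batch_size=4, skip_last_batch=True)
batches = list(dataset)  # one epoch; the trailing partial batch is skipped
```

In this toy version, len(dataset) reads a stored count rather than walking the data again, which matches the documented intent of not recomputing the batches on each call.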

subset(i: slice | int | list | float)

Constructs and returns a new graph dataset containing a subset of the graphs of the current one, selected according to the slicing information i.

Parameters: i – The slicing information. See source code for options.
Returns: A new graph dataset containing only a subset of the graphs of the current one.
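
The four accepted types suggest several ways of selecting graphs. The sketch below shows one plausible reading of each on a plain Python list. The helper select_subset is hypothetical, and the int and float branches in particular (first i graphs, leading fraction of the graphs) are assumptions; the library's source code remains the authority on the actual options.

```python
def select_subset(graphs, i):
    """Return a new list holding a subset of `graphs` (hypothetical semantics)."""
    if isinstance(i, slice):
        # A slice is applied directly to the list of graphs.
        return graphs[i]
    if isinstance(i, list):
        # A list is read as explicit indices.
        return [graphs[j] for j in i]
    if isinstance(i, int):
        # Assumption: an int keeps the first i graphs.
        return graphs[:i]
    if isinstance(i, float):
        # Assumption: a float keeps that leading fraction of the graphs.
        return graphs[: int(i * len(graphs))]
    raise TypeError(f"Unsupported slicing information: {type(i).__name__}")


graphs = ["g0", "g1", "g2", "g3"]  # placeholders for jraph.GraphsTuple objects
by_slice = select_subset(graphs, slice(1, 3))
by_list = select_subset(graphs, [0, 3])
by_int = select_subset(graphs, 2)
by_float = select_subset(graphs, 0.5)
```

Each call returns a new list and leaves the original untouched, in line with the documented behaviour of returning a new dataset rather than modifying the current one.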