Graph Dataset
- class mlip.data.helpers.graph_dataset.GraphDataset(graphs: list[GraphsTuple], batch_size: int, max_n_node: int, max_n_edge: int, min_n_node: int = 1, min_n_edge: int = 1, min_n_graph: int = 1, should_shuffle: bool = True, should_shuffle_between_epochs: bool = True, skip_last_batch: bool = False, raise_exc_if_graphs_discarded: bool = False)
Class for holding a dataset consisting of graphs, i.e., jraph.GraphsTuple, and managing batching.
- __init__(graphs: list[GraphsTuple], batch_size: int, max_n_node: int, max_n_edge: int, min_n_node: int = 1, min_n_edge: int = 1, min_n_graph: int = 1, should_shuffle: bool = True, should_shuffle_between_epochs: bool = True, skip_last_batch: bool = False, raise_exc_if_graphs_discarded: bool = False)
Constructor.
- Parameters:
graphs – The graphs to store and manage in this class.
batch_size – The batch size.
max_n_node – The maximum number of nodes contributed by one graph in a batch.
max_n_edge – The maximum number of edges contributed by one graph in a batch.
min_n_node – The minimum number of nodes in a batch, defaults to 1.
min_n_edge – The minimum number of edges in a batch, defaults to 1.
min_n_graph – The minimum number of graphs in a batch, defaults to 1.
should_shuffle – Whether to shuffle the graphs before iterating, defaults to True.
should_shuffle_between_epochs – Whether to reshuffle the graphs between epochs; this only takes effect if should_shuffle is also True. Defaults to True.
skip_last_batch – Whether to skip the last batch, defaults to False.
raise_exc_if_graphs_discarded – Whether to raise an exception if there are graphs that must be discarded due to size constraints. Defaults to False, in which case only a warning is logged.
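The interaction between the size limits and raise_exc_if_graphs_discarded can be illustrated with a small standalone sketch. It is not the mlip implementation: GraphSize and filter_oversized are hypothetical names, the inclusive comparison (<=) and the use of ValueError are assumptions, and only the warn-or-raise behaviour is taken from the parameter description above.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class GraphSize:
    """Hypothetical stand-in for a graph; only its node and edge counts matter here."""

    n_node: int
    n_edge: int


def filter_oversized(graphs, max_n_node, max_n_edge, raise_exc_if_graphs_discarded=False):
    """Keep graphs that fit both limits; warn or raise if any had to be dropped.

    Assumption: the limits are inclusive and the exception type is ValueError.
    """
    kept = [g for g in graphs if g.n_node <= max_n_node and g.n_edge <= max_n_edge]
    n_discarded = len(graphs) - len(kept)
    if n_discarded > 0:
        message = f"{n_discarded} graph(s) exceed the size limits and were discarded."
        if raise_exc_if_graphs_discarded:
            raise ValueError(message)
        logger.warning(message)
    return kept


graphs = [GraphSize(4, 10), GraphSize(50, 400), GraphSize(8, 30)]
# With the default flag, the oversized middle graph is dropped and a warning is logged.
kept = filter_oversized(graphs, max_n_node=10, max_n_edge=40)
```

Setting raise_exc_if_graphs_discarded=True in the same call would raise instead of logging, which can be useful when silently losing training data is not acceptable.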
- __iter__()
Iterates over the dataset in batches, according to a batching strategy.
- __len__()
Returns the number of batches. The count is not recomputed on each call.
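As a rough illustration of how the shuffle flags and a cached batch count might fit together, here is a deliberately simplified sketch. TinyBatchedDataset and its seed argument are hypothetical, it batches by a fixed item count and ignores the node and edge limits entirely, and it should not be read as how GraphDataset is actually implemented.

```python
import random


class TinyBatchedDataset:
    """Hypothetical, simplified illustration; not the mlip implementation."""

    def __init__(self, items, batch_size, should_shuffle=True,
                 should_shuffle_between_epochs=True, seed=0):
        self._items = list(items)
        self._batch_size = batch_size
        # Reshuffling between epochs only applies when shuffling is enabled at all.
        self._reshuffle = should_shuffle and should_shuffle_between_epochs
        self._rng = random.Random(seed)
        if should_shuffle:
            self._rng.shuffle(self._items)
        # The batch count is computed once here (ceiling division) and cached.
        self._n_batches = -(-len(self._items) // batch_size)
        self._epochs_started = 0

    def __len__(self):
        # Returns the cached value; no batching work happens here.
        return self._n_batches

    def __iter__(self):
        # From the second epoch onwards, reshuffle if both flags allow it.
        if self._reshuffle and self._epochs_started > 0:
            self._rng.shuffle(self._items)
        self._epochs_started += 1
        for b in range(self._n_batches):
            start = b * self._batch_size
            yield self._items[start:start + self._batch_size]


dataset = TinyBatchedDataset(range(5), batch_size=2, should_shuffle=False)
batches = list(dataset)  # [[0, 1], [2, 3], [4]]
n_batches = len(dataset)  # 3
```

In this sketch the batch count stays constant across epochs because batches are formed by item count alone; with size-based batching the relationship between shuffling and the number of batches may be less straightforward.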
- subset(i: slice | int | list | float)
Constructs and returns a new graph dataset containing a subset of the graphs of the current one, selected by the given slicing information i.
- Parameters:
i – The slicing information. See source code for options.
- Returns:
A new graph dataset containing only a subset of the graphs of the current one.