Combined Graph Dataset

class mlip.data.helpers.combined_graph_dataset.CombinedGraphDataset(graph_datasets: list[GraphDataset], interleaving_method: Literal['regular', 'random'] = 'regular', mesh: Mesh | None = None, seed: int = 0)

A dataset wrapper that combines multiple GraphDataset instances into a single iterable dataset.

This class enables iteration over multiple graph datasets either by randomly mixing them or by interleaving them in a deterministic way.

The deterministic (regular) interleaving approach combines two iterables (iterators_long and iterators_short) into a single generator while preserving an ordering compatible with a multi-host setup. Each group of N consecutive items must be homogeneous, i.e., all items in the group must come from the same source iterable. To enforce this, the generator interleaves items as follows: after every R * N items drawn from iterators_long, N items are drawn from iterators_short. Both sequences are therefore consumed in chunks whose sizes are multiples of N, ensuring that every group of N consecutive batches comes from a single iterable.

Where:
- R is the ratio of the number of items in iterators_long to the number of items in iterators_short.
- N is the number of devices.
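A minimal pure-Python sketch of the regular scheme described above, assuming R is an integer and both source lengths are multiples of N. The function name regular_interleave and its parameters are hypothetical, not the library's API; the sketch only illustrates the chunking rule on plain lists.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def regular_interleave(
    iterators_long: Iterable[T],
    iterators_short: Iterable[T],
    ratio: int,
    num_devices: int,
) -> Iterator[T]:
    """Hypothetical sketch: alternate R * N items from the long source with
    N items from the short source until both are exhausted."""
    long_it, short_it = iter(iterators_long), iter(iterators_short)
    long_chunk = ratio * num_devices  # R * N items per long-source chunk
    while True:
        block_long = list(islice(long_it, long_chunk))
        block_short = list(islice(short_it, num_devices))  # N items
        if not block_long and not block_short:
            return  # both sources exhausted
        yield from block_long
        yield from block_short


# Example: 8 "long" items, 4 "short" items, N = 2 devices.
long_items = [f"L{i}" for i in range(8)]
short_items = [f"S{i}" for i in range(4)]
ratio = len(long_items) // len(short_items)  # assumed: R as an integer length ratio (2 here)
print(list(regular_interleave(long_items, short_items, ratio, num_devices=2)))
# ['L0', 'L1', 'L2', 'L3', 'S0', 'S1', 'L4', 'L5', 'L6', 'L7', 'S2', 'S3']
```

Drawing whole chunks from each source, rather than single items, is what keeps every group of N consecutive outputs homogeneous. When one source runs out first, the sketch keeps draining the other in chunks of the same size.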

__init__(graph_datasets: list[GraphDataset], interleaving_method: Literal['regular', 'random'] = 'regular', mesh: Mesh | None = None, seed: int = 0)

Class constructor. It does not handle prefetching or parallelism.

__iter__()

Iterate over the combined dataset according to the interleaving strategy: randomized or deterministic (regular) interleaving.
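The documentation does not specify how the randomized strategy mixes sources. One plausible approach, sketched here purely as an assumption, is to split each source into chunks of N items and shuffle the order in which sources contribute chunks using a seeded RNG. The constructor's seed parameter suggests seeding, but its exact use is not documented. The function random_interleave is hypothetical.

```python
import random
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def random_interleave(
    sources: Sequence[Sequence[T]],
    num_devices: int,
    seed: int = 0,
) -> list[T]:
    """Hypothetical randomized mixing: chunk each source into groups of
    num_devices items, then pick which source feeds each chunk at random."""
    chunks_per_source = [
        [list(src[i : i + num_devices]) for i in range(0, len(src), num_devices)]
        for src in sources
    ]
    # One label per chunk; shuffling the labels randomizes the source order
    # while chunks from the same source are still emitted in their original order.
    order = [s for s, chunks in enumerate(chunks_per_source) for _ in chunks]
    rng = random.Random(seed)  # seeded, so the mixing is reproducible
    rng.shuffle(order)
    cursors = [0] * len(sources)
    mixed: list[T] = []
    for s in order:
        mixed.extend(chunks_per_source[s][cursors[s]])
        cursors[s] += 1
    return mixed


a = [f"A{i}" for i in range(6)]
b = [f"B{i}" for i in range(4)]
print(random_interleave([a, b], num_devices=2, seed=0))
```

Shuffling chunk labels rather than individual items keeps the N-homogeneity guarantee and preserves the order of items within each source.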

__len__()

Returns the total number of graphs across all combined graph datasets.

classmethod init(graph_datasets: list[GraphDataset | PrefetchIterator], interleaving_method: Literal['regular', 'random'] = 'regular', mesh: Mesh | None = None) → PrefetchIterator | Self

Initializes an instance of CombinedGraphDataset and, if all items in the graph_datasets list are instances of PrefetchIterator, automatically converts the result into a PrefetchIterator object.
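The unwrap-and-rewrap behaviour of init() can be sketched with stand-in classes. PrefetchIteratorStub, CombinedStub, and the source attribute are hypothetical placeholders, not mlip types: the real PrefetchIterator may expose its underlying dataset differently, and the documented init() also takes a mesh argument omitted here. Behaviour for a list that mixes prefetched and plain datasets is not documented, so this sketch simply combines the items as given.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PrefetchIteratorStub:
    """Hypothetical stand-in for PrefetchIterator; `source` is an assumed attribute."""
    source: Any


@dataclass
class CombinedStub:
    """Hypothetical stand-in for CombinedGraphDataset."""
    graph_datasets: list = field(default_factory=list)
    interleaving_method: str = "regular"

    @classmethod
    def init(cls, graph_datasets: list, interleaving_method: str = "regular"):
        # Guard against an empty list, for which all() would return True.
        if graph_datasets and all(
            isinstance(d, PrefetchIteratorStub) for d in graph_datasets
        ):
            # Unwrap each prefetcher, combine the underlying datasets,
            # then wrap the combined dataset in a single prefetcher.
            unwrapped = [d.source for d in graph_datasets]
            return PrefetchIteratorStub(cls(unwrapped, interleaving_method))
        # Otherwise return a plain combined dataset.
        return cls(list(graph_datasets), interleaving_method)


print(type(CombinedStub.init(["ds_a", "ds_b"])).__name__)  # CombinedStub
print(type(CombinedStub.init([PrefetchIteratorStub("ds_a")])).__name__)  # PrefetchIteratorStub
```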

subset()

Constructs a new CombinedGraphDataset containing a new list of GraphDataset objects, each containing a subset of the graphs in the corresponding current dataset.