Combined Graph Dataset
- class mlip.data.helpers.combined_graph_dataset.CombinedGraphDataset(graph_datasets: list[GraphDataset], interleaving_method: Literal['regular', 'random'] = 'regular', mesh: Mesh | None = None, seed: int = 0)
A dataset wrapper that combines multiple GraphDataset instances into a single iterable dataset.
This class enables iteration over multiple graph datasets either by randomly mixing them or by interleaving them in a deterministic way.
The deterministic interleaving (regular) approach combines two iterables (iterators_long and iterators_short) into a single generator while preserving an ordering compatible with a multi-host setup. Each group of N consecutive items must be homogeneous, i.e. all items in the group must come from the same source iterable. To enforce this, the generator interleaves items as follows: after every R * N items drawn from iterators_long, N items are drawn from iterators_short. Both sequences are therefore consumed in chunks whose sizes are divisible by N, ensuring that each group of N loaded batches contains items from only one iterable.
Here, R is the ratio between iterators_long and iterators_short, and N is the number of devices.
- __init__(graph_datasets: list[GraphDataset], interleaving_method: Literal['regular', 'random'] = 'regular', mesh: Mesh | None = None, seed: int = 0)
Class constructor. It does not handle prefetching or parallelism.
- __iter__()
Iterate over the combined dataset according to the interleaving strategy: randomized or deterministic (regular) interleaving.
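The regular interleaving rule described in the class documentation can be illustrated with a minimal, standalone sketch. This is not the library's implementation: the function name interleave_regular, its parameters, and the toy string batches are all hypothetical, and the handling of leftover items (simply draining whatever remains) is an assumption made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def interleave_regular(
    long_items: Iterable[T],
    short_items: Iterable[T],
    ratio: int,
    num_devices: int,
) -> Iterator[T]:
    """Yield ratio * num_devices long items, then num_devices short items, repeatedly.

    Hypothetical helper, written only to illustrate the R * N / N chunking rule.
    """
    long_it, short_it = iter(long_items), iter(short_items)
    while True:
        # Draw one chunk from each source; both chunk sizes are multiples of N.
        long_chunk = list(islice(long_it, ratio * num_devices))
        short_chunk = list(islice(short_it, num_devices))
        if not long_chunk and not short_chunk:
            return  # both sources are exhausted
        yield from long_chunk
        yield from short_chunk


# Toy data: 8 "long" batches and 4 "short" batches, so R = 2; N = 2 devices.
long_batches = [f"L{i}" for i in range(8)]
short_batches = [f"S{i}" for i in range(4)]
mixed = list(interleave_regular(long_batches, short_batches, ratio=2, num_devices=2))

# Every group of N consecutive items comes from a single source.
groups = [mixed[i : i + 2] for i in range(0, len(mixed), 2)]
assert all(len({item[0] for item in group}) == 1 for group in groups)
```

With these toy inputs, each pass emits four long batches followed by two short ones, so every pair of consecutive batches handed to the two devices originates from the same dataset.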
- __len__()
Returns the total number of graphs in both graph datasets.
- classmethod init(graph_datasets: list[GraphDataset | PrefetchIterator], interleaving_method: Literal['regular', 'random'] = 'regular', mesh: Mesh | None = None) PrefetchIterator | Self
Initializes an instance of CombinedGraphDataset and automatically converts it into a PrefetchIterator object if all items in the graph_datasets list are instances of PrefetchIterator.
- subset()
Constructs a new CombinedGraphDataset object containing a new list of GraphDataset objects, each containing a subset of the graphs of the current ones.