Scaling¶
- class mlipaudit.benchmarks.scaling.scaling.ScalingBenchmark(force_field: ForceField | Calculator, data_input_dir: str | PathLike = './data', run_mode: RunMode | Literal['dev', 'fast', 'standard'] = RunMode.STANDARD)¶
Benchmark for testing how inference speed scales.
- name¶
The unique benchmark name, used to run the benchmark from the CLI and to determine the output folder name for the result file. The name is “scaling”.
- Type:
str
- category¶
A string that describes the category of the benchmark, used, for example, in the UI app for grouping. The default, if not overridden, is “General”. This benchmark’s category matches the default (“General”).
- Type:
str
- result_class¶
A reference to the type of BenchmarkResult that will determine the return type of self.analyze(). The result class type is ScalingResult.
- Type:
type[mlipaudit.benchmark.BenchmarkResult] | None
- model_output_class¶
A reference to the ScalingModelOutput class.
- Type:
type[mlipaudit.benchmark.ModelOutput] | None
- required_elements¶
The set of atomic element types that are present in the benchmark’s input files.
- Type:
set[str] | None
- skip_if_elements_missing¶
Whether the benchmark should be skipped entirely if there are some atomic element types that the model cannot handle. If False, the benchmark must have its own custom logic to handle missing atomic element types. For this benchmark, the attribute is set to True.
- Type:
bool
- __init__(force_field: ForceField | Calculator, data_input_dir: str | PathLike = './data', run_mode: RunMode | Literal['dev', 'fast', 'standard'] = RunMode.STANDARD) None¶
Initializes the benchmark.
- Parameters:
force_field – The force field model to be benchmarked.
data_input_dir – The local input data directory. Defaults to “./data”. If the subdirectory “{data_input_dir}/{benchmark_name}” exists, the benchmark expects the relevant data to be there; otherwise, it downloads the data from HuggingFace.
run_mode – Whether to run the standard benchmark length, a faster version, or a very fast development version. Subclasses should ensure that with RunMode.DEV, their benchmark runs in a much shorter timeframe, for instance by running on a reduced number of test cases. Making RunMode.FAST differ from RunMode.STANDARD is optional and only recommended for very long-running benchmarks. This argument can also be passed as a string: “dev”, “fast”, or “standard”.
- Raises:
ChemicalElementsMissingError – If initialization is attempted with a force field that cannot perform inference on the required elements.
ValueError – If force field type is not compatible.
- run_model() None¶
Runs a short MD simulation for each structure, timing each episode and calculating the average episode time. The first episode is excluded from the average so that compilation time is not counted.
- analyze() ScalingResult¶
Aggregates the average episode times and metadata.
- Returns:
A ScalingResult object.
- Raises:
RuntimeError – If called before run_model().
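The timing scheme described for run_model() can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names mean_excluding_first and time_episodes are hypothetical, and run_episode stands in for whatever callable advances the simulation by one episode.

```python
import time
from typing import Callable


def mean_excluding_first(episode_times: list[float]) -> float:
    """Average all episode times except the first one."""
    if len(episode_times) < 2:
        raise ValueError("need at least two episodes to discard the first")
    # The first episode usually includes compilation, so it is left out.
    timed = episode_times[1:]
    return sum(timed) / len(timed)


def time_episodes(run_episode: Callable[[], None], num_episodes: int) -> list[float]:
    """Run `run_episode` repeatedly and record each wall-clock duration."""
    episode_times = []
    for _ in range(num_episodes):
        start = time.perf_counter()
        run_episode()  # hypothetical stand-in for one MD episode
        episode_times.append(time.perf_counter() - start)
    return episode_times
```

For example, mean_excluding_first([10.0, 1.0, 3.0]) averages only the last two values, so a slow first episode does not distort the result.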
- class mlipaudit.benchmarks.scaling.scaling.ScalingResult(*, failed: bool = False, score: Annotated[float | None, Ge(ge=0), Le(le=1)] = None, structure_names: list[str], structures: list[ScalingStructureResult])¶
Result object for the scaling benchmark.
- structure_names¶
The names of the structures.
- Type:
list[str]
- structures¶
List of per-structure results.
- class mlipaudit.benchmarks.scaling.scaling.ScalingStructureResult(*, structure_name: str, num_atoms: Annotated[int, Gt(gt=0)], num_steps: Annotated[int, Gt(gt=0)], num_episodes: Annotated[int, Gt(gt=0)], average_episode_time: Annotated[float, Ge(ge=0)] | None = None, average_step_time: Annotated[float, Ge(ge=0)] | None = None, failed: bool = False)¶
Result object for a single structure.
- structure_name¶
The structure name.
- Type:
str
- num_atoms¶
The number of atoms in the structure.
- Type:
int
- num_steps¶
The number of steps in the simulation.
- Type:
int
- num_episodes¶
The number of episodes in the simulation.
- Type:
int
- average_episode_time¶
The average episode time of the simulation, excluding the first episode to ignore the compilation time.
- Type:
float | None
- average_step_time¶
The average step time of the simulation, excluding the first episode to ignore the compilation time.
- Type:
float | None
- failed¶
Whether the simulation failed.
- Type:
bool
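One way to use the per-structure fields is to estimate how step time grows with system size. The sketch below is not part of mlipaudit: fit_scaling_exponent is a hypothetical helper that assumes a power law, time ≈ c · num_atoms^k, and fits k by least squares on log-log values, skipping structures whose average_step_time is None.

```python
import math
from typing import Optional, Sequence


def fit_scaling_exponent(
    num_atoms: Sequence[int],
    step_times: Sequence[Optional[float]],
) -> float:
    """Least-squares slope of log(step time) against log(num_atoms)."""
    # Keep only structures with a usable, positive timing.
    pairs = [
        (n, t)
        for n, t in zip(num_atoms, step_times)
        if t is not None and t > 0
    ]
    if len({n for n, _ in pairs}) < 2:
        raise ValueError("need timings for at least two distinct system sizes")
    xs = [math.log(n) for n, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under this assumption, an exponent near 1 suggests roughly linear scaling with the number of atoms, while larger values indicate steeper growth.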
- class mlipaudit.benchmarks.scaling.scaling.ScalingModelOutput(*, structure_names: list[str], simulation_states: list[SimulationState | None], average_episode_times: list[float | None])¶
Model output for the scaling benchmark.
- structure_names¶
The names of the structures used.
- Type:
list[str]
- simulation_states¶
A list of final simulation states for each corresponding structure. None if the simulation failed.
- Type:
list[mlip.simulation.state.SimulationState | None]
- average_episode_times¶
A list of average episode times for each corresponding structure, excluding the first episode to ignore the compilation time. None if the simulation failed.
- Type:
list[float | None]
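Because structure_names and average_episode_times are parallel lists in which failed simulations appear as None, a consumer will often want to pair them up and drop the failures. A minimal sketch follows; successful_episode_times is a hypothetical helper, and it assumes structure names are unique.

```python
from typing import Optional


def successful_episode_times(
    structure_names: list[str],
    average_episode_times: list[Optional[float]],
) -> dict[str, float]:
    """Map each structure name to its average episode time, skipping failures."""
    if len(structure_names) != len(average_episode_times):
        raise ValueError("the two lists must have the same length")
    return {
        name: episode_time
        for name, episode_time in zip(structure_names, average_episode_times)
        if episode_time is not None  # None marks a failed simulation
    }
```

The same index-based pairing applies to simulation_states, which follows the same ordering and the same None convention.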