I/O of model outputs and benchmark results¶
- mlipaudit.io.write_benchmark_result_to_disk(benchmark_name: str, result: BenchmarkResult, output_dir: str | PathLike) None¶
Writes a benchmark result to disk.
- Parameters:
benchmark_name – The benchmark name.
result – The benchmark result.
output_dir – Directory to which to write the result.
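A minimal usage sketch. Here `result` stands in for a `BenchmarkResult` produced by an earlier benchmark run, and the benchmark name and output directory are placeholders.

```python
from mlipaudit.io import write_benchmark_result_to_disk

# `result` is assumed to be a BenchmarkResult produced by running a benchmark;
# "my_benchmark" and "results/my_model" are placeholder names.
write_benchmark_result_to_disk(
    benchmark_name="my_benchmark",
    result=result,
    output_dir="results/my_model",
)
```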
- mlipaudit.io.load_benchmark_result_from_disk(results_dir: str | PathLike, benchmark_class: type[Benchmark]) BenchmarkResult¶
Loads a benchmark result from disk.
- Parameters:
results_dir – The path to the directory with the results.
benchmark_class – The benchmark class that corresponds to the benchmark to load from disk.
- Returns:
The loaded benchmark result.
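A sketch of reading a single result back, assuming a hypothetical `MyBenchmark` subclass of `Benchmark` and a directory laid out as written by `write_benchmark_result_to_disk`.

```python
from mlipaudit.io import load_benchmark_result_from_disk

# MyBenchmark is a hypothetical placeholder; substitute one of the package's
# actual Benchmark subclasses.
result = load_benchmark_result_from_disk(
    results_dir="results/my_model/my_benchmark",
    benchmark_class=MyBenchmark,
)
```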
- mlipaudit.io.load_benchmark_results_from_disk(results_dir: str | PathLike, benchmark_classes: list[type[Benchmark]]) dict[str, dict[str, BenchmarkResult]]¶
Loads benchmark results from disk.
Note that we handle hidden files by ignoring them.
This expects the results to follow our directory-structure convention,
<results_dir>/<model_name>/<benchmark_name>/result.json, i.e., the results directory contains one subdirectory per model, and each model subdirectory contains one subdirectory per benchmark holding that benchmark's individual result in a result.json file. The results are loaded all together rather than one at a time because this corresponds to the most common use case of the UI app, and the results are not expected to be too large in memory (in contrast, for example, to the model outputs).
- Parameters:
results_dir – The path to the directory with the results.
benchmark_classes – A list of benchmark classes that correspond to those benchmarks to load from disk.
- Returns:
The loaded results as a dictionary of dictionaries, where the first keys are the model names and the second keys the benchmark names.
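A sketch of loading all results at once, matching the directory convention above; `MyBenchmark` and `OtherBenchmark` are hypothetical placeholders for the package's actual benchmark classes.

```python
from mlipaudit.io import load_benchmark_results_from_disk

# MyBenchmark and OtherBenchmark are hypothetical Benchmark subclasses.
results = load_benchmark_results_from_disk(
    results_dir="results",
    benchmark_classes=[MyBenchmark, OtherBenchmark],
)

# Access follows the documented nesting: model name first, then benchmark name.
single_result = results["my_model"]["my_benchmark"]
```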
- mlipaudit.io.write_scores_to_disk(scores: dict[str, float | None], output_dir: str | PathLike) None¶
Writes the scores to disk. The resulting JSON file contains the generated scores, with a score of 0.0 for benchmarks that were skipped and a score of None for benchmarks that do not return scores.
- Parameters:
scores – The results as a dictionary with the benchmark names as keys and their scores as values.
output_dir – Directory to which to write the results.
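A sketch of writing scores, assuming a hand-built scores dictionary; in practice the dictionary would come from a benchmark run, and the names and paths here are placeholders.

```python
from mlipaudit.io import write_scores_to_disk

# Keys are benchmark names; None marks a benchmark that does not return a score.
scores = {"my_benchmark": 0.87, "other_benchmark": None}

write_scores_to_disk(scores=scores, output_dir="results/my_model")
```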
- mlipaudit.io.load_score_from_disk(output_dir: str | PathLike) dict[str, float]¶
Loads the scores from disk for a single model.
- Parameters:
output_dir – Directory from which to load the scores. Should point to the folder for the results of a single model.
- Returns:
A dictionary of scores where the keys are the benchmark names.
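A sketch of reading the scores for a single model back from the folder written above (paths and names are placeholders).

```python
from mlipaudit.io import load_score_from_disk

# Points at a single model's results folder.
scores = load_score_from_disk(output_dir="results/my_model")

print(scores.get("my_benchmark"))
```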
- mlipaudit.io.load_scores_from_disk(scores_dir: str | PathLike) dict[str, dict[str, float]]¶
Loads the scores from disk for all models.
- Parameters:
scores_dir – Directory from which to load the scores. Should point to the folder for the results of multiple models.
- Returns:
A dictionary of dictionaries where the first keys are the model names and the second keys the benchmark names.
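A sketch of loading scores for every model, assuming a results folder that contains one subdirectory per model.

```python
from mlipaudit.io import load_scores_from_disk

# Points at the parent folder holding one subdirectory per model.
all_scores = load_scores_from_disk(scores_dir="results")

for model_name, model_scores in all_scores.items():
    for benchmark_name, score in model_scores.items():
        print(model_name, benchmark_name, score)
```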
- mlipaudit.io.write_model_output_to_disk(benchmark_name: str, model_output: ModelOutput, output_dir: str | PathLike) None¶
Writes a model output to disk.
Each model output is written to disk as a zip archive.
- Parameters:
benchmark_name – The benchmark name.
model_output – The model output to save.
output_dir – Directory to which to write the model output.
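A minimal sketch. Here `model_output` stands in for a `ModelOutput` produced by running a model on the benchmark's inputs; the benchmark name and output directory are placeholders.

```python
from mlipaudit.io import write_model_output_to_disk

# `model_output` is assumed to be a ModelOutput produced elsewhere;
# "my_benchmark" and "model_outputs/my_model" are placeholder names.
write_model_output_to_disk(
    benchmark_name="my_benchmark",
    model_output=model_output,
    output_dir="model_outputs/my_model",
)
```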
- mlipaudit.io.load_model_output_from_disk(model_outputs_dir: str | PathLike, benchmark_class: type[Benchmark]) ModelOutput¶
Loads a model output from disk.
- Parameters:
model_outputs_dir – The path to the directory with the model_outputs.
benchmark_class – The benchmark class that corresponds to the benchmark to load from disk.
- Returns:
The loaded model output.
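A sketch of loading a model output back, again assuming a hypothetical `MyBenchmark` class and a directory written by `write_model_output_to_disk`.

```python
from mlipaudit.io import load_model_output_from_disk

# MyBenchmark is a hypothetical placeholder; substitute the Benchmark subclass
# whose model output you want to load.
model_output = load_model_output_from_disk(
    model_outputs_dir="model_outputs/my_model",
    benchmark_class=MyBenchmark,
)
```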