ParquetStorageHandler

class ParquetStorageHandler(path)

Storage handler for Parquet files, that is, files with the .parquet extension.

Loads and saves results in Parquet format, as follows. Each result is stored as a row in a Parquet file with the following columns:

  • model_id: The ID of the model.

  • validation_id: The ID of the validation run.

  • folds: A dictionary of fold results, where each key is a fold ID and the value is a dictionary of scores and dataframes.

The results are stored in a tabular format, where each row corresponds to a single model-validation pair.

The columns are as follows:

  • model_id: The ID of the model.

  • validation_id: The ID of the validation run.

  • folds.{fold_id}.scores.{score_name}: The score value for the given fold and score name.

  • folds.{fold_id}.ground_truth: The ground truth dataframe for the given fold.

  • folds.{fold_id}.predictions: The predictions dataframe for the given fold.

  • folds.{fold_id}.train_data: The training data dataframe for the given fold.

The ground_truth, predictions, and train_data columns are included only if they were requested during benchmarking.
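
The dot-separated column names above suggest that the nested folds dictionary is flattened into one column per leaf value. The following is a minimal sketch of that idea using plain dictionaries. It is not the handler's actual implementation; the helper name flatten_result and the example values are hypothetical.

```python
from typing import Any


def flatten_result(result: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Flatten nested dictionaries into one level with dot-separated keys."""
    flat: dict[str, Any] = {}

    def _walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            # Descend into nested dictionaries, extending the key prefix.
            for key, child in value.items():
                _walk(f"{prefix}{sep}{key}" if prefix else str(key), child)
        else:
            # Leaf value: store it under the accumulated key.
            flat[prefix] = value

    _walk("", result)
    return flat


# Hypothetical result with a single fold and two scores (values are made up).
example = {
    "model_id": "naive",
    "validation_id": "cv-3",
    "folds": {0: {"scores": {"mae": 0.5, "mse": 0.4}}},
}
row = flatten_result(example)
print(sorted(row))
# ['folds.0.scores.mae', 'folds.0.scores.mse', 'model_id', 'validation_id']
```

Each key of the flattened dictionary corresponds to one column name in the layout described above, and each result contributes one such row.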

Parameters:
path : str or Path-coercible

The path to the file to save to or load from.

Methods

is_applicable(path)

Check if the storage handler is applicable for the given path.

load()

Load the results from a file.

save(results)

Save the results to a Parquet file.

save(results: list[ResultObject])

Save the results to a Parquet file.

Parameters:
results : list[ResultObject]

The results to save.

static is_applicable(path)

Check if the storage handler is applicable for the given path.

Parameters:
path : str

The path to the file to save to or load from.

Returns:
bool

True if the storage handler is applicable for the given path.
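
Since this handler targets files ending in .parquet, an applicability check of this kind can be pictured as a suffix test. The sketch below assumes a case-sensitive comparison on the file extension; the function name has_parquet_suffix is hypothetical, and the real is_applicable may apply different rules.

```python
from pathlib import Path


def has_parquet_suffix(path) -> bool:
    """Return True if the path ends with the .parquet extension."""
    # Path() accepts both strings and Path objects; the file need not exist.
    return Path(path).suffix == ".parquet"


print(has_parquet_suffix("results/run.parquet"))  # True
print(has_parquet_suffix("results/run.csv"))      # False
```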

load() → list[ResultObject]

Load the results from a file.

Returns:
list[ResultObject]

The loaded results. Returns an empty list if the file does not exist.
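
Because load() returns an empty list for a missing file, callers can merge new results into existing ones without a separate existence check. The sketch below illustrates that calling pattern with a hypothetical stand-in class, JsonStandInHandler, which mimics only the load/save shape using JSON from the standard library. It is not ParquetStorageHandler, and the helper append_results is likewise an assumption for illustration.

```python
import json
import tempfile
from pathlib import Path


class JsonStandInHandler:
    """Hypothetical stand-in with the same load/save shape, backed by JSON."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self) -> list:
        # Mirror the documented behaviour: a missing file yields an empty list.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def save(self, results: list) -> None:
        self.path.write_text(json.dumps(results))


def append_results(handler, new_results: list) -> list:
    """Merge new results with whatever is already stored, then save."""
    combined = handler.load() + new_results
    handler.save(combined)
    return combined


with tempfile.TemporaryDirectory() as tmp:
    handler = JsonStandInHandler(Path(tmp) / "results.json")
    first = append_results(handler, [{"model_id": "a"}])   # file did not exist yet
    second = append_results(handler, [{"model_id": "b"}])  # file now holds one result
    print(len(first), len(second))  # 1 2
```

The same append_results function would work with any handler object that exposes load() and save(results) as documented here.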