Benchmarking Classifiers#
The benchmarking module in sktime makes it easy to benchmark sktime and sktime-compatible classifiers. A benchmark can be run across combinations of time series classification models and tasks, where each task is itself a combination of a dataset, a splitting strategy and a set of scorers.
This notebook demonstrates a classifier benchmark run.
Imports#
[78]:
from sklearn.metrics import accuracy_score, brier_score_loss
from sklearn.model_selection import KFold
from sktime.benchmarking.classification import ClassificationBenchmark
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.classification.dummy import DummyClassifier
from sktime.datasets import load_unit_test
Instantiate ClassificationBenchmark class#
[79]:
benchmark = ClassificationBenchmark()
Add the classifiers to be benchmarked#
[80]:
benchmark.add_estimator(
    estimator=DummyClassifier(),
    estimator_id="DummyClassifier",
)
benchmark.add_estimator(
    estimator=KNeighborsTimeSeriesClassifier(),
    estimator_id="KNeighborsTimeSeriesClassifier",
)
Specify cross-validation splitting strategy#
[81]:
n_splits = 3
cv = KFold(n_splits=n_splits)
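To see what this splitter produces, the following minimal sketch applies `KFold(n_splits=3)` to a stand-in array of 12 instances. The array `X` and its size are illustrative assumptions, not the benchmark dataset. With the default `shuffle=False`, scikit-learn's `KFold` assigns contiguous blocks of indices to each test fold.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in data: 12 instances, one feature each.
X = np.arange(12).reshape(-1, 1)

cv = KFold(n_splits=3)  # shuffle=False by default, so folds are contiguous

# Each split yields (train indices, test indices).
folds = list(cv.split(X))
for fold, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {fold}: train size={len(train_idx)}, test indices={test_idx.tolist()}")
```

Every instance lands in exactly one test fold, so each estimator is fitted and scored three times per task.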
Specify performance metrics#
[82]:
scorers = [accuracy_score, brier_score_loss]
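As a quick reminder of what these two metrics compute, here is a small sketch that calls them directly on toy arrays. The labels and probabilities are made up for illustration, and how `ClassificationBenchmark` passes predictions to each scorer internally is not shown in this notebook.

```python
from sklearn.metrics import accuracy_score, brier_score_loss

# Hypothetical toy data for a binary problem.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]          # hard label predictions
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of class 1

# Fraction of labels predicted correctly (3 of 4 here).
acc = accuracy_score(y_true, y_pred)

# Mean squared difference between predicted probability and outcome.
brier = brier_score_loss(y_true, y_prob)

print(acc, brier)
```

Higher is better for accuracy, while lower is better for the Brier score, so the two result rows should be read in opposite directions.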
Specify dataset loaders#
[83]:
dataset_loaders = [load_unit_test]
Add tasks to the ClassificationBenchmark instance#
[84]:
for dataset_loader in dataset_loaders:
    benchmark.add_task(
        dataset_loader,
        cv,
        scorers,
    )
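Conceptually, the benchmark now holds two estimators and one task, so a run evaluates two estimator-task pairs. The sketch below enumerates that grid with plain Python. It is an illustration only and does not reflect sktime internals; the dictionary layout is a hypothetical choice, and the id strings simply mimic the format that appears in the results table.

```python
from itertools import product

# Identifiers registered above (names only; no estimators are fitted here).
estimator_ids = ["DummyClassifier", "KNeighborsTimeSeriesClassifier"]

# One task: a dataset, a splitter and a list of scorers (hypothetical layout).
tasks = [
    {
        "dataset": "load_unit_test",
        "cv_splitter": "KFold",
        "scorers": ["accuracy_score", "brier_score_loss"],
    }
]

# Every estimator is paired with every task.
runs = [
    {
        "model_id": estimator_id,
        "validation_id": f"[dataset={task['dataset']}]_[cv_splitter={task['cv_splitter']}]",
    }
    for estimator_id, task in product(estimator_ids, tasks)
]

for run in runs:
    print(run)
```

Adding a second dataset loader to `dataset_loaders` would double the number of pairs, because each estimator is evaluated on each task.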
Run all classifier-task combinations and save the result#
[85]:
results_df = benchmark.run("./classifier_benchmarking_results.csv")
results_df.T
[85]:
| | 0 | 1 |
|---|---|---|
| validation_id | [dataset=load_unit_test]_[cv_splitter=KFold] | [dataset=load_unit_test]_[cv_splitter=KFold] |
| model_id | DummyClassifier | KNeighborsTimeSeriesClassifier |
| accuracy_score_fold_0_test | 0.285714 | 0.928571 |
| accuracy_score_fold_1_test | 0.571429 | 1.0 |
| accuracy_score_fold_2_test | 0.285714 | 0.857143 |
| accuracy_score_mean | 0.380952 | 0.928571 |
| accuracy_score_std | 0.164957 | 0.071429 |
| brier_score_loss_fold_0_test | 0.326531 | 0.357143 |
| brier_score_loss_fold_1_test | 0.25 | 0.428571 |
| brier_score_loss_fold_2_test | 0.127551 | 0.571429 |
| brier_score_loss_mean | 0.234694 | 0.452381 |
| brier_score_loss_std | 0.100369 | 0.109109 |
| fit_time_fold_0_test | 0.009119 | 0.052076 |
| fit_time_fold_1_test | 0.008259 | 0.006292 |
| fit_time_fold_2_test | 0.049507 | 0.00181 |
| fit_time_mean | 0.022295 | 0.020059 |
| fit_time_std | 0.02357 | 0.027818 |
| pred_time_fold_0_test | 0.002845 | 0.196373 |
| pred_time_fold_1_test | 0.002953 | 0.202413 |
| pred_time_fold_2_test | 0.001597 | 0.195705 |
| pred_time_mean | 0.002465 | 0.198164 |
| pred_time_std | 0.000754 | 0.003695 |
| runtime_secs | 0.024761 | 0.218223 |
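The `_mean` and `_std` rows can be checked by hand from the per-fold rows. The sketch below recomputes them for the accuracy scores using only the standard library. The fold values are copied from the table above, and the use of the sample standard deviation (dividing by n - 1) is inferred from the reported numbers rather than from the sktime source.

```python
from statistics import mean, stdev

# Per-fold accuracy scores copied from the results table.
fold_accuracy = {
    "DummyClassifier": [0.285714, 0.571429, 0.285714],
    "KNeighborsTimeSeriesClassifier": [0.928571, 1.0, 0.857143],
}

# statistics.stdev is the sample standard deviation (n - 1 in the denominator).
summary = {
    name: {"mean": mean(scores), "std": stdev(scores)}
    for name, scores in fold_accuracy.items()
}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.4f}, std={stats['std']:.4f}")
```

The recomputed values agree with `accuracy_score_mean` and `accuracy_score_std` up to the rounding of the table entries. With only three folds the standard deviation is a rough indication of variability, not a precise estimate.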