Evaluator#
- class Evaluator(results)[source]#
Analyze results of machine learning experiments.
- Attributes:
metric_names
Return metric names.
metrics
Return metrics.
metrics_by_strategy
Return metrics by strategy.
metrics_by_strategy_dataset
Return metrics by strategy and dataset.
Methods
evaluate(metric[, train_or_test, cv_fold])
Evaluate estimator performance.
fit_runtime([unit, train_or_test, cv_fold])
Calculate the average time for fitting the strategy.
friedman_test([metric_name])
Friedman test.
nemenyi([metric_name])
Nemenyi test.
plot_boxplots([metric_name])
Box plot of metric.
plot_critical_difference_diagram
Plot critical difference diagrams.
rank([metric_name, ascending])
Determine estimator ranking.
ranksum_test([metric_name])
Non-parametric test of consistent differences between observation pairs.
sign_test([metric_name])
Non-parametric test for consistent differences between observation pairs.
t_test([metric_name])
T-test on all possible combinations between the estimators.
t_test_with_bonferroni_correction([metric_name, alpha])
T-test with correction used to counteract multiple comparisons.
wilcoxon_test([metric_name])
Wilcoxon signed-rank test.
- evaluate(metric, train_or_test='test', cv_fold='all')[source]#
Evaluate estimator performance.
Calculates the average prediction error per estimator as well as the prediction error achieved by each estimator on individual datasets.
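Example (illustrative sketch; `results` is assumed to be a results object produced by a prior benchmarking run, and `metric` a metric object wrapping a scoring function; both are placeholders not defined on this page):
>>> evaluator = Evaluator(results)                    # results of the benchmarking experiments (placeholder)
>>> evaluator.evaluate(metric=metric, train_or_test="test", cv_fold="all")
>>> evaluator.metrics_by_strategy_dataset             # per-dataset performance of each estimator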
- rank(metric_name=None, ascending=False)[source]#
Determine estimator ranking.
Calculates the average ranks based on the performance of each estimator on each dataset.
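Example (sketch; "accuracy" is a placeholder for a metric name that has already been evaluated):
>>> ranks = evaluator.rank(metric_name="accuracy", ascending=False)   # average rank of each estimator across datasets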
- sign_test(metric_name=None)[source]#
Non-parametric test for consistent differences between observation pairs.
See https://en.wikipedia.org/wiki/Sign_test for details about the test and https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.binom_test.html for details about the scipy implementation.
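Example (sketch; uses the evaluator from above and a placeholder metric name):
>>> sign_results = evaluator.sign_test(metric_name="accuracy")   # sign test on paired per-dataset results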
- ranksum_test(metric_name=None)[source]#
Non-parametric test of consistent differences between observation pairs.
Tests whether two independent samples of measurements are drawn from the same distribution. See http://en.wikipedia.org/wiki/Wilcoxon_rank-sum_test for details.
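Example (sketch; placeholder metric name as before):
>>> ranksum_results = evaluator.ranksum_test(metric_name="accuracy")   # rank-sum test on the per-dataset results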
- t_test_with_bonferroni_correction(metric_name=None, alpha=0.05)[source]#
T-test with a Bonferroni correction applied to counteract the multiple comparisons problem.
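Example (sketch; alpha is the documented significance level, the metric name is a placeholder):
>>> corrected = evaluator.t_test_with_bonferroni_correction(metric_name="accuracy", alpha=0.05)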
- wilcoxon_test(metric_name=None)[source]#
Wilcoxon signed-rank test.
Tests whether two related paired samples come from the same distribution; in particular, whether the distribution of the differences x - y is symmetric about zero. See http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test for details.
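Example (sketch; placeholder metric name):
>>> wilcoxon_results = evaluator.wilcoxon_test(metric_name="accuracy")   # signed-rank test on paired per-dataset results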
- friedman_test(metric_name=None)[source]#
Friedman test.
The Friedman test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Implementation used: scipy.stats.
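Example (sketch; placeholder metric name):
>>> friedman_results = evaluator.friedman_test(metric_name="accuracy")   # omnibus test across all estimators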
- nemenyi(metric_name=None)[source]#
Nemenyi test.
Post-hoc test run if the friedman_test reveals statistical significance. For more information see https://en.wikipedia.org/wiki/Nemenyi_test.
Implementation used: scikit-posthocs.
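Example (sketch; typically called after friedman_test indicates a significant difference; placeholder metric name):
>>> nemenyi_results = evaluator.nemenyi(metric_name="accuracy")   # pairwise post-hoc comparisons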
- fit_runtime(unit='s', train_or_test='test', cv_fold='all')[source]#
Calculate the average time for fitting the strategy.
- Parameters:
- unit : string (must be either ‘s’ for seconds, ‘m’ for minutes or ‘h’ for hours)
the unit in which the run time will be calculated
- Returns:
- run_times: Pandas DataFrame
average run times per estimator and strategy
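Example (sketch; reports average fit times in minutes rather than the default seconds):
>>> run_times = evaluator.fit_runtime(unit="m", train_or_test="test", cv_fold="all")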