Evaluator

class Evaluator(results)

Analyze results of machine learning experiments.
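A minimal usage sketch; the import path and the shape of ``results`` are assumptions based on sktime's benchmarking module, not something stated on this page::

    from sktime.benchmarking.evaluation import Evaluator  # assumed import path

    # `results` is assumed to come from a prior benchmarking run, e.g. a
    # results object holding stored per-fold predictions for each strategy.
    evaluator = Evaluator(results)

    print(evaluator.metric_names)         # names of all metrics computed so far
    print(evaluator.metrics_by_strategy)  # metrics aggregated per strategy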

Attributes:
metric_names

Return metric names.

metrics

Return metrics.

metrics_by_strategy

Return metrics by strategy.

metrics_by_strategy_dataset

Return metrics by strategy and dataset.

Methods

evaluate(metric[, train_or_test, cv_fold])

Evaluate estimator performance.

fit_runtime([unit, train_or_test, cv_fold])

Calculate the average time for fitting the strategy.

friedman_test([metric_name])

Friedman test.

nemenyi([metric_name])

Nemenyi test.

plot_boxplots([metric_name])

Box plot of metric.

plot_critical_difference_diagram([metric_name, alpha])

Plot critical difference diagrams.

rank([metric_name, ascending])

Determine estimator ranking.

ranksum_test([metric_name])

Non-parametric Wilcoxon rank-sum test between two samples of scores.

sign_test([metric_name])

Non-parametric sign test for consistent differences between observation pairs.

t_test([metric_name])

T-test on all possible combinations between the estimators.

t_test_with_bonferroni_correction([metric_name, alpha])

T-test with correction used to counteract multiple comparisons.

wilcoxon_test([metric_name])

Wilcoxon signed-rank test.

property metric_names

Return metric names.

property metrics

Return metrics.

property metrics_by_strategy

Return metrics by strategy.

property metrics_by_strategy_dataset

Return metrics by strategy and dataset.

evaluate(metric, train_or_test='test', cv_fold='all')

Evaluate estimator performance.

Calculates the average prediction error per estimator as well as the prediction error achieved by each estimator on individual datasets.
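A hedged example of a typical call, reusing the ``evaluator`` from the sketch above; the ``PairwiseMetric`` wrapper and its import path are assumptions based on sktime's benchmarking module rather than something documented here::

    from sklearn.metrics import accuracy_score
    from sktime.benchmarking.metrics import PairwiseMetric  # assumed import path

    metric = PairwiseMetric(func=accuracy_score, name="accuracy")
    metrics_df = evaluator.evaluate(
        metric=metric,
        train_or_test="test",  # score predictions on the test split
        cv_fold="all",         # aggregate over all cross-validation folds
    )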

plot_boxplots(metric_name=None, **kwargs)

Box plot of metric.

rank(metric_name=None, ascending=False)

Determine estimator ranking.

Calculates the average rank of each estimator, based on its performance on each dataset.
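The averaging can be illustrated with plain pandas; this is a sketch of the idea on toy numbers, not the Evaluator's actual internals::

    import pandas as pd

    # Per-dataset scores for two estimators (toy numbers).
    scores = pd.DataFrame(
        {"est_A": [0.90, 0.85, 0.80], "est_B": [0.88, 0.87, 0.82]},
        index=["dataset_1", "dataset_2", "dataset_3"],
    )
    # ascending=False treats larger metric values as better (rank 1 = best).
    avg_ranks = scores.rank(axis=1, ascending=False).mean(axis=0)
    print(avg_ranks)  # est_A: 1.67, est_B: 1.33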

t_test(metric_name=None)

T-test on all possible combinations between the estimators.
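A sketch of what "all possible combinations" means, using scipy directly on toy scores; whether the Evaluator uses independent or paired samples is not stated here, so ``ttest_ind`` is only illustrative::

    from itertools import combinations

    from scipy import stats

    scores = {
        "est_A": [0.90, 0.85, 0.80, 0.78],
        "est_B": [0.88, 0.87, 0.82, 0.80],
        "est_C": [0.70, 0.72, 0.68, 0.71],
    }
    # One t-test per unordered pair of estimators.
    for a, b in combinations(scores, 2):
        t_stat, p_value = stats.ttest_ind(scores[a], scores[b])
        print(f"{a} vs {b}: t={t_stat:.3f}, p={p_value:.3f}")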

sign_test(metric_name=None)

Non-parametric sign test for consistent differences between observation pairs.

See https://en.wikipedia.org/wiki/Sign_test for details about the test, and https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.binom_test.html for details about the scipy implementation.
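The linked ``binom_test`` has since been superseded by ``scipy.stats.binomtest``; a sketch of the underlying idea on toy scores::

    from scipy.stats import binomtest

    a = [0.90, 0.85, 0.80, 0.78, 0.91]  # per-dataset scores, estimator A
    b = [0.88, 0.87, 0.82, 0.80, 0.86]  # per-dataset scores, estimator B
    wins = sum(x > y for x, y in zip(a, b))
    n = sum(x != y for x, y in zip(a, b))  # ties are conventionally dropped
    # Two-sided binomial test against the null of equal win probability.
    print(binomtest(wins, n, p=0.5).pvalue)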

ranksum_test(metric_name=None)

Non-parametric Wilcoxon rank-sum test between two samples of scores.

Tests whether two independent samples are drawn from the same distribution, by pooling and ranking all observations and comparing the rank sums of the two samples. See http://en.wikipedia.org/wiki/Wilcoxon_rank-sum_test.
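The same test is available directly as ``scipy.stats.ranksums``; a toy example::

    from scipy.stats import ranksums

    a = [0.90, 0.85, 0.80, 0.78, 0.91]
    b = [0.70, 0.72, 0.68, 0.71, 0.69]
    stat, p_value = ranksums(a, b)
    print(stat, p_value)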

t_test_with_bonferroni_correction(metric_name=None, alpha=0.05)

T-test with correction used to counteract multiple comparisons.

See https://en.wikipedia.org/wiki/Bonferroni_correction for details.
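The correction itself is a simple adjustment of the significance level; a sketch::

    alpha = 0.05
    n_estimators = 4
    # Number of pairwise comparisons between estimators.
    m = n_estimators * (n_estimators - 1) // 2
    # Each individual t-test is evaluated at alpha / m to keep the
    # family-wise error rate at alpha.
    corrected_alpha = alpha / m
    print(corrected_alpha)  # 0.00833...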

wilcoxon_test(metric_name=None)

Wilcoxon signed-rank test.

Tests whether two related, paired samples come from the same distribution; in particular, whether the distribution of the differences x - y is symmetric about zero. See http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test.
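A toy example using ``scipy.stats.wilcoxon`` on paired per-dataset scores::

    from scipy.stats import wilcoxon

    a = [0.90, 0.85, 0.80, 0.78, 0.91, 0.84]
    b = [0.88, 0.87, 0.82, 0.80, 0.86, 0.83]
    # Tests whether the differences a - b are symmetric about zero.
    stat, p_value = wilcoxon(a, b)
    print(stat, p_value)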

friedman_test(metric_name=None)

Friedman test.

The Friedman test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Implementation used: scipy.stats.
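A toy example with ``scipy.stats.friedmanchisquare``, where each argument holds one estimator's per-dataset scores and the datasets play the role of blocks::

    from scipy.stats import friedmanchisquare

    est_a = [0.90, 0.85, 0.80, 0.78, 0.91]
    est_b = [0.88, 0.87, 0.82, 0.80, 0.86]
    est_c = [0.70, 0.72, 0.68, 0.71, 0.69]
    stat, p_value = friedmanchisquare(est_a, est_b, est_c)
    print(stat, p_value)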

nemenyi(metric_name=None)

Nemenyi test.

Post-hoc test run if the friedman_test reveals statistical significance. For more information see https://en.wikipedia.org/wiki/Nemenyi_test.

Implementation used: `scikit-posthocs <https://github.com/maximtrp/scikit-posthocs>`_.
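A toy example with ``scikit_posthocs.posthoc_nemenyi_friedman``, which takes a block design where rows are datasets (blocks) and columns are estimators (groups)::

    import numpy as np
    import scikit_posthocs as sp

    scores = np.array([
        [0.90, 0.88, 0.70],
        [0.85, 0.87, 0.72],
        [0.80, 0.82, 0.68],
        [0.78, 0.80, 0.71],
        [0.91, 0.86, 0.69],
    ])
    # Symmetric matrix of pairwise p-values between the three estimators.
    print(sp.posthoc_nemenyi_friedman(scores))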

fit_runtime(unit='s', train_or_test='test', cv_fold='all')

Calculate the average time for fitting the strategy.

Parameters:
unit : string (must be either 's' for seconds, 'm' for minutes or 'h' for hours)

    The unit in which the run time will be calculated.

Returns:
run_times : pandas DataFrame

    Average run times per estimator and strategy.
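A hypothetical call, reusing the ``evaluator`` from the sketch at the top of this page::

    # Average fit times in minutes, aggregated over all cross-validation folds.
    run_times = evaluator.fit_runtime(unit="m", train_or_test="test", cv_fold="all")
    print(run_times)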

plot_critical_difference_diagram(metric_name=None, alpha=0.1)

Plot critical difference diagrams.

References

Original implementation by Aaron Bostrom, modified by Markus Löning.