evaluate

evaluate(forecaster, cv, y, X=None, strategy: str = 'refit', scoring: callable | list[callable] | None = None, return_data: bool = False, error_score: str | int | float = nan, backend: str | None = None, cv_X=None, backend_params: dict | None = None)
Evaluate forecaster using timeseries cross-validation.
All-in-one statistical performance benchmarking utility for forecasters which runs a simple backtest experiment and returns a summary pd.DataFrame.
The experiment run is the following:
Denote by \(y_{train, 1}, y_{test, 1}, \dots, y_{train, K}, y_{test, K}\) the train/test folds produced by the generator cv.split_series(y). Denote by \(X_{train, 1}, X_{test, 1}, \dots, X_{train, K}, X_{test, K}\) the train/test folds produced by the generator cv_X.split_series(X) (if X is None, consider these to be None as well).

1. Initialize the counter to i = 1.
2. Fit the forecaster to \(y_{train, 1}\), \(X_{train, 1}\), with fh set to the absolute indices of \(y_{test, 1}\).
3. Use the forecaster to make a prediction y_pred with the exogeneous data \(X_{test, i}\). Predictions are made using either predict, predict_proba or predict_quantiles, depending on scoring.
4. Compute the scoring function on y_pred versus \(y_{test, i}\).
5. If i == K, terminate; otherwise:
6. Set i = i + 1.
7. Ingest more data \(y_{train, i}\), \(X_{train, i}\); how depends on strategy:
   - if strategy == "refit", reset and fit the forecaster via fit, on \(y_{train, i}\), \(X_{train, i}\), to forecast \(y_{test, i}\)
   - if strategy == "update", update the forecaster via update, on \(y_{train, i}\), \(X_{train, i}\), to forecast \(y_{test, i}\)
   - if strategy == "no-update_params", forward the forecaster via update, with argument update_params=False, to the cutoff of \(y_{train, i}\)
8. Go to 3.
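For orientation, the "refit" variant of this loop can be sketched by hand as below. This is an illustrative re-implementation under simplifying assumptions (no timing, no error handling, point forecasts only), not the internal code path of evaluate:

>>> from sktime.datasets import load_airline
>>> from sktime.forecasting.naive import NaiveForecaster
>>> from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError
>>> from sktime.split import ExpandingWindowSplitter
>>> y = load_airline()
>>> cv = ExpandingWindowSplitter(initial_window=24, step_length=12, fh=[1, 2, 3])
>>> scoring = MeanAbsolutePercentageError(symmetric=True)
>>> scores = []
>>> for y_train, y_test in cv.split_series(y):
...     forecaster = NaiveForecaster(strategy="last")  # fresh forecaster per fold, as in "refit"
...     forecaster.fit(y_train, fh=y_test.index)  # steps 2/7: fit with fh = absolute test fold indices
...     y_pred = forecaster.predict()  # step 3: predict the test fold
...     scores.append(scoring(y_test, y_pred))  # step 4: score y_pred versus y_test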
Results returned by this function are:

- results of the scoring calculations, from 4, in the i-th loop
- runtimes for fitting and/or predicting, from 2, 3, 7, in the i-th loop
- cutoff state of the forecaster, at 3, in the i-th loop
- \(y_{train, i}\), \(y_{test, i}\), y_pred (optional)
A distributed and/or parallel back-end can be chosen via the backend parameter.

Parameters:
- forecaster : sktime BaseForecaster descendant (concrete forecaster)
  sktime forecaster to benchmark
- cv : sktime BaseSplitter descendant
  determines split of y and possibly X into test and train folds.
  y is always split according to cv, see above.
  If cv_X is not passed, X splits are sub-set to loc indices equal to those of y.
  If cv_X is passed, X is split according to cv_X.
- y : sktime time series container
  Target (endogeneous) time series used in the evaluation experiment
- X : sktime time series container, of same mtype as y
  Exogenous time series used in the evaluation experiment
- strategy : {"refit", "update", "no-update_params"}, optional, default="refit"
  defines the ingestion mode when the forecaster sees new data as the window expands:
  - "refit" = forecaster is refitted to each training window
  - "update" = forecaster is updated with training window data, in sequence provided
  - "no-update_params" = fit to first training window, re-used without fit or update
  See the usage sketch after this parameter list.
- scoring : subclass of sktime.performance_metrics.BaseMetric or list of same, default=None
  Used to get a score function that takes y_pred and y_test arguments and accepts y_train as keyword argument.
  If None, then uses scoring = MeanAbsolutePercentageError(symmetric=True).
- return_data : bool, default=False
  Returns three additional columns in the DataFrame, by default False.
  The cells of these columns each contain a pd.Series, for y_train, y_pred, y_test respectively.
- error_score : "raise" or numeric, default=np.nan
  Value to assign to the score if an exception occurs in estimator fitting.
  If set to "raise", the exception is raised. If a numeric value is given, FitFailedWarning is raised.
- backend : {"dask", "loky", "multiprocessing", "threading"}, optional, default=None
  Runs parallel evaluate if specified and strategy is set as "refit".
  - "None": executes loop sequentially, simple list comprehension
  - "loky", "multiprocessing" and "threading": uses joblib.Parallel loops
  - "joblib": custom and 3rd-party joblib backends, e.g., spark
  - "dask": uses dask, requires dask package in environment
  - "dask_lazy": same as "dask", but changes the return to a (lazy) dask.dataframe.DataFrame
  Recommendation: use "dask" or "loky" for parallel evaluate.
  "threading" is unlikely to see speed-ups due to the GIL, and the serialization backend (cloudpickle) for "dask" and "loky" is generally more robust than the standard pickle library used in "multiprocessing".
  See the usage sketch after this parameter list.
- cv_X : sktime BaseSplitter descendant, optional
  determines split of X into test and train folds.
  Default is X being split to identical loc indices as y.
  If passed, must have the same number of splits as cv.
- backend_params : dict, optional
  additional parameters passed to the backend as config.
  Directly passed to utils.parallel.parallelize.
  Valid keys depend on the value of backend:
  - "None": no additional parameters, backend_params is ignored
  - "loky", "multiprocessing" and "threading": default joblib backends.
    Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by the backend parameter.
    If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - "joblib": custom and 3rd-party joblib backends, e.g., spark.
    Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs; backend must be passed as a key of backend_params in this case.
    If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - "dask": any valid keys for dask.compute can be passed, e.g., scheduler
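A minimal usage sketch combining strategy, backend and backend_params, as announced in the parameter descriptions above (assumes joblib with the loky backend is available; n_jobs is passed through backend_params to joblib.Parallel):

>>> from sktime.datasets import load_airline
>>> from sktime.forecasting.model_evaluation import evaluate
>>> from sktime.forecasting.naive import NaiveForecaster
>>> from sktime.split import ExpandingWindowSplitter
>>> y = load_airline()
>>> cv = ExpandingWindowSplitter(initial_window=24, step_length=12, fh=[1, 2, 3])
>>> # "update": folds are ingested sequentially via update, so no parallel backend applies
>>> results_update = evaluate(
...     forecaster=NaiveForecaster(strategy="last"), y=y, cv=cv, strategy="update"
... )
>>> # "refit": folds are independent, so they can be parallelized via the backend
>>> results_parallel = evaluate(
...     forecaster=NaiveForecaster(strategy="last"), y=y, cv=cv, strategy="refit",
...     backend="loky", backend_params={"n_jobs": 2},
... )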
Returns:

- results : pd.DataFrame or dask.dataframe.DataFrame
  DataFrame that contains several columns with information regarding each refit/update and prediction of the forecaster.
  Row index is the splitter index of the train/test fold in cv. Entries in the i-th row are for the i-th train/test split in cv.
  Columns are as follows (the final example below shows how to access them):
  - test_{scoring.name}: (float) Model performance score. If scoring is a list, then there is a column with name test_{scoring.name} for each scorer.
  - fit_time: (float) Time in sec for fit or update on train fold.
  - pred_time: (float) Time in sec to predict from fitted estimator.
  - len_train_window: (int) Length of train window.
  - cutoff: (int, pd.Timestamp, pd.Period) cutoff = last time index in train fold.
  - y_train: (pd.Series) only present if return_data=True; train fold of the i-th split in cv, used to fit/update the forecaster.
  - y_pred: (pd.Series) only present if return_data=True; forecasts from fitted forecaster for the i-th test fold indices of cv.
  - y_test: (pd.Series) only present if return_data=True; testing fold of the i-th split in cv, used to compute the metric.
Examples
The type of evaluation that is done by evaluate depends on the metrics in param scoring. Default is MeanAbsolutePercentageError.

>>> from sktime.datasets import load_airline
>>> from sktime.forecasting.model_evaluation import evaluate
>>> from sktime.split import ExpandingWindowSplitter
>>> from sktime.forecasting.naive import NaiveForecaster
>>> y = load_airline()[:24]
>>> forecaster = NaiveForecaster(strategy="mean", sp=3)
>>> cv = ExpandingWindowSplitter(initial_window=12, step_length=6, fh=[1, 2, 3])
>>> results = evaluate(forecaster=forecaster, y=y, cv=cv)
Optionally, users may select other metrics via the scoring argument. These can be forecast metrics of any kind, as listed in the performance metrics API reference (https://www.sktime.net/en/stable/api_reference/performance_metrics.html?highlight=metrics), i.e., point forecast metrics, interval metrics, quantile forecast metrics. To evaluate estimators using a specific metric, provide it to the scoring arg.

>>> from sktime.performance_metrics.forecasting import MeanAbsoluteError
>>> loss = MeanAbsoluteError()
>>> results = evaluate(forecaster=forecaster, y=y, cv=cv, scoring=loss)
Optionally, users can provide a list of metrics to the scoring argument.

>>> from sktime.performance_metrics.forecasting import MeanSquaredError
>>> results = evaluate(
...     forecaster=forecaster,
...     y=y,
...     cv=cv,
...     scoring=[MeanSquaredError(square_root=True), MeanAbsoluteError()],
... )
An example of an interval metric is the PinballLoss. It can be used with all probabilistic forecasters.

>>> from sktime.forecasting.naive import NaiveVariance
>>> from sktime.performance_metrics.forecasting.probabilistic import PinballLoss
>>> loss = PinballLoss()
>>> forecaster = NaiveForecaster(strategy="drift")
>>> results = evaluate(forecaster=NaiveVariance(forecaster),
...     y=y, cv=cv, scoring=loss)
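Finally, the returned DataFrame can be inspected column-wise; with return_data=True, the per-fold series are available as pd.Series cells. This sketch reuses y and cv from the first example; the column name test_MeanAbsolutePercentageError assumes the default scoring and the test_{scoring.name} naming convention described under Returns:

>>> results = evaluate(
...     forecaster=NaiveForecaster(strategy="mean", sp=3),
...     y=y, cv=cv, return_data=True,
... )
>>> mean_score = results["test_MeanAbsolutePercentageError"].mean()  # average over folds
>>> y_pred_first_fold = results["y_pred"].iloc[0]  # forecasts for the first test fold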