
Probabilistic Forecasting with sktime#

originally presented at PyData Berlin 2022; see there for the video presentation

Overview of this notebook#

  • quick start - probabilistic forecasting

  • disambiguation - types of probabilistic forecasts

  • details: probabilistic forecasting interfaces

  • metrics for, and evaluation of probabilistic forecasts

  • advanced composition: pipelines, tuning, reduction

  • extender guide

  • contributor credits

[1]:
import warnings

warnings.filterwarnings("ignore")

Quick Start - Probabilistic Forecasting with sktime#

… works exactly like the basic forecasting workflow: simply replace predict by a probabilistic prediction method!

[2]:
from sktime.datasets import load_airline
from sktime.forecasting.arima import ARIMA

# step 1: data specification
y = load_airline()
# step 2: specifying forecasting horizon
fh = [1, 2, 3]
# step 3: specifying the forecasting algorithm
forecaster = ARIMA()
# step 4: fitting the forecaster
forecaster.fit(y, fh=fh)
# step 5: querying predictions
y_pred = forecaster.predict()

# for probabilistic forecasting:
#   call a probabilistic forecasting method after or instead of step 5
y_pred_int = forecaster.predict_interval(coverage=0.9)
y_pred_int
[2]:
Number of airline passengers
0.9
lower upper
1961-01 371.535093 481.554608
1961-02 344.853206 497.712761
1961-03 324.223996 508.191104

probabilistic forecasting methods in ``sktime``:

  • forecast intervals - predict_interval(fh=None, X=None, coverage=0.90)

  • forecast quantiles - predict_quantiles(fh=None, X=None, alpha=[0.05, 0.95])

  • forecast variance - predict_var(fh=None, X=None, cov=False)

  • distribution forecast - predict_proba(fh=None, X=None, marginal=True)

To check which forecasters in sktime support probabilistic forecasting, use the registry.all_estimators utility and search for estimators which have the capability:pred_int tag (value True).

For composites such as pipelines, a positive tag means that logic is implemented if (some or all) components support it.

[3]:
from sktime.registry import all_estimators

all_estimators(
    "forecaster", filter_tags={"capability:pred_int": True}, as_dataframe=True
)
[3]:
name object
0 ARIMA <class 'sktime.forecasting.arima.ARIMA'>
1 AutoARIMA <class 'sktime.forecasting.arima.AutoARIMA'>
2 AutoETS <class 'sktime.forecasting.ets.AutoETS'>
3 BATS <class 'sktime.forecasting.bats.BATS'>
4 BaggingForecaster <class 'sktime.forecasting.compose._bagging.Ba...
5 ColumnEnsembleForecaster <class 'sktime.forecasting.compose._column_ens...
6 ConformalIntervals <class 'sktime.forecasting.conformal.Conformal...
7 DynamicFactor <class 'sktime.forecasting.dynamic_factor.Dyna...
8 FhPlexForecaster <class 'sktime.forecasting.compose._fhplex.FhP...
9 ForecastX <class 'sktime.forecasting.compose._pipeline.F...
10 ForecastingGridSearchCV <class 'sktime.forecasting.model_selection._tu...
11 ForecastingPipeline <class 'sktime.forecasting.compose._pipeline.F...
12 ForecastingRandomizedSearchCV <class 'sktime.forecasting.model_selection._tu...
13 ForecastingSkoptSearchCV <class 'sktime.forecasting.model_selection._tu...
14 NaiveForecaster <class 'sktime.forecasting.naive.NaiveForecast...
15 NaiveVariance <class 'sktime.forecasting.naive.NaiveVariance'>
16 Permute <class 'sktime.forecasting.compose._pipeline.P...
17 Prophet <class 'sktime.forecasting.fbprophet.Prophet'>
18 SARIMAX <class 'sktime.forecasting.sarimax.SARIMAX'>
19 SquaringResiduals <class 'sktime.forecasting.squaring_residuals....
20 StatsForecastARCH <class 'sktime.forecasting.arch.StatsForecastA...
21 StatsForecastAutoARIMA <class 'sktime.forecasting.statsforecast.Stats...
22 StatsForecastAutoCES <class 'sktime.forecasting.statsforecast.Stats...
23 StatsForecastAutoETS <class 'sktime.forecasting.statsforecast.Stats...
24 StatsForecastAutoTheta <class 'sktime.forecasting.statsforecast.Stats...
25 StatsForecastGARCH <class 'sktime.forecasting.arch.StatsForecastG...
26 StatsForecastMSTL <class 'sktime.forecasting.statsforecast.Stats...
27 TBATS <class 'sktime.forecasting.tbats.TBATS'>
28 ThetaForecaster <class 'sktime.forecasting.theta.ThetaForecast...
29 TransformedTargetForecaster <class 'sktime.forecasting.compose._pipeline.T...
30 UnobservedComponents <class 'sktime.forecasting.structural.Unobserv...
31 VAR <class 'sktime.forecasting.var.VAR'>
32 VECM <class 'sktime.forecasting.vecm.VECM'>

What is probabilistic forecasting?#

Intuition#

  • produce low/high scenarios of forecasts

  • quantify uncertainty around forecasts

  • produce expected range of variation of forecasts

Interface view#

Want to produce “distribution” or “range” of forecast values,

at time stamps defined by forecasting horizon fh

given past data y (series), and possibly exogenous data X

Input, to fit or predict: fh, y, X

Output, from predict_probabilistic: some “distribution” or “range” object

Big caveat: there are multiple possible ways to model “distribution” or “range”!

All are used in practice, and they are easily (and often) confused!

Formal view (endogeneous, one forecast time stamp)#

Let \(y(t_1), \dots, y(t_n)\) be observations at fixed time stamps \(t_1, \dots, t_n\).

(we consider \(y\) as an \(\mathbb{R}^n\)-valued random variable)

Let \(y'\) be a (true) value, which will be observed at a future time stamp \(\tau\).

(we consider \(y'\) as an \(\mathbb{R}\)-valued random variable)

We have the following “types of forecasts” of \(y'\):

| Name | param | prediction/estimate of | sktime |
|------|-------|------------------------|--------|
| point forecast |  | conditional expectation \(\mathbb{E}[y'\|y]\) | predict |
| variance forecast |  | conditional variance \(Var[y'\|y]\) | predict_var |
| quantile forecast | \(\alpha\in (0,1)\) | \(\alpha\)-quantile of \(y'\|y\) | predict_quantiles |
| interval forecast | \(c\in (0,1)\) | \([a,b]\) s.t. \(P(a\le y' \le b\| y) = c\) | predict_interval |
| distribution forecast |  | the law/distribution of \(y'\|y\) | predict_proba |

Notes:

  • different forecasters have different capabilities!

  • metrics, evaluation & tuning are different by “type of forecast”

  • compositors can “add” type of forecast! Example: bootstrap

More formal details & intuition:#

  • a “point forecast” is a prediction/estimate of the conditional expectation \(\mathbb{E}[y'|y]\). Intuition: “out of many repetitions/worlds, this value is the arithmetic average of all observations”.

  • a “variance forecast” is a prediction/estimate of the conditional variance \(Var[y'|y]\). Intuition: “out of many repetitions/worlds, this value is the average squared distance of the observation to the perfect point forecast”.

  • a “quantile forecast”, at quantile point \(\alpha\in (0,1)\), is a prediction/estimate of the \(\alpha\)-quantile of \(y'|y\), i.e., of \(F^{-1}_{y'|y}(\alpha)\), where \(F^{-1}\) is the (generalized) inverse cdf, i.e., the quantile function, of the random variable \(y'|y\). Intuition: “out of many repetitions/worlds, a fraction of exactly \(\alpha\) will observe a value equal to or smaller than this one.”

  • an “interval forecast” or “predictive interval” with (symmetric) coverage \(c\in (0,1)\) is a prediction/estimate of a pair of lower bound \(a\) and upper bound \(b\) such that \(P(a\le y' \le b| y) = c\) and \(P(y' > b| y) = P(y' < a| y) = (1 - c) /2\). Intuition: “out of many repetitions/worlds, a fraction of exactly \(c\) will be contained in the interval \([a,b]\), and being above is equally likely as being below”.

  • a “distribution forecast” or “full probabilistic forecast” is a prediction/estimate of the distribution of \(y'|y\), e.g., “it’s a normal distribution with mean 42 and variance 1”. Intuition: exhaustive description of the generating mechanism of many repetitions/worlds.

Notes:

  • lower/upper of interval forecasts are quantile forecasts at quantile points \(0.5 - c/2\) and \(0.5 + c/2\) (as long as forecast distributions are absolutely continuous).

  • all other types of forecasts can be obtained from a full probabilistic forecast; a full probabilistic forecast can be obtained from all quantile forecasts or all interval forecasts.

  • there is no exact relation between the other types of forecasts (point or variance vs quantile)

  • in particular, point forecast does not need to be median forecast aka 0.5-quantile forecast. Can be \(\alpha\)-quantile for any \(\alpha\)!
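The last point can be illustrated with a small, self-contained example (plain Python, hypothetical numbers): for a skewed distribution, the mean, which the point forecast typically targets, and the median, i.e., the 0.5-quantile, differ.

```python
import statistics

# hypothetical skewed sample of "many repetitions/worlds" of y'|y
sample = [1, 1, 1, 1, 10]

mean = statistics.mean(sample)      # target of the point forecast (under squared loss)
median = statistics.median(sample)  # target of the 0.5-quantile forecast

print(mean, median)  # 2.8 vs 1 - the point forecast need not equal the median forecast
```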

Frequent confusions in the literature and in python packages:

  • coverage \(c\) vs quantile point \(\alpha\)

  • coverage \(c\) vs significance \(p = 1 - c\)

  • the quantile point of the lower interval bound, \(p/2\), vs \(p\)

  • interval forecasts vs related, but substantially different, concepts: the confidence interval on the predictive mean; the Bayesian posterior or credibility interval of the predictive mean

  • all forecasts above can be Bayesian; common confusion: “posteriors are different” or “have to be evaluated differently”
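To keep coverage, significance, and quantile points apart, the conversions can be written down explicitly; a minimal sketch (the function name is illustrative, not sktime API):

```python
def coverage_to_quantile_points(c):
    """Quantile points of the lower/upper bound of a symmetric interval at coverage c."""
    p = 1 - c  # significance level
    # lower bound sits at quantile point p/2, upper bound at 1 - p/2
    return p / 2, 1 - p / 2

low, high = coverage_to_quantile_points(0.9)
```

For coverage 0.9, this recovers the quantile points 0.05 and 0.95, matching the default `alpha=[0.05, 0.95]` of predict_quantiles.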


Probabilistic forecasting interfaces in sktime#

This section:

  • walkthrough of probabilistic predict methods

  • use in update/predict workflow

  • multivariate and hierarchical data

All forecasters with tag capability:pred_int provide the following:

  • forecast intervals - predict_interval(fh=None, X=None, coverage=0.90)

  • forecast quantiles - predict_quantiles(fh=None, X=None, alpha=[0.05, 0.95])

  • forecast variance - predict_var(fh=None, X=None, cov=False)

  • distribution forecast - predict_proba(fh=None, X=None, marginal=True)

Generalities:

  • the methods do not change the forecaster's state, so multiple of them can be called on the same fitted forecaster

  • fh is optional if it was already passed in fit

  • exogenous data X can be passed

[4]:
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

# until fit, identical with the simple workflow
y = load_airline()

fh = np.arange(1, 13)

forecaster = ThetaForecaster(sp=12)
forecaster.fit(y, fh=fh)
[4]:
ThetaForecaster(sp=12)
Inputs to predict_interval:

  • fh - forecasting horizon (not needed if already seen in fit)

  • coverage, float or list of floats, default=0.9 - nominal coverage(s) of the prediction interval(s) queried

Output: pandas.DataFrame

  • row index is fh

  • columns have a multi-index: 1st level = variable name from y in fit; 2nd level = coverage fractions in coverage; 3rd level = string “lower” or “upper”

  • entries = forecasts of the lower/upper interval bound at the nominal coverage in the 2nd level, for the variable in the 1st level, at the time in the row
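This return format can be sketched by constructing an analogous DataFrame by hand (toy numbers, not real forecasts):

```python
import pandas as pd

# column multi-index: (variable name, coverage, "lower"/"upper")
cols = pd.MultiIndex.from_product(
    [["Number of airline passengers"], [0.9], ["lower", "upper"]]
)
# row index: the forecasting horizon, as future monthly periods
idx = pd.period_range("1961-01", periods=3, freq="M")

toy_intervals = pd.DataFrame(
    [[371.5, 481.6], [344.9, 497.7], [324.2, 508.2]], index=idx, columns=cols
)

# selecting the lower bounds for one variable at one coverage:
lower = toy_intervals[("Number of airline passengers", 0.9, "lower")]
```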

[5]:
coverage = 0.9
y_pred_ints = forecaster.predict_interval(coverage=coverage)
y_pred_ints
[5]:
Number of airline passengers
0.9
lower upper
1961-01 418.280121 464.281951
1961-02 402.215881 456.888055
1961-03 459.966113 522.110500
1961-04 442.589309 511.399214
1961-05 443.525027 518.409480
1961-06 506.585814 587.087737
1961-07 561.496768 647.248956
1961-08 557.363322 648.062363
1961-09 477.658056 573.047752
1961-10 407.915090 507.775355
1961-11 346.942924 451.082016
1961-12 394.708221 502.957142

pretty-plotting the predictive interval forecasts:

[6]:
from sktime.utils import plotting

# also requires predictions
y_pred = forecaster.predict()

fig, ax = plotting.plot_series(
    y, y_pred, labels=["y", "y_pred"], pred_interval=y_pred_ints
)

ax.legend();
../_images/examples_01b_forecasting_proba_21_0.png

multiple coverages:

[7]:
coverage = [0.5, 0.9, 0.95]
y_pred_ints = forecaster.predict_interval(coverage=coverage)
y_pred_ints
[7]:
Number of airline passengers
0.50 0.90 0.95
lower upper lower upper lower upper
1961-01 431.849266 450.712806 418.280121 464.281951 413.873755 468.688317
1961-02 418.342514 440.761421 402.215881 456.888055 396.979011 462.124925
1961-03 478.296822 503.779790 459.966113 522.110500 454.013504 528.063109
1961-04 462.886144 491.102379 442.589309 511.399214 435.998232 517.990291
1961-05 465.613670 496.320837 443.525027 518.409480 436.352089 525.582418
1961-06 530.331440 563.342111 506.585814 587.087737 498.874797 594.798754
1961-07 586.791063 621.954661 561.496768 647.248956 553.282845 655.462879
1961-08 584.116789 621.308897 557.363322 648.062363 548.675556 656.750129
1961-09 505.795123 544.910684 477.658056 573.047752 468.520987 582.184821
1961-10 437.370840 478.319605 407.915090 507.775355 398.349800 517.340645
1961-11 377.660798 420.364142 346.942924 451.082016 336.967779 461.057161
1961-12 426.638370 471.026993 394.708221 502.957142 384.339409 513.325954
Inputs to predict_quantiles:

  • fh - forecasting horizon (not needed if already seen in fit)

  • alpha, float or list of floats, default=[0.05, 0.95] - quantile points at which quantiles are queried

Output: pandas.DataFrame

  • row index is fh

  • columns have a multi-index: 1st level = variable name from y in fit; 2nd level = quantile points in alpha

  • entries = forecasts of the quantile at the quantile point in the 2nd level, for the variable in the 1st level, at the time in the row

[8]:
alpha = [0.1, 0.25, 0.5, 0.75, 0.9]
y_pred_quantiles = forecaster.predict_quantiles(alpha=alpha)
y_pred_quantiles
[8]:
Number of airline passengers
0.10 0.25 0.50 0.75 0.90
1961-01 423.360378 431.849266 441.281036 450.712806 459.201694
1961-02 408.253656 418.342514 429.551968 440.761421 450.850279
1961-03 466.829089 478.296822 491.038306 503.779790 515.247523
1961-04 450.188398 462.886144 476.994261 491.102379 503.800124
1961-05 451.794965 465.613670 480.967253 496.320837 510.139542
1961-06 515.476124 530.331440 546.836776 563.342111 578.197428
1961-07 570.966896 586.791063 604.372862 621.954661 637.778829
1961-08 567.379760 584.116789 602.712843 621.308897 638.045925
1961-09 488.192511 505.795123 525.352904 544.910684 562.513297
1961-10 418.943257 437.370840 457.845222 478.319605 496.747188
1961-11 358.443627 377.660798 399.012470 420.364142 439.581313
1961-12 406.662797 426.638370 448.832681 471.026993 491.002565

pretty-plotting the quantile forecasts:

[9]:
from sktime.utils import plotting

columns = [y_pred_quantiles[i] for i in y_pred_quantiles.columns]
fig, ax = plotting.plot_series(y[-50:], *columns)
../_images/examples_01b_forecasting_proba_28_0.png
Inputs to predict_var:

  • fh - forecasting horizon (not needed if already seen in fit)

  • cov, boolean, default=False - whether covariance forecasts should also be returned (not all estimators support this)

Output: pandas.DataFrame, for cov=False:

  • row index is fh

  • columns are equal to the column index of y (the variables)

  • entries = variance forecast for the variable in the column, at the time in the row

[10]:
y_pred_variance = forecaster.predict_var()
y_pred_variance
[10]:
Number of airline passengers
1961-01 195.540049
1961-02 276.196509
1961-03 356.852968
1961-04 437.509428
1961-05 518.165887
1961-06 598.822347
1961-07 679.478807
1961-08 760.135266
1961-09 840.791726
1961-10 921.448185
1961-11 1002.104645
1961-12 1082.761105

with covariance, using a forecaster which can return covariance forecasts:

the return is a pandas.DataFrame with fh indexing both rows and columns;
entries are the forecast covariances between the row time and the column time
(the diagonal contains the forecast variances)
[11]:
from sktime.forecasting.naive import NaiveVariance

forecaster_with_covariance = NaiveVariance(forecaster)
forecaster_with_covariance.fit(y=y, fh=fh)
forecaster_with_covariance.predict_var(cov=True)
[11]:
1961-01 1961-02 1961-03 1961-04 1961-05 1961-06 1961-07 1961-08 1961-09 1961-10 1961-11 1961-12
1961-01 292.337338 255.743002 264.805454 227.703065 146.093860 154.452835 157.976801 105.160769 78.330257 81.835803 78.048880 197.364513
1961-02 255.743002 422.704619 402.539279 353.437066 291.205423 236.587887 227.199386 205.653016 152.067422 121.629136 156.199113 245.437913
1961-03 264.805454 402.539279 588.085358 506.095484 426.997535 394.503941 311.457854 282.072157 243.688600 185.938841 185.070365 305.461220
1961-04 227.703065 353.437066 506.095484 634.350469 526.180900 482.653111 422.777319 323.453753 280.749314 242.065791 211.397170 294.971041
1961-05 146.093860 291.205423 426.997535 526.180900 628.659359 570.277532 499.460195 419.166450 325.582777 281.608607 269.847443 318.534683
1961-06 154.452835 236.587887 394.503941 482.653111 570.277532 728.132505 629.184846 527.767036 444.690514 330.643653 313.248427 382.803221
1961-07 157.976801 227.199386 311.457854 422.777319 499.460195 629.184846 753.550007 629.138725 536.407564 441.998603 352.570968 415.110922
1961-08 105.160769 205.653016 282.072157 323.453753 419.166450 527.767036 629.138725 729.423302 615.142484 506.155610 439.994837 430.992295
1961-09 78.330257 152.067422 243.688600 280.749314 325.582777 444.690514 536.407564 615.142484 744.225555 609.227136 527.489574 546.637590
1961-10 81.835803 121.629136 185.938841 242.065791 281.608607 330.643653 441.998603 506.155610 609.227136 697.805477 590.542045 604.681136
1961-11 78.048880 156.199113 185.070365 211.397170 269.847443 313.248427 352.570968 439.994837 527.489574 590.542045 706.960631 698.982589
1961-12 197.364513 245.437913 305.461220 294.971041 318.534683 382.803221 415.110922 430.992295 546.637590 604.681136 698.982589 913.698243
Inputs to predict_proba:

  • fh - forecasting horizon (not needed if already seen in fit)

  • marginal, bool, optional, default=True - whether the returned distribution is marginal over time points (True) or joint over time points (False); not all forecasters support marginal=False

Output: a distribution object, as in the cell below; rows are indexed by fh (time stamps), columns by the variables of y. For marginal=True, the distribution is a product of per-time-point marginals; for marginal=False, it is joint over the time points in fh.
[12]:
y_pred_dist = forecaster.predict_proba()
y_pred_dist
[12]:
Normal(columns=Index(['Number of airline passengers'], dtype='object'),
       index=PeriodIndex(['1961-01', '1961-02', '1961-03', '1961-04', '1961-05', '1961-06',
             '1961-07', '1961-08', '1961-09', '1961-10', '1961-11', '1961-12'],
            dtype='period[M]'),
       mu=         Number of airline passengers
1961-01                    441.281036
1961-02                    429.551968
1961-03                    491.038306
1961-04                    476.994261
1961-05                    480.967253
1961-06                    546.836776
1961-07                    604.372862
1961-08                    602.712843
1961-09                    525.352904
1961-10                    457.845222
1961-11                    399.012470
1961-12                    448.832681,
       sigma=         Number of airline passengers
1961-01                     13.983564
1961-02                     16.619161
1961-03                     18.890552
1961-04                     20.916726
1961-05                     22.763257
1961-06                     24.470847
1961-07                     26.066814
1961-08                     27.570551
1961-09                     28.996409
1961-10                     30.355365
1961-11                     31.656036
1961-12                     32.905336)
[13]:
# obtaining quantiles
y_pred_dist.quantile([0.1, 0.9])
[13]:
Number of airline passengers
0.1 0.9
1961-01 423.360378 459.201694
1961-02 408.253656 450.850279
1961-03 466.829089 515.247523
1961-04 450.188398 503.800124
1961-05 451.794965 510.139542
1961-06 515.476124 578.197428
1961-07 570.966896 637.778829
1961-08 567.379760 638.045925
1961-09 488.192511 562.513297
1961-10 418.943257 496.747188
1961-11 358.443627 439.581313
1961-12 406.662797 491.002565

Outputs of predict_interval, predict_quantiles, predict_var, and predict_proba are typically, but not necessarily, consistent with each other!

Consistency is a weak interface requirement, but it is not strictly enforced.
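For the normal distribution forecast above, this consistency can be checked by hand with the standard library: the 1961-01 interval bounds from cell [5] should match the quantiles of the 1961-01 marginal from cell [12] (μ and σ taken from the printed output).

```python
from statistics import NormalDist

# mu and sigma of the 1961-01 marginal, from the predict_proba output above
d = NormalDist(mu=441.281036, sigma=13.983564)

c = 0.9  # coverage
lower = d.inv_cdf((1 - c) / 2)  # 0.05-quantile
upper = d.inv_cdf((1 + c) / 2)  # 0.95-quantile
# close to 418.280121 and 464.281951 from predict_interval
```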

Using probabilistic forecasts with update/predict stream workflow#

Example:

  • data observed monthly

  • make probabilistic forecasts for an entire year ahead

  • update the forecasts every month

  • start in Dec 1950

[14]:
# 1949 and 1950
y_start = y[:24]
# Jan 1951 etc
y_update_batch_1 = y.loc[["1951-01"]]
y_update_batch_2 = y.loc[["1951-02"]]
y_update_batch_3 = y.loc[["1951-03"]]
[15]:
# now = Dec 1950

# 1a. fit to data available in Dec 1950
#   fh = [1, 2, ..., 12] for all 12 months ahead
forecaster.fit(y_start, fh=1 + np.arange(12))

# 1b. predict 1951, in Dec 1950
forecaster.predict_interval()
# or other proba predict functions
[15]:
Number of airline passengers
0.9
lower upper
1951-01 125.707998 141.744251
1951-02 135.554586 154.422381
1951-03 149.921348 171.247998
1951-04 140.807417 164.337362
1951-05 127.941097 153.484993
1951-06 152.968277 180.378548
1951-07 167.193935 196.351356
1951-08 166.316512 197.122153
1951-09 150.425516 182.795561
1951-10 128.623033 162.485285
1951-11 109.567283 144.858705
1951-12 125.641292 162.306217
[16]:
# time passes, now = Jan 1951

# 2a. update forecaster with new data
forecaster.update(y_update_batch_1)

# 2b. make new prediction - year ahead = Feb 1951 to Jan 1952
forecaster.predict_interval()
# forecaster remembers relative forecasting horizon
[16]:
Number of airline passengers
0.9
lower upper
1951-02 136.659398 152.695651
1951-03 150.894540 169.762336
1951-04 141.748826 163.075476
1951-05 128.876521 152.406466
1951-06 153.906406 179.450302
1951-07 168.170069 195.580339
1951-08 167.339648 196.497069
1951-09 151.478088 182.283729
1951-10 129.681615 162.051660
1951-11 110.621201 144.483453
1951-12 126.786551 162.077973
1952-01 121.345120 158.010045

repeat the same commands with further data batches:

[17]:
# time passes, now = Feb 1951

# 3a. update forecaster with new data
forecaster.update(y_update_batch_2)

# 3b. make new prediction - year ahead = Mar 1951 to Feb 1952
forecaster.predict_interval()
[17]:
Number of airline passengers
0.9
lower upper
1951-03 151.754366 167.790619
1951-04 142.481687 161.349482
1951-05 129.549186 150.875836
1951-06 154.439360 177.969305
1951-07 168.623239 194.167135
1951-08 167.770039 195.180310
1951-09 151.929281 181.086702
1951-10 130.167033 160.972675
1951-11 111.133102 143.503147
1951-12 127.264390 161.126643
1952-01 121.830227 157.121649
1952-02 132.976436 169.641361
[18]:
# time passes, now = Mar 1951

# 4a. update forecaster with new data
forecaster.update(y_update_batch_3)

# 4b. make new prediction - year ahead = Apr 1951 to Mar 1952
forecaster.predict_interval()
[18]:
Number of airline passengers
0.9
lower upper
1951-04 143.421741 159.457994
1951-05 130.401488 149.269284
1951-06 155.166803 176.493453
1951-07 169.300650 192.830595
1951-08 168.451755 193.995651
1951-09 152.643333 180.053604
1951-10 130.913435 160.070856
1951-11 111.900919 142.706560
1951-12 128.054402 160.424448
1952-01 122.645052 156.507304
1952-02 133.834107 169.125529
1952-03 149.605277 186.270202

… and so on.

[19]:
from sktime.split import ExpandingWindowSplitter
from sktime.utils.plotting import plot_windows

cv = ExpandingWindowSplitter(step_length=1, fh=fh, initial_window=24)
plot_windows(cv, y.iloc[:50])
../_images/examples_01b_forecasting_proba_48_0.png

Probabilistic forecasting for multivariate and hierarchical data#

multivariate data: the (first) column index ranges over the different variables

[20]:
from sktime.datasets import load_longley
from sktime.forecasting.var import VAR

_, y = load_longley()

mv_forecaster = VAR()

mv_forecaster.fit(y, fh=[1, 2, 3])
# mv_forecaster.predict_var()
[20]:
VAR()

hierarchical data: probabilistic forecasts per level are row-concatenated with a row hierarchy index

[21]:
from sktime.forecasting.arima import ARIMA
from sktime.utils._testing.hierarchical import _make_hierarchical

y_hier = _make_hierarchical()
y_hier
[21]:
c0
h0 h1 time
h0_0 h1_0 2000-01-01 1.000000
2000-01-02 3.518414
2000-01-03 3.439018
2000-01-04 4.930397
2000-01-05 4.612988
... ... ... ...
h0_1 h1_3 2000-01-08 3.877603
2000-01-09 2.867442
2000-01-10 5.020762
2000-01-11 4.566137
2000-01-12 3.806208

96 rows × 1 columns

[22]:
forecaster = ARIMA()
forecaster.fit(y_hier, fh=[1, 2, 3])
forecaster.predict_interval()
[22]:
0
0.9
lower upper
h0 h1 time
h0_0 h1_0 2000-01-13 1.547240 5.656626
2000-01-14 1.486515 6.111832
2000-01-15 1.524294 6.277836
h1_1 2000-01-13 3.285397 5.304444
2000-01-14 3.357664 5.387562
2000-01-15 3.349540 5.379555
h1_2 2000-01-13 2.916079 6.618686
2000-01-14 2.920492 6.653291
2000-01-15 2.922742 6.656034
h1_3 2000-01-13 2.830648 5.936913
2000-01-14 2.753652 5.974169
2000-01-15 2.744015 5.972928
h0_1 h1_0 2000-01-13 2.371798 4.871334
2000-01-14 2.447080 5.252941
2000-01-15 2.526418 5.406628
h1_1 2000-01-13 2.648028 5.302283
2000-01-14 2.673720 5.360916
2000-01-15 2.679975 5.367988
h1_2 2000-01-13 3.541852 5.942392
2000-01-14 3.423455 5.869728
2000-01-15 3.441320 5.889336
h1_3 2000-01-13 3.150096 5.231645
2000-01-14 2.981736 5.161234
2000-01-15 3.014187 5.202889

(more about this in the hierarchical forecasting notebook)


Metrics for probabilistic forecasts and evaluation#

overview - theory#

The predicted y has a different form from the true y, so probabilistic metrics have the form

metric(y_true: series, y_pred: proba_prediction) -> float

where proba_prediction is the specific “probabilistic prediction type”.

I.e., we have the following function signature for a loss/metric \(L\):

| Name | param | prediction/estimate of | general form |
|------|-------|------------------------|--------------|
| point forecast |  | conditional expectation \(\mathbb{E}[y'\|y]\) | metric(y_true, y_pred) |
| variance forecast |  | conditional variance \(Var[y'\|y]\) | metric(y_true, y_pt, y_var) (requires a point forecast too) |
| quantile forecast | \(\alpha\in (0,1)\) | \(\alpha\)-quantile of \(y'\|y\) | metric(y_true, y_quantiles, alpha) |
| interval forecast | \(c\in (0,1)\) | \([a,b]\) s.t. \(P(a\le y' \le b\| y) = c\) | metric(y_true, y_interval, c) |
| distribution forecast |  | the law/distribution of \(y'\|y\) | metric(y_true, y_distribution) |

metrics: general signature and averaging#

We introduce the general signature using the example of the quantile loss (aka interval loss, aka pinball loss), in the univariate case.

For one quantile value \(\alpha\), the (per-sample) pinball loss function is defined as
\(L_{\alpha}(\widehat{y}, y) := \alpha \cdot (y - \widehat{y}) \cdot \Theta (y - \widehat{y}) + (1-\alpha) \cdot (\widehat{y} - y) \cdot \Theta (\widehat{y} - y)\),
where \(\Theta (x) := [1\) if \(x\ge 0\) and \(0\) otherwise \(]\) is the Heaviside function.

This can be used to evaluate:

  • multiple quantile forecasts \(\widehat{\bf y}:=\widehat{y}_1, \dots, \widehat{y}_k\) for quantiles \(\bm{\alpha} = \alpha_1,\dots, \alpha_k\) via \(L_{\bm{\alpha}}(\widehat{\bf y}, y) := \frac{1}{k}\sum_{i=1}^k L_{\alpha_i}(\widehat{y}_i, y)\)

  • interval forecasts \([\widehat{a}, \widehat{b}]\) at symmetric coverage \(c\) via \(L_c([\widehat{a},\widehat{b}], y) := \frac{1}{2} L_{\alpha_{low}}(\widehat{a}, y) + \frac{1}{2}L_{\alpha_{high}}(\widehat{b}, y)\) where \(\alpha_{low} = \frac{1-c}{2}, \alpha_{high} = \frac{1+c}{2}\)

(all are known to be strictly proper losses for their respective prediction object)
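The pinball loss and the derived multi-quantile and interval losses can be transcribed into plain Python (a sketch of the standard definitions, not the sktime implementation):

```python
def pinball_loss(y_hat, y, alpha):
    # alpha-weighted loss if the observation is above the forecast,
    # (1 - alpha)-weighted if it is below
    return alpha * (y - y_hat) if y >= y_hat else (1 - alpha) * (y_hat - y)

def multi_quantile_loss(y_hats, y, alphas):
    # average of the per-quantile pinball losses over the quantile points
    return sum(pinball_loss(q, y, a) for q, a in zip(y_hats, alphas)) / len(alphas)

def interval_loss(a, b, y, c):
    # interval loss at symmetric coverage c, via the two bound quantile points
    alpha_low, alpha_high = (1 - c) / 2, (1 + c) / 2
    return 0.5 * pinball_loss(a, y, alpha_low) + 0.5 * pinball_loss(b, y, alpha_high)
```

For two quantile points located at the interval bounds, the multi-quantile loss and the interval loss coincide.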

There are three things we can choose to average over:

  • quantile values, if multiple are predicted - elements of alpha in predict_interval(fh, alpha)

  • time stamps in the forecasting horizon fh - elements of fh in fit(fh) resp predict_interval(fh, alpha)

  • variables in y, in case of multivariate (later, first we look at univariate)

We will show quantile values and time stamps first:

  1. averaging over fh time stamps only -> one number per quantile value in alpha

  2. averaging over nothing -> one number per quantile value in alpha and fh time stamp

  3. averaging over both fh time stamps and quantile values in alpha -> one number

  4. averaging over quantile values in alpha only -> one number per fh time stamp
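These averaging modes can be sketched in plain Python, with hypothetical per-index loss values (not sktime output):

```python
# hypothetical per-index pinball losses: rows = fh time stamps t_i, columns = quantile points alpha_j
losses = [
    [1.75, 3.64],  # t_1
    [1.24, 2.22],  # t_2
    [2.57, 5.45],  # t_3
]
n, k = len(losses), len(losses[0])

# 1. average over fh time stamps only -> one number per quantile point
per_alpha = [sum(row[j] for row in losses) / n for j in range(k)]

# 2. no averaging -> the matrix `losses` itself

# 3. average over both -> a single number, usable e.g. in tuning
overall = sum(sum(row) for row in losses) / (n * k)

# 4. average over quantile points only -> one number per time stamp
per_time = [sum(row) / k for row in losses]
```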

First, we generate some quantile predictions. pred_quantiles then contains the quantile forecasts:
formally, forecasts \(\widehat{y}_j(t_i)\), where \(\widehat{y}_j\) is the forecast at quantile point \(\alpha_j\), with ranges \(i=1,\dots, N\) and \(j=1,\dots, k\);
the \(\alpha_j\) are the elements of alpha, and the \(t_i\) are the future time stamps indexed by fh.
[23]:
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

y_train = load_airline()[0:24]  # train on 24 months, 1949 and 1950
y_test = load_airline()[24:36]  # ground truth for 12 months in 1951

# try to forecast 12 months ahead, from y_train
fh = np.arange(1, 13)

forecaster = ThetaForecaster(sp=12)
forecaster.fit(y_train, fh=fh)

pred_quantiles = forecaster.predict_quantiles(alpha=[0.1, 0.25, 0.5, 0.75, 0.9])
pred_quantiles
[23]:
Number of airline passengers
0.10 0.25 0.50 0.75 0.90
1951-01 127.478978 130.438207 133.726125 137.014043 139.973272
1951-02 137.638269 141.120013 144.988483 148.856953 152.338697
1951-03 152.276577 156.212063 160.584673 164.957283 168.892768
1951-04 143.405969 147.748037 152.572389 157.396741 161.738809
1951-05 130.762062 135.475772 140.713045 145.950318 150.664027
1951-06 155.995358 161.053476 166.673412 172.293349 177.351467
1951-07 170.413963 175.794490 181.772645 187.750801 193.131328
1951-08 169.718563 175.403241 181.719332 188.035423 193.720101
1951-09 154.000334 159.973698 166.610539 173.247380 179.220743
1951-10 132.362644 138.611370 145.554159 152.496947 158.745674
1951-11 113.464726 119.977182 127.212994 134.448805 140.961262
1951-12 129.690419 136.456333 143.973754 151.491176 158.257090
  1. computing the loss by quantile point, averaged over fh time stamps, i.e., \(\frac{1}{N} \sum_{i=1}^N L_{\alpha}(\widehat{y}(t_i), y(t_i))\) for the \(t_i\) in fh and every \(\alpha\) in alpha; this is one number per quantile value in alpha

[24]:
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

loss = PinballLoss(score_average=False)
loss(y_true=y_test, y_pred=pred_quantiles)
[24]:
0.10    2.706601
0.25    5.494502
0.50    8.162435
0.75    8.003799
0.90    5.220249
Name: 0, dtype: float64
  2. computing the individual loss values, by sample, with no averaging, i.e., \(L_{\alpha}(\widehat{y}(t_i), y(t_i))\) for every \(t_i\) in fh and every \(\alpha\) in alpha; this is one number per quantile value \(\alpha\) in alpha and time point \(t_i\) in fh

[25]:
loss.evaluate_by_index(y_true=y_test, y_pred=pred_quantiles)
[25]:
0.10 0.25 0.50 0.75 0.90
0 1.752102 3.640448 5.636938 5.989468 4.524056
1 1.236173 2.219997 2.505758 0.857285 0.233870
2 2.572342 5.446984 8.707664 9.782038 8.196509
3 1.959403 3.812991 5.213805 4.202444 1.135072
4 4.123794 9.131057 15.643478 19.537261 19.202375
5 2.200464 4.236631 5.663294 4.279988 0.583679
6 2.858604 5.801378 8.613677 8.436899 5.281805
7 2.928144 5.899190 8.640334 8.223433 4.751909
8 2.999967 6.006576 8.694731 8.064465 4.301331
9 2.963736 5.847157 8.222921 7.127290 2.928894
10 3.253527 6.505704 9.393503 8.663396 4.534864
11 3.630958 7.385917 11.013123 10.881618 6.968619
  3. computing the loss for a multiple quantile forecast, averaged over both fh time stamps and quantile values in alpha, i.e., \(\frac{1}{Nk} \sum_{j=1}^k\sum_{i=1}^N L_{\alpha_j}(\widehat{y}_j(t_i), y(t_i))\) for the \(t_i\) in fh and the quantile values \(\alpha_j\); this is a single number that can be used in tuning (e.g., grid search) or overall evaluation

[26]:
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

loss_multi = PinballLoss(score_average=True)
loss_multi(y_true=y_test, y_pred=pred_quantiles)
[26]:
5.917517291566615
  4. computing the loss for a multiple quantile forecast, averaged over quantile values in alpha, for individual time stamps, i.e., \(\frac{1}{k} \sum_{j=1}^k L_{\alpha_j}(\widehat{y}_j(t_i), y(t_i))\) for each \(t_i\) in fh; this is a univariate time series at the fh times \(t_i\), which can be used for tuning or evaluation by horizon index

[27]:
loss_multi.evaluate_by_index(y_true=y_test, y_pred=pred_quantiles)
[27]:
0      4.308602
1      1.410617
2      6.941107
3      3.264743
4     13.527593
5      3.392811
6      6.198473
7      6.088602
8      6.013414
9      5.417999
10     6.470199
11     7.976047
dtype: float64

Question: why is score_average a constructor flag, while evaluate_by_index is a method?

  • not all losses are “by index”, so the evaluate_by_index logic can vary (e.g., it may use pseudo-samples)

  • constructor args define the “mathematical object” of the scientific signature, series -> non-temporal object; methods define the action or “way to apply” it, e.g., as used in tuning or reporting

Compare score_average to the multioutput arg in scikit-learn and sktime metrics.

metrics: interval vs quantile metrics#

Interval and quantile metrics can be used interchangeably:

internally, interval and quantile predictions are easily converted to each other:
recall that the lower/upper interval bounds are quantile forecasts at quantile points \(\frac{1}{2} - \frac{c}{2}\) and \(\frac{1}{2} + \frac{c}{2}\), for coverage \(c\)
[28]:
pred_interval = forecaster.predict_interval(coverage=0.8)
pred_interval
[28]:
Number of airline passengers
0.8
lower upper
1951-01 127.478978 139.973272
1951-02 137.638269 152.338697
1951-03 152.276577 168.892768
1951-04 143.405969 161.738809
1951-05 130.762062 150.664027
1951-06 155.995358 177.351467
1951-07 170.413963 193.131328
1951-08 169.718563 193.720101
1951-09 154.000334 179.220743
1951-10 132.362644 158.745674
1951-11 113.464726 140.961262
1951-12 129.690419 158.257090

the loss object recognizes the input type automatically and computes the corresponding interval loss

[29]:
loss(y_true=y_test, y_pred=pred_interval)
[29]:
0.1    2.706601
0.9    5.220249
Name: 0, dtype: float64
[30]:
loss_multi(y_true=y_test, y_pred=pred_interval)
[30]:
3.9634248763342055

evaluation by backtesting#

[31]:
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss
from sktime.split import ExpandingWindowSplitter

# 1. define data
y = load_airline()

# 2. define splitting/backtesting regime
fh = [1, 2, 3]
cv = ExpandingWindowSplitter(step_length=12, fh=fh, initial_window=72)

# 3. define loss to use
loss = PinballLoss()
# default is score_average=True and multi_output="uniform_average", so gives a number

forecaster = ThetaForecaster(sp=12)
results = evaluate(
    forecaster=forecaster, y=y, cv=cv, strategy="refit", return_data=True, scoring=loss
)
results.iloc[:, :5].head()
[31]:
test_PinballLoss fit_time pred_time len_train_window cutoff
0 0.865788 0.013478 0.013374 72 1954-12
1 0.958340 0.011405 0.012645 84 1955-12
2 0.981744 0.012250 0.016469 96 1956-12
3 1.411309 0.011964 0.011275 108 1957-12
4 1.187198 0.012172 0.011219 120 1958-12
  • each row is one train/test split in the walk-forward setting

  • the first column is the loss on the test fold

  • the last two columns shown summarize the length of the training window and the cutoff between train and test

roadmap items:

implementing further metrics

  • distribution prediction metrics - may need tfp extension

  • advanced evaluation set-ups

  • variance loss

contributions are appreciated!


Advanced composition: pipelines, tuning, reduction, adding proba forecasts to any estimator#

composition = constructing “composite” estimators out of multiple “component” estimators

  • reduction = building estimator type A using estimator type B

    • special case: adding proba forecasting capability to non-proba forecaster

    • special case: using proba supervised learner for proba forecasting

  • pipelining = chaining estimators, here: transformers to a forecaster

  • tuning = automated hyper-parameter fitting, usually via internal evaluation loop

    • special case: grid parameter search and random parameter search tuning

    • special case: “Auto-ML”, optimizing not just estimator hyper-parameter but also choice of estimator

Adding probabilistic forecasts to non-probabilistic forecasters#

start with a forecaster that does not produce probabilistic predictions:

[32]:
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

my_forecaster = ExponentialSmoothing()

# does the forecaster support probabilistic predictions?
my_forecaster.get_tag("capability:pred_int")
[32]:
False

adding probabilistic predictions is possible via reduction wrappers:

[33]:
# NaiveVariance adds intervals & variance via collecting past residuals
from sktime.forecasting.naive import NaiveVariance

# create a composite forecaster like this:
my_forecaster_with_proba = NaiveVariance(my_forecaster)

# does it support probabilistic predictions now?
my_forecaster_with_proba.get_tag("capability:pred_int")
[33]:
True

the composite can now be used like any probabilistic forecaster:

[34]:
y = load_airline()

my_forecaster_with_proba.fit(y, fh=[1, 2, 3])
my_forecaster_with_proba.predict_interval()
[34]:
Number of airline passengers
0.9
lower upper
1961-01 341.950660 522.049339
1961-02 319.828426 544.171573
1961-03 307.334366 556.665632

roadmap items:

more compositors to enable probabilistic forecasting

  • bootstrap forecast intervals

  • reduction to probabilistic supervised learning

  • popular “add probabilistic capability” wrappers

contributions are appreciated!

Tuning and AutoML#

tuning and AutoML with probabilistic forecasters works exactly like with “ordinary” forecasters
via ForecastingGridSearchCV or ForecastingRandomizedSearchCV
  • the only change: set the tuning metric to a corresponding probabilistic metric or loss function

Internally, evaluation will be done using the probabilistic metric, via backtesting evaluation.

important: to evaluate the tuned estimator, use it in evaluate or a separate benchmarking workflow.
Using internal metric/loss values amounts to in-sample evaluation, which is over-optimistic.
[35]:
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss
from sktime.split import SlidingWindowSplitter

# forecaster we want to tune
forecaster = ThetaForecaster()

# parameter grid to search over
param_grid = {"sp": [1, 6, 12]}

# evaluation/backtesting regime for *tuning*
fh = [1, 2, 3]  # fh for tuning regime, does not need to be same as in fit/predict!
cv = SlidingWindowSplitter(window_length=36, fh=fh)
scoring = PinballLoss()

# construct the composite forecaster with grid search compositor
gscv = ForecastingGridSearchCV(
    forecaster, cv=cv, param_grid=param_grid, scoring=scoring, strategy="refit"
)
[36]:
from sktime.datasets import load_airline

y = load_airline()[:60]

gscv.fit(y, fh=fh)
[36]:
ForecastingGridSearchCV(cv=SlidingWindowSplitter(fh=[1, 2, 3],
                                                 window_length=36),
                        forecaster=ThetaForecaster(),
                        param_grid={'sp': [1, 6, 12]}, scoring=PinballLoss())

inspect hyper-parameter fit obtained by tuning:

[37]:
gscv.best_params_
[37]:
{'sp': 12}

obtain predictions:

[38]:
gscv.predict_interval()
[38]:
Number of airline passengers
0.9
lower upper
1954-01 190.832917 217.164705
1954-02 195.638436 226.620354
1954-03 221.947953 256.967883

for AutoML, use the MultiplexForecaster to select among multiple forecasters:

[39]:
from sktime.forecasting.compose import MultiplexForecaster
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.naive import NaiveForecaster, NaiveVariance

forecaster = MultiplexForecaster(
    forecasters=[
        ("naive", NaiveForecaster(strategy="last")),
        ("ets", ExponentialSmoothing(trend="add", sp=12)),
    ],
)

forecaster_param_grid = {"selected_forecaster": ["ets", "naive"]}
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=forecaster_param_grid)

gscv.fit(y)
gscv.best_params_
[39]:
{'selected_forecaster': 'naive'}

Pipelines with probabilistic forecasters#

sktime pipelines are compatible with probabilistic forecasters:

  • ForecastingPipeline applies transformers to the exogenous X argument before passing it to the forecaster

  • TransformedTargetForecaster transforms y and back-transforms forecasts, including interval or quantile forecasts

ForecastingPipeline takes a chain of transformers and a forecaster, say,

[t1, t2, ..., tn, f],

where the t[i] are transformers that pre-process the exogenous data X, and f is a forecaster

fit(y, X, fh) does:#

X1 = t1.fit_transform(X)
X2 = t2.fit_transform(X1)
etc
X[n] = tn.fit_transform(X[n-1])

f.fit(y=y, X=X[n])

predict_[sth](X, fh) does:#

X1 = t1.transform(X)
X2 = t2.transform(X1)
etc
X[n] = tn.transform(X[n-1])

f.predict_[sth](fh, X=X[n])

[40]:
from sktime.datasets import load_macroeconomic
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.compose import ForecastingPipeline
from sktime.split import temporal_train_test_split
from sktime.transformations.series.impute import Imputer
[41]:
data = load_macroeconomic()
y = data["unemp"]
X = data.drop(columns=["unemp"])

y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
[42]:
forecaster = ForecastingPipeline(
    steps=[
        ("imputer", Imputer(method="mean")),
        ("forecaster", ARIMA(suppress_warnings=True)),
    ]
)
forecaster.fit(y=y_train, X=X_train, fh=X_test.index[:5])
forecaster.predict_interval(X=X_test[:5])
[42]:
0
0.9
lower upper
Period
1997Q1 5.042704 6.119990
1997Q2 3.948564 5.235163
1997Q3 3.887471 5.253592
1997Q4 4.108211 5.506862
1998Q1 4.501319 5.913611

TransformedTargetForecaster takes a chain of transformers and a forecaster, say,

[t1, t2, ..., tn, f, tp1, tp2, ..., tpk],

where the t[i] are transformers that pre-process y, and the tp[i] are transformers that post-process the forecasts

fit(y, X, fh) does:#

y1 = t1.fit_transform(y)
y2 = t2.fit_transform(y1)
y3 = t3.fit_transform(y2)
etc
y[n] = tn.fit_transform(y[n-1])

f.fit(y[n])

yp1 = tp1.fit_transform(y[n])
yp2 = tp2.fit_transform(yp1)
yp3 = tp3.fit_transform(yp2)
etc

predict_quantiles(y, X, fh) does:#

y1 = t1.transform(y)
y2 = t2.transform(y1)
etc
y[n] = tn.transform(y[n-1])

y_pred = f.predict_quantiles(y[n])

y_pred = t[n].inverse_transform(y_pred)
y_pred = t[n-1].inverse_transform(y_pred)
etc
y_pred = t1.inverse_transform(y_pred)
y_pred = tp1.transform(y_pred)
y_pred = tp2.transform(y_pred)
etc
y_pred = tp[k].transform(y_pred)

Note: the remaining proba predictions are inferred from predict_quantiles.

[43]:
from sktime.datasets import load_macroeconomic
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.detrend import Deseasonalizer, Detrender
[44]:
data = load_macroeconomic()
y = data[["unemp"]]
[45]:
forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", Deseasonalizer(sp=12)),
        ("detrend", Detrender()),
        ("forecast", ARIMA()),
    ]
)

forecaster.fit(y, fh=[1, 2, 3])
forecaster.predict_interval()
[45]:
0
0.9
lower upper
2009Q4 8.949103 10.068284
2010Q1 8.639806 10.206350
2010Q2 8.438112 10.337207
[46]:
forecaster.predict_quantiles()
[46]:
0
0.05 0.95
2009Q4 8.949103 10.068284
2010Q1 8.639806 10.206350
2010Q2 8.438112 10.337207

the same pipeline can also be created quickly via the * dunder operator:

[47]:
forecaster = Deseasonalizer(sp=12) * Detrender() * ARIMA()
[48]:
forecaster.fit(y, fh=[1, 2, 3])
forecaster.predict_interval()
[48]:
0
0.9
lower upper
2009Q4 8.949103 10.068284
2010Q1 8.639806 10.206350
2010Q2 8.438112 10.337207

Building your own probabilistic forecaster#

Getting started:

Extension template = python “fill-in” template with to-do blocks that allow you to implement your own, sktime-compatible forecasting algorithm.

Check estimators using check_estimator

For probabilistic forecasting:

  • implement at least one of predict_quantiles, predict_interval, predict_var, predict_proba

  • ideally, implement all of them, unless identical with the defaulting behaviour described below

  • if only one is implemented, the others use the following defaults (in this sequence, depending on availability):

    • predict_interval uses quantiles from predict_quantiles and vice versa

    • predict_var uses variance from predict_proba, or variance of normal with IQR as obtained from predict_quantiles

    • predict_interval or predict_quantiles uses quantiles from predict_proba distribution

    • predict_proba returns normal with mean predict and variance predict_var

  • so, if predictive residuals are not normal, implement predict_proba or predict_quantiles

  • if interfacing another package, implement the methods that require the least “conversion”

  • ensure to set the capability:pred_int tag to True

[49]:
# estimator checking on the fly using check_estimator

# suppose NaiveForecaster is your new estimator
from sktime.forecasting.naive import NaiveForecaster

# check the estimator like this

# uncomment this block to run

# from sktime.utils.estimator_checks import check_estimator
#
# check_estimator(NaiveForecaster)

# this prints any failed tests, and returns dictionary with
#   keys of test runs and results from the test run
# run individual tests using the tests_to_run arg or the fixtures_to_run arg
#   these need to be identical to test or test/fixture names, see docstring
[50]:
# to raise errors for use in traceback debugging:

# uncomment next line to run
# check_estimator(NaiveForecaster, raise_exceptions=True)

# this does not raise an error since NaiveForecaster is fine, but would if it weren't

Summary#
  • unified API for probabilistic forecasting and probabilistic metrics

  • integrating other packages (e.g. scikit-learn, statsmodels, pmdarima, prophet)

  • interface for composite model building is same, proba or not (pipelining, ensembling, tuning, reduction)

  • easily extensible with custom estimators

Useful resources#

Credits#

notebook creation: fkiraly

probabilistic forecasting framework: fkiraly, kejsitake
probabilistic metrics, tuning: eenticott-shell, fkiraly
probabilistic estimators: aiwalter, fkiraly, ilyasmoutawwakil, k1m190r, kejsitake

Generated using nbsphinx. The Jupyter notebook can be found here.