temporal_train_test_split

temporal_train_test_split(y: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, pandas.core.indexes.base.Index], X: Optional[pandas.core.frame.DataFrame] = None, test_size: Optional[Union[int, float]] = None, train_size: Optional[Union[int, float]] = None, fh: Optional[Union[int, list, numpy.ndarray, pandas.core.indexes.base.Index, sktime.forecasting.base._fh.ForecastingHorizon]] = None) → Union[Tuple[pandas.core.series.Series, pandas.core.series.Series], Tuple[pandas.core.series.Series, pandas.core.series.Series, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]

Split arrays or matrices into sequential train and test subsets.

Creates train/test splits over endogenous arrays and optional exogenous arrays.

This is a wrapper of scikit-learn’s train_test_split that does not shuffle the data.

Parameters

y : time series in sktime-compatible data container format

X : time series in sktime-compatible data container format, optional (default=None)

y and X can be in one of the following formats:

Series scitype: pd.Series, pd.DataFrame, or np.ndarray (1D or 2D), for vanilla forecasting of one time series

Panel scitype: pd.DataFrame with 2-level row MultiIndex, 3D np.ndarray, list of Series pd.DataFrame, or nested pd.DataFrame, for global or panel forecasting

Hierarchical scitype: pd.DataFrame with 3 or more level row MultiIndex, for hierarchical forecasting

The number of admissible columns depends on the "scitype:y" tag:

if self.get_tag("scitype:y")=="univariate": y must have a single column/variable
if self.get_tag("scitype:y")=="multivariate": y must have 2 or more columns
if self.get_tag("scitype:y")=="both": no restrictions on columns apply

For further details:

on usage, see the forecasting tutorial examples/01_forecasting.ipynb
on specification of formats, see examples/AA_datatypes_and_datasets.ipynb

test_size : float, int or None, optional (default=None)

If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it is set to 0.25.

train_size : float, int or None, optional (default=None)

If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

fh : ForecastingHorizon, optional (default=None)
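The sizing rules described for test_size and train_size can be sketched in plain Python. This is an illustration only, not sktime's implementation: `split_sizes` and `sequential_split` are hypothetical helpers, and the rounding direction for float sizes (test rounded up, train rounded down) is an assumption that mirrors scikit-learn's convention.

```python
import math


def split_sizes(n_samples, test_size=None, train_size=None):
    """Return (n_train, n_test) for a sequential split (hypothetical helper)."""
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when both sizes are None

    n_train = None
    n_test = None

    # Floats are proportions of the data; ints are sample counts.
    if isinstance(test_size, float):
        n_test = math.ceil(test_size * n_samples)  # assumption: round up
    elif test_size is not None:
        n_test = test_size

    if isinstance(train_size, float):
        n_train = math.floor(train_size * n_samples)  # assumption: round down
    elif train_size is not None:
        n_train = train_size

    # A missing size is the complement of the one that was given.
    if n_train is None:
        n_train = n_samples - n_test
    if n_test is None:
        n_test = n_samples - n_train

    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the number of samples")
    return n_train, n_test


def sequential_split(values, test_size=None, train_size=None):
    """Split a sequence into head (train) and following block (test)."""
    n_train, n_test = split_sizes(len(values), test_size, train_size)
    return values[:n_train], values[n_train:n_train + n_test]


train, test = sequential_split(list(range(10)), test_size=3)
print(train, test)
```

The sketch makes the complement rule explicit: only one of the two sizes needs to be given, and the other is derived from it.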

Returns

splitting : tuple, length = 2 * len(arrays)

Tuple containing the train-test split of y, followed by the train-test split of X if X is given.

References

1: adapted from https://github.com/alkaline-ml/pmdarima/