temporal_train_test_split
- temporal_train_test_split(y: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, pandas.core.indexes.base.Index], X: Optional[pandas.core.frame.DataFrame] = None, test_size: Optional[Union[int, float]] = None, train_size: Optional[Union[int, float]] = None, fh: Optional[Union[int, list, numpy.ndarray, pandas.core.indexes.base.Index, sktime.forecasting.base._fh.ForecastingHorizon]] = None) -> Union[Tuple[pandas.core.series.Series, pandas.core.series.Series], Tuple[pandas.core.series.Series, pandas.core.series.Series, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]
Split arrays or matrices into sequential train and test subsets.
Creates train/test splits over endogenous arrays and optional exogenous arrays.
This is a wrapper of scikit-learn’s train_test_split that does not shuffle the data.
- Parameters
- y : time series in sktime compatible data container format
- X : time series in sktime compatible data container format, optional, default=None
y and X can be in one of the following formats:
- Series scitype: pd.Series, pd.DataFrame, or np.ndarray (1D or 2D) for vanilla forecasting, one time series
- Panel scitype: pd.DataFrame with 2-level row MultiIndex, 3D np.ndarray, list of Series pd.DataFrame, or nested pd.DataFrame for global or panel forecasting
- Hierarchical scitype: pd.DataFrame with 3 or more level row MultiIndex for hierarchical forecasting
- The number of admissible columns depends on the "scitype:y" tag:
- if self.get_tag("scitype:y")=="univariate": y must have a single column/variable
- if self.get_tag("scitype:y")=="multivariate": y must have 2 or more columns
- if self.get_tag("scitype:y")=="both": no restrictions on columns apply
- For further details:
- on usage, see the forecasting tutorial examples/01_forecasting.ipynb
- on specification of formats, see examples/AA_datatypes_and_datasets.ipynb
- test_size : float, int or None, optional (default=None)
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
- train_size : float, int or None, optional (default=None)
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- fh : ForecastingHorizon, optional (default=None)
- Returns
- splitting : tuple, length=2 * len(arrays)
Tuple containing the train-test split of y, and of X if given.
References
- [1] Adapted from https://github.com/alkaline-ml/pmdarima/
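The size rules described for test_size and train_size, and the fact that the split is sequential rather than shuffled, can be illustrated with a minimal plain-Python sketch. This is not sktime's implementation: the helpers `resolve_sizes` and `sequential_split` are hypothetical names, the sketch works on plain lists instead of pandas objects, it ignores fh, and the rounding (ceiling for a float test_size, floor for a float train_size) is assumed to follow scikit-learn's train_test_split.

```python
import math


def resolve_sizes(n, test_size=None, train_size=None):
    """Return (n_train, n_test) for n observations (hypothetical helper)."""
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when both sizes are None

    def to_count(size, rounder):
        # Floats are proportions of n, ints are absolute counts.
        if size is None:
            return None
        if isinstance(size, float):
            if not 0.0 < size < 1.0:
                raise ValueError("float sizes must lie between 0.0 and 1.0")
            return int(rounder(size * n))
        return int(size)

    # Assumption: rounding mirrors scikit-learn (ceil for test, floor for train).
    n_test = to_count(test_size, math.ceil)
    n_train = to_count(train_size, math.floor)

    # A missing size is the complement of the one that was given.
    if n_train is None:
        n_train = n - n_test
    if n_test is None:
        n_test = n - n_train

    if n_train <= 0 or n_test <= 0 or n_train + n_test > n:
        raise ValueError("train and test sizes do not fit the data")
    return n_train, n_test


def sequential_split(y, X=None, test_size=None, train_size=None):
    """Split y (and X, if given) in time order, without shuffling."""
    n_train, n_test = resolve_sizes(len(y), test_size, train_size)
    stop = n_train + n_test
    y_train, y_test = y[:n_train], y[n_train:stop]
    if X is None:
        return y_train, y_test
    # Same return order as the documented signature: y parts first, then X parts.
    return y_train, y_test, X[:n_train], X[n_train:stop]


y = list(range(12))
X = [[i, 2 * i] for i in range(12)]

y_train, y_test = sequential_split(y)  # both sizes None, so test_size=0.25
y_train4, y_test4, X_train4, X_test4 = sequential_split(y, X, test_size=4)
```

Because nothing is shuffled, every test observation comes after every training observation, which is the property a forecasting evaluation relies on.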