The following information is designed to get users up and running with
sktime quickly. For more detailed information, see the links in each of the subsections.
sktime currently supports:
environments with Python version 3.8, 3.9, 3.10, 3.11, or 3.12.
the operating systems Mac OS X, Unix-like OS, and Windows 8.1 and higher.
Please see the installation guide for step-by-step instructions on installing the package.
sktime seeks to provide a unified framework for multiple time series machine learning tasks. This (hopefully) makes sktime's functionality intuitive for users and lets developers extend the framework more easily. However, time series data and the related scientific use cases can each take multiple forms, so a shared set of concepts and terminology is important.
sktime is designed for time series machine learning. Time series data refers to data where the observations of each variable are ordered by time or by an index that indicates the position of an observation in the sequence of values.
In sktime, time series data can be univariate, multivariate or panel. The difference lies in the number of time series variables and how they relate to each other, as well as the number of instances for which each variable is observed.
Univariate time series data refers to data where a single variable is tracked over time.
Multivariate time series data refers to data where multiple variables are tracked over time for the same instance. For example, multiple quarterly economic indicators for a country or multiple sensor readings from the same machine.
Panel time series data refers to data where the variables (univariate or multivariate) are tracked for multiple instances. For example, multiple quarterly economic indicators for several countries or multiple sensor readings for multiple machines.
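As a rough illustration of the three forms, the sketch below builds each one with pandas. The container choices (a `Series`, a `DataFrame`, and a `DataFrame` with a two-level row index), the variable names and the values are all assumptions made for this example. They are not a statement of which data formats sktime accepts.

```python
import pandas as pd

# Quarterly time index shared by all examples (hypothetical data).
quarters = pd.date_range("2020-01-01", periods=4, freq="QS")

# Univariate: one variable tracked over time for one instance.
gdp = pd.Series([1.0, 1.2, 1.1, 1.3], index=quarters, name="gdp")

# Multivariate: several variables tracked over time for the same instance.
indicators = pd.DataFrame(
    {"gdp": [1.0, 1.2, 1.1, 1.3], "inflation": [2.0, 2.1, 1.9, 2.2]},
    index=quarters,
)

# Panel: the same variables tracked over time for several instances.
panel_index = pd.MultiIndex.from_product(
    [["country_a", "country_b"], quarters], names=["instance", "time"]
)
panel = pd.DataFrame(
    {
        "gdp": [1.0, 1.2, 1.1, 1.3, 0.8, 0.9, 1.0, 0.9],
        "inflation": [2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2],
    },
    index=panel_index,
)

print(gdp.shape, indicators.shape, panel.shape)
```

Selecting one instance from the panel, for example `panel.loc["country_a"]`, gives back a multivariate series for that instance.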
sktime's functionality for each learning task is centered around providing a set of code artifacts that match a common interface to a given scientific purpose (i.e. scientific type or scitype). For example, sktime includes a common interface for “forecaster” classes designed to predict future values of a time series.
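To make the idea of a common interface concrete, here is a minimal sketch in plain Python. The two classes and the `run_forecaster` helper are hypothetical and are not sktime's base classes. They only show how code written against a shared `fit`/`predict` contract works with any object that follows it.

```python
class LastValueForecaster:
    """Hypothetical forecaster: repeats the last observed value."""

    def fit(self, y):
        self.last_value_ = y[-1]
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, fh):
        # fh is a list of steps ahead, e.g. [1, 2, 3]
        return [self.last_value_ for _ in fh]


class MeanForecaster:
    """Hypothetical forecaster: repeats the mean of the observed values."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, fh):
        return [self.mean_ for _ in fh]


def run_forecaster(forecaster, y_train, fh):
    # Works for any object exposing the same fit/predict interface.
    return forecaster.fit(y_train).predict(fh)


last_value_pred = run_forecaster(LastValueForecaster(), [1, 2, 3], [1, 2])
mean_pred = run_forecaster(MeanForecaster(), [1, 2, 3], [1, 2])
print(last_value_pred, mean_pred)
```

Because both classes follow the same contract, `run_forecaster` does not need to know which one it receives.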
sktime's interface currently supports:
Time series classification where the time series data for a given instance are used to predict a categorical target class.
Time series regression where the time series data for a given instance are used to predict a continuous target value.
Time series clustering where the goal is to discover groups consisting of instances with similar time series.
Forecasting where the goal is to predict future values of the input series.
Time series annotation which is focused on outlier detection, anomaly detection, change point detection and segmentation.
While the list above presents each learning task separately, in many cases it is possible to adapt one learning task to help solve another related learning task. For example, one approach to forecasting is to use a regression model that explicitly accounts for the data’s time dimension. Another approach is to reduce the forecasting problem to cross-sectional regression, where the input data are tabularized and lags of the data are treated as independent features in scikit-learn style tabular regression algorithms. Likewise, one approach to a time series annotation task such as anomaly detection is to reduce it to forecasting: use a forecaster to predict future values and flag observations that are too far from these predictions as anomalies.
sktime typically incorporates these types of reductions through composable classes that let users adapt one learning task to solve another related one.
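The two reductions described above can be sketched with NumPy alone. This is an illustration under stated assumptions and not sktime's implementation: the helper names, the synthetic series, the window length of 3 and the threshold of 5.0 are all hypothetical, and ordinary least squares stands in for any tabular regressor.

```python
import numpy as np


def make_lag_table(y, window_length):
    """Tabularize a series: each row of X holds the `window_length`
    values that precede the corresponding entry of the target."""
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - window_length
    X = np.array([y[i:i + window_length] for i in range(n_rows)])
    return X, y[window_length:]


def add_intercept(X):
    # Prepend a column of ones so the regression has an intercept term.
    return np.column_stack([np.ones(len(X)), X])


# Hypothetical series: linear trend plus a repeating pattern of period 4,
# with one spike injected at the final position.
t = np.arange(24)
y = 0.5 * t + np.tile([0.0, 1.0, 0.0, -1.0], 6)
y[-1] += 10.0

window_length, n_train = 3, 18

# Forecasting reduced to tabular regression: least squares on lag features.
X_train, target_train = make_lag_table(y[:n_train], window_length)
coef, *_ = np.linalg.lstsq(add_intercept(X_train), target_train, rcond=None)

# One-step-ahead predictions for the held-out positions 18..23.
X_test, target_test = make_lag_table(y[n_train - window_length:], window_length)
predictions = add_intercept(X_test) @ coef

# Anomaly detection reduced to forecasting: flag large forecast errors.
threshold = 5.0  # hypothetical cut-off for "too far from the prediction"
residuals = np.abs(target_test - predictions)
flagged = np.arange(n_train, len(y))[residuals > threshold]
print(flagged)  # [23]
```

The spike sits at the last position so that it never appears among the lag features. In real data an anomaly also distorts the predictions for the observations that follow it, which a practical detector has to account for.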
The code snippets below are designed to introduce sktime's functionality so you can start using it quickly. For more detailed information, see the Tutorials, User Guide and API Reference.
Forecasting
>>> from sktime.datasets import load_airline
>>> from sktime.forecasting.base import ForecastingHorizon
>>> from sktime.forecasting.model_selection import temporal_train_test_split
>>> from sktime.forecasting.theta import ThetaForecaster
>>> from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
>>> y = load_airline()
>>> y_train, y_test = temporal_train_test_split(y)
>>> fh = ForecastingHorizon(y_test.index, is_relative=False)
>>> forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
>>> forecaster.fit(y_train)
>>> y_pred = forecaster.predict(fh)
>>> mean_absolute_percentage_error(y_test, y_pred)
0.08661467738190656
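The forecasting snippet above splits the series in time order rather than at random, so the model is always evaluated on observations that come after its training data. The pandas sketch below shows what such a split amounts to. The helper `simple_temporal_split`, its 25% test share and the toy series are assumptions for this example and do not describe the behaviour or defaults of sktime's own function.

```python
import pandas as pd


def simple_temporal_split(y, test_fraction=0.25):
    """Hypothetical helper: hold out the final share of observations."""
    n_test = int(round(len(y) * test_fraction))
    split_point = len(y) - n_test
    return y.iloc[:split_point], y.iloc[split_point:]


# Toy monthly series with 12 observations (hypothetical data).
index = pd.date_range("2020-01-01", periods=12, freq="MS")
y = pd.Series(range(12), index=index)

y_train, y_test = simple_temporal_split(y)
print(len(y_train), len(y_test))  # 9 3
```

Every timestamp in `y_test` is later than every timestamp in `y_train`, which is what makes the test index usable as an absolute forecasting horizon.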
Time Series Classification
>>> from sktime.classification.interval_based import TimeSeriesForestClassifier
>>> from sktime.datasets import load_arrow_head
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> X, y = load_arrow_head()
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> classifier = TimeSeriesForestClassifier()
>>> classifier.fit(X_train, y_train)
>>> y_pred = classifier.predict(X_test)
>>> accuracy_score(y_test, y_pred)
0.8679245283018868
Time Series Regression
>>> from sktime.datasets import load_covid_3month
>>> from sktime.regression.distance_based import KNeighborsTimeSeriesRegressor
>>> from sklearn.metrics import mean_squared_error
>>> X_train, y_train = load_covid_3month(split="train")
>>> y_train = y_train.astype("float")
>>> X_test, y_test = load_covid_3month(split="test")  # keep the test targets for scoring
>>> y_test = y_test.astype("float")
>>> regressor = KNeighborsTimeSeriesRegressor()
>>> regressor.fit(X_train, y_train)
>>> y_pred = regressor.predict(X_test)
>>> mean_squared_error(y_test, y_pred)
Time Series Clustering
>>> from sklearn.model_selection import train_test_split
>>> from sktime.clustering.k_means import TimeSeriesKMeans
>>> from sktime.clustering.utils.plotting._plot_partitions import plot_cluster_algorithm
>>> from sktime.datasets import load_arrow_head
>>> X, y = load_arrow_head()
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> k_means = TimeSeriesKMeans(n_clusters=5, init_algorithm="forgy", metric="dtw")
>>> k_means.fit(X_train)
>>> plot_cluster_algorithm(k_means, X_test, k_means.n_clusters)
Time Series Annotation
The time series annotation API is experimental, and may change in future releases.
>>> from sktime.annotation.adapters import PyODAnnotator
>>> from pyod.models.iforest import IForest
>>> from sktime.datasets import load_airline
>>> y = load_airline()
>>> pyod_model = IForest()
>>> pyod_sktime_annotator = PyODAnnotator(pyod_model)
>>> pyod_sktime_annotator.fit(y)
>>> annotated_series = pyod_sktime_annotator.predict(y)