
Feature extraction with tsfresh transformer#

This tutorial shows how to use sktime with tsfresh to extract features from time series and then pass those features to any scikit-learn estimator.

Preliminaries#

This notebook requires tsfresh. If you have not installed it yet, uncomment and run the cell below:

[1]:
# !pip install --upgrade tsfresh
[2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

Univariate time series classification data#

For more details on the data set, see the univariate time series classification notebook.

[3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
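The shapes printed above follow from the default behaviour of `train_test_split`, which holds out 25% of the instances and rounds the test share up. The sketch below reproduces that arithmetic in plain Python. `default_split_sizes` is a hypothetical helper written for illustration and is not part of scikit-learn.

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a float test_size (hypothetical helper)."""
    n_test = math.ceil(test_size * n_samples)  # the test share is rounded up
    n_train = n_samples - n_test  # the remaining instances are used for training
    return n_train, n_test


# the shapes above show 158 training and 53 test instances, 211 in total
print(default_split_sizes(158 + 53))
```

Because no `random_state` is passed in the cell above, the instances in each split (and therefore the scores later in the notebook) will differ between runs.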
[4]:
X_train.head()
[4]:
dim_0
69 0 -1.7998 1 -1.7987 2 -1.7942 3 ...
103 0 -1.8091 1 -1.8067 2 -1.7866 3 ...
34 0 -2.0417 1 -2.0572 2 -2.0522 3 ...
14 0 -2.1888 1 -2.1855 2 -2.1765 3 ...
121 0 -1.9586 1 -1.9371 2 -1.8798 3 ...
[5]:
# multiclass classification task (three classes)
np.unique(y_train)
[5]:
array(['0', '1', '2'], dtype=object)

Using tsfresh to extract features#

[6]:
# extract the "efficient" tsfresh feature set from each series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.05s/it]
[6]:
dim_0__variance_larger_than_standard_deviation dim_0__has_duplicate_max dim_0__has_duplicate_min dim_0__has_duplicate dim_0__sum_values dim_0__abs_energy dim_0__mean_abs_change dim_0__mean_change dim_0__mean_second_derivative_central dim_0__median ... dim_0__fourier_entropy__bins_2 dim_0__fourier_entropy__bins_3 dim_0__fourier_entropy__bins_5 dim_0__fourier_entropy__bins_10 dim_0__fourier_entropy__bins_100 dim_0__permutation_entropy__dimension_3__tau_1 dim_0__permutation_entropy__dimension_4__tau_1 dim_0__permutation_entropy__dimension_5__tau_1 dim_0__permutation_entropy__dimension_6__tau_1 dim_0__permutation_entropy__dimension_7__tau_1
0 0.0 0.0 0.0 1.0 -0.000080 249.998516 0.052357 -0.000001 -0.000005 -0.024066 ... 0.046288 0.092513 0.092513 0.092513 0.250609 1.323194 1.819631 2.183824 2.463220 2.707387
1 0.0 0.0 1.0 1.0 -0.000525 250.000756 0.049118 0.000000 -0.000006 -0.031622 ... 0.046288 0.046288 0.092513 0.092513 0.184769 1.213529 1.668744 2.081159 2.418614 2.707518
2 0.0 0.0 0.0 1.0 -0.000034 249.998998 0.069971 0.000084 0.000025 0.018880 ... 0.081510 0.092513 0.092513 0.138673 0.311663 1.116706 1.545256 1.889777 2.155644 2.374722
3 0.0 0.0 0.0 1.0 0.000202 249.999702 0.067601 -0.000002 -0.000010 0.384770 ... 0.046288 0.092513 0.092513 0.204643 0.414263 1.323315 1.915330 2.406197 2.794719 3.117007
4 0.0 0.0 0.0 1.0 -0.000146 249.998674 0.050355 -0.000004 -0.000046 -0.045353 ... 0.046288 0.092513 0.092513 0.092513 0.230801 1.173933 1.628543 2.003443 2.303091 2.559695

5 rows × 773 columns
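To make a few of the column names above concrete, the following sketch recomputes five of the simpler features on a made-up toy series, using only the standard library. The functions are simplified re-implementations for illustration, based on the usual definitions of these features (sum, sum of squares, mean absolute difference between consecutive values, mean difference between consecutive values, median). They are not tsfresh's own code, and the toy values are not taken from the data set.

```python
from statistics import median


def sum_values(x):
    """Sum of all observations."""
    return sum(x)


def abs_energy(x):
    """Sum of squared observations."""
    return sum(v * v for v in x)


def mean_abs_change(x):
    """Mean absolute difference between consecutive observations."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    return sum(abs(d) for d in diffs) / len(diffs)


def mean_change(x):
    """Mean difference between consecutive observations."""
    return (x[-1] - x[0]) / (len(x) - 1)


series = [1.0, 3.0, 2.0, 5.0]  # made-up toy values, for illustration only
features = {
    "sum_values": sum_values(series),
    "abs_energy": abs_energy(series),
    "mean_abs_change": mean_abs_change(series),
    "mean_change": mean_change(series),
    "median": median(series),
}
print(features)
```

Each series in `X_train` is reduced to one row of such scalar features, which is why the transformed data can be passed to a tabular scikit-learn estimator.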

Using tsfresh with sktime#

[7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00,  2.21s/it]
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.45it/s]
[7]:
0.8490566037735849
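For a pipeline that ends in a scikit-learn classifier, `score` reports mean accuracy, that is, the fraction of test instances whose label was predicted correctly. The sketch below spells this out in plain Python. The `accuracy` helper and the labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# made-up labels, for illustration only
y_true = ["0", "1", "2", "1"]
y_pred = ["0", "1", "1", "1"]
print(accuracy(y_true, y_pred))  # 3 of the 4 predictions are correct

# with 53 test instances, the score shown above corresponds to 45 correct predictions
print(45 / 53)
```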

Multivariate time series classification data#

[8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
[9]:
#  multivariate input data
X_train.head()
[9]:
dim_0 dim_1 dim_2 dim_3 dim_4 dim_5
20 0 -0.294498 1 -0.294498 2 -0.050044 3... 0 0.540218 1 0.540218 2 -0.515245 3... 0 0.218114 1 0.218114 2 -0.301108 3... 0 -0.045277 1 -0.045277 2 0.103872 3... 0 -0.002663 1 -0.002663 2 -0.183773 3... 0 0.031960 1 0.031960 2 0.037287 3...
26 0 -0.761604 1 -0.761604 2 0.121078 3... 0 0.260125 1 0.260125 2 -1.423255 3... 0 -0.064487 1 -0.064487 2 0.075600 3... 0 0.069248 1 0.069248 2 -0.282318 3... 0 0.242367 1 0.242367 2 -0.332922 3... 0 -0.007990 1 -0.007990 2 0.239704 3...
7 0 -0.352746 1 -0.352746 2 -1.354561 3... 0 0.316845 1 0.316845 2 0.490525 3... 0 -0.473779 1 -0.473779 2 1.454261 3... 0 -0.327595 1 -0.327595 2 -0.269001 3... 0 0.106535 1 0.106535 2 0.021307 3... 0 0.197090 1 0.197090 2 0.460763 3...
8 0 -0.342233 1 -0.342233 2 -0.298542 3... 0 0.327415 1 0.327415 2 -0.527154 3... 0 0.157229 1 0.157229 2 0.248585 3... 0 0.394179 1 0.394179 2 -0.037287 3... 0 0.074574 1 0.074574 2 -0.087891 3... 0 -0.037287 1 -0.037287 2 -0.050604 3...
10 0 0.206148 1 0.206148 2 6.53436... 0 -0.658294 1 -0.658294 2 4.597327 3... 0 0.469612 1 0.469612 2 -2.723661 3... 0 -0.106535 1 -0.106535 2 -0.439456 3... 0 0.306288 1 0.306288 2 1.717875 3... 0 0.950824 1 0.950824 2 -1.041379 3...
[10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00,  3.69s/it]
[10]:
dim_0__variance_larger_than_standard_deviation dim_0__has_duplicate_max dim_0__has_duplicate_min dim_0__has_duplicate dim_0__sum_values dim_0__abs_energy dim_0__mean_abs_change dim_0__mean_change dim_0__mean_second_derivative_central dim_0__median ... dim_5__fourier_entropy__bins_2 dim_5__fourier_entropy__bins_3 dim_5__fourier_entropy__bins_5 dim_5__fourier_entropy__bins_10 dim_5__fourier_entropy__bins_100 dim_5__permutation_entropy__dimension_3__tau_1 dim_5__permutation_entropy__dimension_4__tau_1 dim_5__permutation_entropy__dimension_5__tau_1 dim_5__permutation_entropy__dimension_6__tau_1 dim_5__permutation_entropy__dimension_7__tau_1
0 0.0 0.0 0.0 1.0 33.334188 110.735119 0.822452 0.000639 0.001751 0.164096 ... 0.165443 0.165443 0.165443 0.192626 0.545824 1.279774 1.910772 2.565051 3.096812 3.567632
1 1.0 0.0 0.0 1.0 73.888480 220.949429 0.964075 -0.002087 -0.003908 0.613719 ... 0.096509 0.096509 0.261160 0.261160 0.451359 1.313299 1.987599 2.593635 3.173890 3.696247
2 0.0 0.0 0.0 1.0 -17.428760 7.940863 0.170422 0.002326 -0.000244 -0.152038 ... 0.223718 0.261160 0.356468 0.545824 1.821690 1.438857 2.291659 3.140440 3.819994 4.207710
3 0.0 0.0 0.0 1.0 -18.154841 5.568890 0.135705 0.001051 0.000688 -0.196623 ... 0.399949 0.705356 1.127853 1.742820 3.274497 1.683010 2.766048 3.748502 4.303872 4.449241
4 1.0 0.0 0.0 1.0 395.985445 11192.658970 6.583700 0.099344 0.000000 8.608970 ... 0.165443 0.165443 0.165443 0.165443 0.706253 1.483926 2.279149 3.014130 3.525453 3.919983

5 rows × 4638 columns
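The column names above appear to follow the pattern `dimension__feature__parameters`, with double underscores as separators. The column count is also consistent with the univariate case: 773 features for each of the 6 dimensions. The sketch below splits a few of the names shown above into their parts and checks that arithmetic. `parse_feature_column` is a hypothetical helper, not a function from sktime or tsfresh.

```python
def parse_feature_column(name):
    """Split 'dim_k__feature__param...' into (dimension, feature, params)."""
    dimension, feature, *params = name.split("__")
    return dimension, feature, params


# column names copied from the output above
columns = [
    "dim_0__sum_values",
    "dim_5__fourier_entropy__bins_100",
    "dim_5__permutation_entropy__dimension_7__tau_1",
]
for col in columns:
    print(parse_feature_column(col))

features_per_dimension = 773  # column count from the univariate example
n_dimensions = 6  # dim_0 to dim_5
print(features_per_dimension * n_dimensions)
```

Splitting names this way can help when selecting or grouping features by dimension before fitting an estimator.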

Using tsfresh for forecasting#

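You can also use tsfresh for univariate forecasting. The example below reduces forecasting to time series regression: the series is cut into sliding windows of 12 observations, tsfresh extracts features from each window, and a regressor predicts the value that follows. To find out more about forecasting, check out our forecasting tutorial notebook.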

[11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import make_reduction
from sktime.split import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

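# tsfresh features followed by a tabular regressor form a time series regressor
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
# reduce forecasting to regression on sliding windows of 12 observations
forecaster = make_reduction(
    regressor, scitype="time-series-regressor", window_length=12
)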
forecaster.fit(y_train)

fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
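Conceptually, the reduction in the cell above turns the training series into pairs of a window and the value that follows it. The sketch below illustrates that idea in plain Python on made-up values. It is a simplified illustration, not sktime's implementation, and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(y, window_length):
    """Turn a series into (window, next value) pairs for a regressor."""
    windows, targets = [], []
    for start in range(len(y) - window_length):
        windows.append(y[start : start + window_length])  # the regressor's input
        targets.append(y[start + window_length])  # the value to predict
    return windows, targets


y_toy = [10, 20, 30, 40, 50]  # made-up values, for illustration only
windows, targets = sliding_windows(y_toy, window_length=3)
for window, target in zip(windows, targets):
    print(window, "->", target)
```

In the notebook, each window holds 12 observations and is passed to the tsfresh pipeline as a short time series, so the random forest is trained on tsfresh features of the window rather than on its raw values.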
