Feature extraction with the tsfresh transformer
In this tutorial, we show how to use sktime with tsfresh to extract features from time series, so that the resulting feature table can be passed to any scikit-learn estimator.
Preliminaries
You need to have tsfresh installed. If you haven’t installed it yet, uncomment and run the line in the cell below:
[1]:
# !pip install --upgrade tsfresh
[2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
Univariate time series classification data
For more details on the data set, see the univariate time series classification notebook.
[3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
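The shapes above follow from the default behaviour of `train_test_split`, which holds out 25% of the instances when neither `test_size` nor `train_size` is given. The sketch below reproduces that arithmetic in plain Python. Note that `split_sizes` is a hypothetical helper written for this illustration, not a scikit-learn function, and it assumes the test set size is rounded up.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional hold-out split.

    Hypothetical helper for illustration: assumes the test set size is
    rounded up and the training set receives the remaining instances.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# The shapes printed above add up to 158 + 53 = 211 instances.
n_train, n_test = split_sizes(158 + 53)
print(n_train, n_test)
```

Because no `random_state` is passed to `train_test_split`, the rows that land in each split differ between runs, so the row indices and scores in the outputs of this tutorial will vary.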
[4]:
X_train.head()
[4]:
dim_0 | |
---|---|
69 | 0 -1.7998 1 -1.7987 2 -1.7942 3 ... |
103 | 0 -1.8091 1 -1.8067 2 -1.7866 3 ... |
34 | 0 -2.0417 1 -2.0572 2 -2.0522 3 ... |
14 | 0 -2.1888 1 -2.1855 2 -2.1765 3 ... |
121 | 0 -1.9586 1 -1.9371 2 -1.8798 3 ... |
[5]:
# multiclass classification task with three classes
np.unique(y_train)
[5]:
array(['0', '1', '2'], dtype=object)
Using tsfresh to extract features
[6]:
# use the "efficient" feature set, which skips the most expensive feature calculators
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00, 2.05s/it]
[6]:
dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000080 | 249.998516 | 0.052357 | -0.000001 | -0.000005 | -0.024066 | ... | 0.046288 | 0.092513 | 0.092513 | 0.092513 | 0.250609 | 1.323194 | 1.819631 | 2.183824 | 2.463220 | 2.707387 |
1 | 0.0 | 0.0 | 1.0 | 1.0 | -0.000525 | 250.000756 | 0.049118 | 0.000000 | -0.000006 | -0.031622 | ... | 0.046288 | 0.046288 | 0.092513 | 0.092513 | 0.184769 | 1.213529 | 1.668744 | 2.081159 | 2.418614 | 2.707518 |
2 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000034 | 249.998998 | 0.069971 | 0.000084 | 0.000025 | 0.018880 | ... | 0.081510 | 0.092513 | 0.092513 | 0.138673 | 0.311663 | 1.116706 | 1.545256 | 1.889777 | 2.155644 | 2.374722 |
3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000202 | 249.999702 | 0.067601 | -0.000002 | -0.000010 | 0.384770 | ... | 0.046288 | 0.092513 | 0.092513 | 0.204643 | 0.414263 | 1.323315 | 1.915330 | 2.406197 | 2.794719 | 3.117007 |
4 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000146 | 249.998674 | 0.050355 | -0.000004 | -0.000046 | -0.045353 | ... | 0.046288 | 0.092513 | 0.092513 | 0.092513 | 0.230801 | 1.173933 | 1.628543 | 2.003443 | 2.303091 | 2.559695 |
5 rows × 773 columns
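Each column in the table above is one summary statistic computed over a whole series, so every series becomes a single row of numbers. To make this concrete, the sketch below computes a handful of such statistics by hand for two made-up series. The function `basic_features` is hypothetical: its output names mirror columns in the table, but the formulas are a plain-Python illustration under the usual definitions of these statistics, not tsfresh's implementation.

```python
from statistics import median


def basic_features(series, prefix="dim_0"):
    """Compute a few summary features for one univariate series.

    Hand-written for illustration; assumes the series has at least two
    values. The names mirror columns in the table above, but this is not
    tsfresh's implementation.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        f"{prefix}__sum_values": sum(series),
        f"{prefix}__abs_energy": sum(x * x for x in series),
        f"{prefix}__mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        f"{prefix}__mean_change": sum(diffs) / len(diffs),
        f"{prefix}__median": median(series),
    }


# Toy panel (made-up values): one list per time series instance.
panel = [
    [1.0, 2.0, 4.0, 3.0],
    [0.0, -1.0, 1.0, 0.0],
]

# One feature row per series: the tabular shape a scikit-learn estimator expects.
feature_table = [basic_features(series) for series in panel]
for row in feature_table:
    print(row)
```

The result has one row per series and one entry per feature, regardless of how long each series is. This is what allows a tabular estimator to be applied afterwards.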
Using tsfresh with sktime
[7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00, 2.21s/it]
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00, 1.45it/s]
[7]:
0.8490566037735849
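For a scikit-learn classifier, `score` returns the mean accuracy, that is, the fraction of test instances whose predicted label matches the true label. With 53 test instances, the value printed above corresponds to 45 correct predictions. The sketch below checks this with made-up label lists; `accuracy` is a hypothetical helper written for this illustration, not part of either library.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 53 instances, the last 8 of which are misclassified.
y_true = ["0"] * 53
y_pred = ["0"] * 45 + ["1"] * 8

print(accuracy(y_true, y_pred))
```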
Multivariate time series classification data
[8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
[9]:
# multivariate input data
X_train.head()
[9]:
dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 | |
---|---|---|---|---|---|---|
20 | 0 -0.294498 1 -0.294498 2 -0.050044 3... | 0 0.540218 1 0.540218 2 -0.515245 3... | 0 0.218114 1 0.218114 2 -0.301108 3... | 0 -0.045277 1 -0.045277 2 0.103872 3... | 0 -0.002663 1 -0.002663 2 -0.183773 3... | 0 0.031960 1 0.031960 2 0.037287 3... |
26 | 0 -0.761604 1 -0.761604 2 0.121078 3... | 0 0.260125 1 0.260125 2 -1.423255 3... | 0 -0.064487 1 -0.064487 2 0.075600 3... | 0 0.069248 1 0.069248 2 -0.282318 3... | 0 0.242367 1 0.242367 2 -0.332922 3... | 0 -0.007990 1 -0.007990 2 0.239704 3... |
7 | 0 -0.352746 1 -0.352746 2 -1.354561 3... | 0 0.316845 1 0.316845 2 0.490525 3... | 0 -0.473779 1 -0.473779 2 1.454261 3... | 0 -0.327595 1 -0.327595 2 -0.269001 3... | 0 0.106535 1 0.106535 2 0.021307 3... | 0 0.197090 1 0.197090 2 0.460763 3... |
8 | 0 -0.342233 1 -0.342233 2 -0.298542 3... | 0 0.327415 1 0.327415 2 -0.527154 3... | 0 0.157229 1 0.157229 2 0.248585 3... | 0 0.394179 1 0.394179 2 -0.037287 3... | 0 0.074574 1 0.074574 2 -0.087891 3... | 0 -0.037287 1 -0.037287 2 -0.050604 3... |
10 | 0 0.206148 1 0.206148 2 6.53436... | 0 -0.658294 1 -0.658294 2 4.597327 3... | 0 0.469612 1 0.469612 2 -2.723661 3... | 0 -0.106535 1 -0.106535 2 -0.439456 3... | 0 0.306288 1 0.306288 2 1.717875 3... | 0 0.950824 1 0.950824 2 -1.041379 3... |
[10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00, 3.69s/it]
[10]:
dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 1.0 | 33.334188 | 110.735119 | 0.822452 | 0.000639 | 0.001751 | 0.164096 | ... | 0.165443 | 0.165443 | 0.165443 | 0.192626 | 0.545824 | 1.279774 | 1.910772 | 2.565051 | 3.096812 | 3.567632 |
1 | 1.0 | 0.0 | 0.0 | 1.0 | 73.888480 | 220.949429 | 0.964075 | -0.002087 | -0.003908 | 0.613719 | ... | 0.096509 | 0.096509 | 0.261160 | 0.261160 | 0.451359 | 1.313299 | 1.987599 | 2.593635 | 3.173890 | 3.696247 |
2 | 0.0 | 0.0 | 0.0 | 1.0 | -17.428760 | 7.940863 | 0.170422 | 0.002326 | -0.000244 | -0.152038 | ... | 0.223718 | 0.261160 | 0.356468 | 0.545824 | 1.821690 | 1.438857 | 2.291659 | 3.140440 | 3.819994 | 4.207710 |
3 | 0.0 | 0.0 | 0.0 | 1.0 | -18.154841 | 5.568890 | 0.135705 | 0.001051 | 0.000688 | -0.196623 | ... | 0.399949 | 0.705356 | 1.127853 | 1.742820 | 3.274497 | 1.683010 | 2.766048 | 3.748502 | 4.303872 | 4.449241 |
4 | 1.0 | 0.0 | 0.0 | 1.0 | 395.985445 | 11192.658970 | 6.583700 | 0.099344 | 0.000000 | 8.608970 | ... | 0.165443 | 0.165443 | 0.165443 | 0.165443 | 0.706253 | 1.483926 | 2.279149 | 3.014130 | 3.525453 | 3.919983 |
5 rows × 4638 columns
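In the multivariate case, the same set of features is computed for each dimension separately, which is why the table has 6 × 773 = 4638 columns. The column names join the dimension, the feature name, and any feature parameters with double underscores. The sketch below splits such names apart. The function `parse_feature_name` is a hypothetical helper that assumes dimension names contain no double underscore; the sample names are copied from the table above.

```python
def parse_feature_name(column):
    """Split 'dimension__feature__param1__param2' into its three parts."""
    dimension, feature, *params = column.split("__")
    return dimension, feature, params


columns = [
    "dim_0__sum_values",
    "dim_0__fourier_entropy__bins_2",
    "dim_5__permutation_entropy__dimension_3__tau_1",
]

for column in columns:
    print(parse_feature_name(column))

# Six dimensions with 773 features each give the column count shown above.
n_dimensions, n_features_per_dimension = 6, 773
print(n_dimensions * n_features_per_dimension)
```

Parsing the names this way is useful when you want to group or select features by dimension before passing them to an estimator.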
Using tsfresh for forecasting
You can also use tsfresh for univariate forecasting by reducing the forecasting task to time series regression. To find out more about forecasting, see our forecasting tutorial notebook.
[11]:
from sklearn.ensemble import RandomForestRegressor
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import make_reduction
from sktime.split import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

# regression pipeline: tsfresh features followed by a random forest
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
# reduce forecasting to time series regression on windows of 12 observations
forecaster = make_reduction(
    regressor, scitype="time-series-regressor", window_length=12
)
forecaster.fit(y_train)

# forecast the time points of the test set
fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
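`make_reduction` turns the forecasting problem into a regression problem: the training series is cut into sliding windows of `window_length` past values, each paired with the value that follows it, and the regressor learns to map a window to its next value. With the default recursive strategy, multi-step forecasts are produced by feeding each prediction back into the window. The sketch below illustrates the idea in plain Python with a toy series and a stand-in model that predicts the window mean. All helper names are hypothetical, and this is not sktime's implementation.

```python
def sliding_windows(y, window_length):
    """Return (windows, targets): each window is paired with the next value."""
    windows, targets = [], []
    for start in range(len(y) - window_length):
        windows.append(y[start : start + window_length])
        targets.append(y[start + window_length])
    return windows, targets


def recursive_forecast(predict_one, last_window, n_steps):
    """Forecast n_steps ahead by feeding predictions back into the window."""
    window = list(last_window)
    forecasts = []
    for _ in range(n_steps):
        y_next = predict_one(window)
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # drop the oldest value, append the newest
    return forecasts


def window_mean(window):
    """Stand-in for a fitted regressor: predicts the mean of the window."""
    return sum(window) / len(window)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy series (made-up values)
windows, targets = sliding_windows(y, window_length=3)
print(windows)
print(targets)

forecasts = recursive_forecast(window_mean, last_window=y[-3:], n_steps=2)
print(forecasts)
```

In the cell above, each window of 12 monthly values plays the role of one time series instance, so the tsfresh extractor turns it into a feature row before the random forest makes its prediction.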