The Canonical Time-series Characteristics (catch22) transform

catch22 [1] is a collection of 22 time series features selected from the 7000+ features in the hctsa [2][3] toolbox. To remove redundancy, hierarchical clustering was performed on the correlation matrix of the features that performed better than random chance. The resulting clusters were ranked by balanced accuracy using a decision tree classifier, and a single feature was selected from each of the 22 clusters formed, taking into account balanced accuracy, computational efficiency and interpretability.
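
The final selection step can be illustrated with a minimal sketch. It assumes the clustering has already assigned every feature to a cluster and that a balanced accuracy is known per feature. The feature names, cluster labels and scores are made up, and the sketch picks by accuracy alone, whereas the authors also weighed computational efficiency and interpretability.

```python
from collections import defaultdict


def pick_representatives(clusters, accuracy):
    """Return, for each cluster id, the member feature with the highest accuracy."""
    members = defaultdict(list)
    for feature, cluster_id in clusters.items():
        members[cluster_id].append(feature)
    return {
        cluster_id: max(features, key=lambda name: accuracy[name])
        for cluster_id, features in sorted(members.items())
    }


# Hypothetical cluster assignments and balanced accuracies (not real hctsa results).
clusters = {"feat_a": 0, "feat_b": 0, "feat_c": 1, "feat_d": 1, "feat_e": 1}
accuracy = {"feat_a": 0.61, "feat_b": 0.58, "feat_c": 0.55, "feat_d": 0.70, "feat_e": 0.66}

representatives = pick_representatives(clusters, accuracy)
print(representatives)
```

With 22 clusters, the same procedure would return 22 representative features, one per cluster.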

In this notebook, we will demonstrate how to use the catch22 transformer on the ItalyPowerDemand univariate and BasicMotions multivariate datasets. We also show catch22 used for classification with a random forest classifier.

References:

[1] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.

[2] Fulcher, B. D., & Jones, N. S. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems, 5(5), 527-531.

[3] Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048.

1. Imports

[1]:

from sklearn import metrics

from sktime.classification.feature_based import Catch22Classifier
from sktime.datasets import load_basic_motions, load_italy_power_demand
from sktime.transformations.panel.catch22 import Catch22

2. Load data

[2]:

IPD_X_train, IPD_y_train = load_italy_power_demand(split="train", return_X_y=True)
IPD_X_test, IPD_y_test = load_italy_power_demand(split="test", return_X_y=True)
IPD_X_test = IPD_X_test[:50]
IPD_y_test = IPD_y_test[:50]

print(IPD_X_train.shape, IPD_y_train.shape, IPD_X_test.shape, IPD_y_test.shape)

BM_X_train, BM_y_train = load_basic_motions(split="train", return_X_y=True)
BM_X_test, BM_y_test = load_basic_motions(split="test", return_X_y=True)

print(BM_X_train.shape, BM_y_train.shape, BM_X_test.shape, BM_y_test.shape)

(67, 1) (67,) (50, 1) (50,)
(40, 6) (40,) (40, 6) (40,)

3. catch22 transform

Univariate

The catch22 features are provided as a transformer, Catch22. The transformed data can then be used for a variety of time series analysis tasks.

[3]:

c22_uv = Catch22()
c22_uv.fit(IPD_X_train, IPD_y_train)

[3]:

Catch22()


[4]:

transformed_data_uv = c22_uv.transform(IPD_X_train)
transformed_data_uv.head()


[4]:

	0	1	2	3	4	5	6	7	8	9	...	12	13	14	15	16	17	18	19	20	21
0	1.158630	-0.217227	8.0	0.291667	-0.625000	3.0	6.0	0.468052	0.589049	0.836755	...	3.0	1.000000	5.0	1.778748	0.750000	0.240598	NaN	NaN	0.040000	NaN
1	0.918162	-0.214762	15.0	0.208333	-0.666667	4.0	8.0	0.702775	0.196350	0.666160	...	4.0	0.869565	5.0	1.730238	0.500000	0.388217	NaN	NaN	0.111111	NaN
2	-0.273180	-0.085856	4.0	0.875000	0.250000	2.0	5.0	0.310567	0.589049	0.865073	...	2.0	0.913043	5.0	1.836012	0.666667	0.089104	NaN	NaN	0.034014	NaN
3	0.048411	-0.450080	13.0	0.166667	-0.625000	4.0	10.0	0.804047	0.196350	0.648309	...	4.0	0.869565	6.0	1.605420	0.666667	0.332436	NaN	NaN	0.111111	NaN
4	0.426379	0.572566	16.0	0.291667	-0.666667	4.0	7.0	0.675485	0.196350	0.657946	...	4.0	0.913043	6.0	1.730238	0.500000	0.318405	NaN	NaN	0.111111	NaN

5 rows × 22 columns

Note that Catch22 does not take the labels (y) into account in its fit(X, y=None) method, so the separate fit and transform calls can be replaced with a single fit_transform call.

[5]:

c22_uv_single_step = Catch22()
transformed_data_uv_single_step = c22_uv_single_step.fit_transform(IPD_X_train)
transformed_data_uv_single_step.equals(transformed_data_uv)

[5]:

True
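
The equivalence checked above holds for any transformer whose fit step learns nothing from the data or the labels. The following is a minimal sketch with a hypothetical SeriesMeanTransformer class; it is not the sktime implementation, only an illustration of the fit, transform and fit_transform contract.

```python
from statistics import fmean


class SeriesMeanTransformer:
    """Hypothetical stateless transformer: one feature (the mean) per series."""

    def fit(self, X, y=None):
        # Nothing is learned from X or y, so fitting only returns the object.
        return self

    def transform(self, X):
        # One output row per input series.
        return [[fmean(series)] for series in X]

    def fit_transform(self, X, y=None):
        # Single-step shortcut: identical to calling fit and then transform.
        return self.fit(X, y).transform(X)


X = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
y = ["a", "b"]

two_step = SeriesMeanTransformer().fit(X, y).transform(X)
single_step = SeriesMeanTransformer().fit_transform(X)
print(two_step == single_step)
```

Because fit ignores y, passing or omitting the labels makes no difference to the transformed output.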

Multivariate

Catch22 also supports multivariate data. By default, the features are extracted from each dimension separately and the results are concatenated, so the six dimensions of BasicMotions yield 6 × 22 = 132 columns.

[6]:

c22_mv = Catch22()
transformed_data_mv = c22_mv.fit_transform(BM_X_train)
transformed_data_mv.head()

[6]:

	dim_0__0	dim_0__1	dim_0__2	dim_0__3	dim_0__4	dim_0__5	dim_0__6	dim_0__7	dim_0__8	dim_0__9	...	dim_5__12	dim_5__13	dim_5__14	dim_5__15	dim_5__16	dim_5__17	dim_5__18	dim_5__19	dim_5__20	dim_5__21
0	-0.140988	-0.268073	6.0	-0.890	0.160	2.0	3.0	0.042638	0.736311	0.314500	...	2.0	0.707071	7.0	1.907929	1.00	0.658286	0.828571	0.228571	0.012550	9.0
1	-0.387256	-0.126246	6.0	-0.920	-0.600	2.0	4.0	0.269591	0.490874	0.614552	...	2.0	0.727273	6.0	1.875354	0.50	0.206944	0.600000	0.257143	0.028935	9.0
2	0.028412	-0.224988	9.0	-0.335	-0.045	1.0	3.0	0.036650	1.030835	0.352408	...	2.0	0.818182	7.0	1.789838	0.75	0.791912	0.828571	0.228571	0.054977	11.0
3	-0.147338	-0.199523	8.0	-0.540	0.180	1.0	5.0	0.013833	1.030835	0.212988	...	2.0	0.717172	6.0	1.904917	1.00	1.191592	0.600000	0.171429	0.015611	9.0
4	-0.217645	-0.252015	7.0	-0.130	0.020	1.0	6.0	0.008072	0.883573	0.150597	...	2.0	0.707071	7.0	1.880930	1.00	3.141568	0.800000	0.200000	0.002449	10.0

5 rows × 132 columns
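
The column names in the output above follow the pattern dim_<dimension>__<feature>, and 132 columns correspond to 6 dimensions times 22 features. A minimal sketch of that layout is shown here, using two toy features as stand-ins for the 22 catch22 features. The functions and data are hypothetical and do not reproduce the sktime internals.

```python
from statistics import fmean

# Two toy stand-ins for the 22 catch22 features (hypothetical, for illustration only).
FEATURES = [fmean, lambda series: max(series) - min(series)]


def transform_case(case):
    """Compute every feature on every dimension of one multivariate case."""
    row = {}
    for dim, series in enumerate(case):
        for feat, func in enumerate(FEATURES):
            # Same naming pattern as the output above: dim_<dimension>__<feature>.
            row[f"dim_{dim}__{feat}"] = func(series)
    return row


# One case with three dimensions, each a short series.
case = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0], [5.0, 5.0, 5.0]]
row = transform_case(case)
print(list(row))
print(len(row))  # 3 dimensions x 2 features
```

Applied to six dimensions and 22 features, the same loop would produce the 132 columns seen above.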

We can also choose how the columns are named. For example, col_names="short_str_feat" uses a short name for each feature in the column name.

If the location and spread of the raw time series distribution may be important, set catch24=True to additionally include the mean and standard deviation of each series.

[7]:

c24_mv = Catch22(col_names="short_str_feat", catch24=True)
c24_mv.fit(BM_X_train)

[7]:

Catch22(catch24=True, col_names='short_str_feat')


[8]:

c24_mv.transform(BM_X_train).head()

[8]:

	dim_0__mode_5	dim_0__mode_10	dim_0__stretch_decreasing	dim_0__outlier_timing_pos	dim_0__outlier_timing_neg	dim_0__acf_timescale	dim_0__acf_first_min	dim_0__centroid_freq	dim_0__low_freq_power	dim_0__forecast_error	...	dim_5__stretch_high	dim_5__rs_range	dim_5__whiten_timescale	dim_5__embedding_dist	dim_5__dfa	dim_5__rs_range	dim_5__transition_matrix	dim_5__periodicity	dim_5__mean	dim_5__std
0	-0.140988	-0.268073	6.0	-0.890	0.160	2.0	3.0	0.042638	0.736311	0.314500	...	7.0	1.907929	1.00	0.658286	0.828571	0.228571	0.012550	9.0	0.054413	0.510274
1	-0.387256	-0.126246	6.0	-0.920	-0.600	2.0	4.0	0.269591	0.490874	0.614552	...	6.0	1.875354	0.50	0.206944	0.600000	0.257143	0.028935	9.0	-0.102407	0.661172
2	0.028412	-0.224988	9.0	-0.335	-0.045	1.0	3.0	0.036650	1.030835	0.352408	...	7.0	1.789838	0.75	0.791912	0.828571	0.228571	0.054977	11.0	0.031881	0.499788
3	-0.147338	-0.199523	8.0	-0.540	0.180	1.0	5.0	0.013833	1.030835	0.212988	...	6.0	1.904917	1.00	1.191592	0.600000	0.171429	0.015611	9.0	0.029537	0.248161
4	-0.217645	-0.252015	7.0	-0.130	0.020	1.0	6.0	0.008072	0.883573	0.150597	...	7.0	1.880930	1.00	3.141568	0.800000	0.200000	0.002449	10.0	0.013344	0.163754

5 rows × 144 columns
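
The 144 columns correspond to 6 dimensions times 24 features: the 22 catch22 features plus the mean and standard deviation. The two extra values can be sketched with the standard library as follows. The sketch assumes the population form of the standard deviation; whether the library uses the population or the sample form is not stated here, and the example series is made up.

```python
from statistics import fmean, pstdev


def location_and_spread(series):
    """Return the mean and (population) standard deviation of one series."""
    return fmean(series), pstdev(series)


# Hypothetical series, chosen so the results are round numbers.
series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = location_and_spread(series)
print(mean, std)

# Column count for a 6-dimensional dataset with catch24 enabled.
n_columns = 6 * (22 + 2)
print(n_columns)
```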

4. catch22 Forest Classifier

For classification tasks, the default classifier to use with the catch22 features is a random forest. An implementation that builds the RandomForestClassifier from sklearn on catch22 features is provided as Catch22Classifier for ease of use.

[9]:

c22f = Catch22Classifier(random_state=0)
c22f.fit(IPD_X_train, IPD_y_train)

[9]:

Catch22Classifier(random_state=0)


[10]:

c22f_preds = c22f.predict(IPD_X_test)
print("C22F Accuracy: " + str(metrics.accuracy_score(IPD_y_test, c22f_preds)))

C22F Accuracy: 0.86
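
Accuracy is the fraction of test cases whose predicted label matches the true label, so 0.86 on the 50 test cases kept above corresponds to 43 correct predictions. A minimal sketch of the computation is given here; the label lists are made up to reproduce that count and are not the actual predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 50 cases, 43 of them predicted correctly.
y_true = ["1"] * 25 + ["2"] * 25
y_pred = ["1"] * 22 + ["2"] * 3 + ["2"] * 21 + ["1"] * 4

score = accuracy(y_true, y_pred)
print(score)
```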
