
Dictionary based time series classification in sktime#

Dictionary based approaches adapt the bag of words model commonly used in signal processing, computer vision and audio processing for time series classification. Dictionary based classifiers have the same broad structure. A sliding window of length \(w\) is run across a series. For each window, the real valued series of length \(w\) is converted through approximation and discretisation processes into a symbolic string of length \(l\), which consists of \(\alpha\) possible letters. The occurrence in a series of each ‘word’ from the dictionary defined by \(l\) and \(\alpha\) is counted, and once the sliding window has completed the series is transformed into a histogram. Classification is based on the histograms of the words extracted from the series, rather than the raw data.
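This pipeline can be illustrated with a toy transform. Note this is *not* the SFA transform the classifiers below use; it is a simplified stand-in using piecewise means for approximation and equal-width bins for discretisation, purely to show the window-to-word-to-histogram structure:

```python
from collections import Counter

import numpy as np


def series_to_histogram(series, w=4, l=2, alpha=3):
    """Toy dictionary transform: slide a window of length w across the
    series, reduce each window to l mean values (approximation), then map
    each mean to one of alpha letters (discretisation). Requires l to
    divide w. Returns the histogram of extracted words."""
    letters = "abcdefghij"[:alpha]
    words = []
    for start in range(len(series) - w + 1):
        window = np.asarray(series[start : start + w], dtype=float)
        # Approximation: mean of l equal-sized segments
        segments = window.reshape(l, w // l).mean(axis=1)
        # Discretisation: equal-width bins over the window's range
        bins = np.linspace(window.min(), window.max(), alpha + 1)[1:-1]
        words.append("".join(letters[np.digitize(v, bins)] for v in segments))
    return Counter(words)


hist = series_to_histogram([1, 2, 3, 4, 3, 2, 1, 2, 3, 4], w=4, l=2, alpha=3)
print(hist)
```

Each series is thus reduced to a histogram of word counts, and two series can be compared through their histograms regardless of where in the series each pattern occurred.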

Currently, four univariate dictionary based classifiers are implemented in sktime, all making use of the Symbolic Fourier Approximation (SFA)[1] transform to discretise series into words. These are the Bag of SFA Symbols (BOSS)[2], the Contractable Bag of SFA Symbols (cBOSS)[3], Word Extraction for Time Series Classification (WEASEL)[4] and the Temporal Dictionary Ensemble (TDE)[5]. WEASEL has a multivariate extension called MUSE[7], and TDE also has multivariate capability.

In this notebook, we will demonstrate how to use BOSS, cBOSS, WEASEL and TDE on the ItalyPowerDemand and BasicMotions datasets.

References:#

[1] Schäfer, P., & Högqvist, M. (2012). SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. In Proceedings of the 15th International Conference on Extending Database Technology (pp. 516-527).

[2] Schäfer, P. (2015). The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery, 29(6), 1505-1530.

[3] Middlehurst, M., Vickers, W., & Bagnall, A. (2019). Scalable dictionary classifiers for time series classification. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 11-19). Springer, Cham.

[4] Schäfer, P., & Leser, U. (2017). Fast and accurate time series classification with WEASEL. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 637-646).

[5] Middlehurst, M., Large, J., Cawley, G., & Bagnall, A. (2020). The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification. In The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

[6] Large, J., Bagnall, A., Malinowski, S., & Tavenard, R. (2019). On time series classification with dictionary-based classifiers. Intelligent Data Analysis, 23(5), 1073-1089.

[7] Schäfer, P., & Leser, U. (2018). Multivariate time series classification with WEASEL+MUSE. 3rd ECML/PKDD Workshop on AALTD.

1. Imports#

[1]:
from sklearn import metrics

from sktime.classification.dictionary_based import (
    MUSE,
    WEASEL,
    BOSSEnsemble,
    ContractableBOSS,
    TemporalDictionaryEnsemble,
)
from sktime.datasets import load_basic_motions, load_italy_power_demand

2. Load data#

[2]:
X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
X_test = X_test[:50]
y_test = y_test[:50]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

X_train_mv, y_train_mv = load_basic_motions(split="train", return_X_y=True)
X_test_mv, y_test_mv = load_basic_motions(split="test", return_X_y=True)

X_train_mv = X_train_mv[:20]
y_train_mv = y_train_mv[:20]
X_test_mv = X_test_mv[:20]
y_test_mv = y_test_mv[:20]

print(X_train_mv.shape, y_train_mv.shape, X_test_mv.shape, y_test_mv.shape)
(67, 1) (67,) (50, 1) (50,)
(20, 6) (20,) (20, 6) (20,)

3. Bag of SFA Symbols (BOSS)#

BOSS is an ensemble of individual BOSS classifiers making use of the SFA transform. The ensemble performs a grid search over a large number of individual classifiers for the parameters \(l\), \(\alpha\), \(w\) and \(p\) (whether to normalise each window). Of the classifiers searched, only those within 92% of the best classifier's accuracy are retained. Individual BOSS classifiers use a non-symmetric distance function, the BOSS distance, in conjunction with a nearest neighbour classifier.

As tuning is handled inside the classifier, BOSS has very few parameters to alter and generally should be run with its default settings.
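The non-symmetric BOSS distance can be sketched in a few lines. Given two word histograms, only the words that occur in the first histogram contribute to the distance, which is why swapping the arguments generally changes the result:

```python
def boss_distance(hist_a, hist_b):
    """BOSS distance between two word histograms (dicts of word -> count).
    Only words present in hist_a contribute, making the distance
    non-symmetric: boss_distance(a, b) != boss_distance(b, a) in general."""
    return sum((count - hist_b.get(word, 0)) ** 2 for word, count in hist_a.items())


a = {"ab": 3, "bc": 1}
b = {"ab": 1, "cd": 2}
print(boss_distance(a, b))  # (3-1)^2 + (1-0)^2 = 5
print(boss_distance(b, a))  # (1-3)^2 + (2-0)^2 = 8
```

Ignoring words absent from the query histogram makes the nearest neighbour search robust to words that appear only in the comparison series, e.g. due to noise.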

[3]:
boss = BOSSEnsemble(random_state=47)
boss.fit(X_train, y_train)

boss_preds = boss.predict(X_test)
print("BOSS Accuracy: " + str(metrics.accuracy_score(y_test, boss_preds)))
BOSS Accuracy: 0.94

4. Contractable BOSS (cBOSS)#

cBOSS significantly speeds up BOSS, with no significant difference in accuracy, by improving how the ensemble is formed. cBOSS utilises a filtered random selection of parameters to find its ensemble members. Each ensemble member is built on a 70% subsample of the train data, sampled randomly without replacement. An exponential weighting scheme for the predictions of the base classifiers is also introduced.

A new parameter \(k\), the number of parameter samples, is introduced, of which the top \(s\) (the maximum ensemble size) with the highest accuracy are kept for the final ensemble. The \(k\) parameter can be replaced with a time limit \(t\) through contracting.
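The exponential weighting scheme can be sketched as follows. Each member's vote is weighted by its estimated train accuracy raised to a power, so stronger members dominate the combined prediction; the exponent of 4 here is illustrative, not a claim about sktime's exact implementation:

```python
import numpy as np


def weighted_vote(member_probas, train_accs, power=4):
    """Combine ensemble members' class probability estimates using
    exponentially weighted votes: each member's weight is its estimated
    train accuracy raised to `power`."""
    weights = np.asarray(train_accs, dtype=float) ** power
    member_probas = np.asarray(member_probas, dtype=float)
    # Weighted sum over members, then renormalise to a distribution
    combined = np.einsum("m,mc->c", weights, member_probas)
    return combined / combined.sum()


# Three members predicting over two classes
probas = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
accs = [0.95, 0.70, 0.60]
print(weighted_vote(probas, accs))
```

Raising accuracies to a power sharpens the differences between members: a member at 0.95 accuracy counts for far more than twice a member at 0.60.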

[4]:
# Recommended non-contract cBOSS parameters
cboss = ContractableBOSS(n_parameter_samples=250, max_ensemble_size=50, random_state=47)

# cBOSS with a 1 minute build time contract
# cboss = ContractableBOSS(time_limit_in_minutes=1,
#                         max_ensemble_size=50,
#                         random_state=47)

cboss.fit(X_train, y_train)

cboss_preds = cboss.predict(X_test)
print("cBOSS Accuracy: " + str(metrics.accuracy_score(y_test, cboss_preds)))
cBOSS Accuracy: 0.96

5. Word Extraction for Time Series Classification (WEASEL)#

WEASEL transforms time series into feature vectors using a sliding-window approach; the feature vectors are then analysed by a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set than BOSS. It extends SFA with bigrams, feature selection using the ANOVA f-test, and Information Gain Binning (IGB).
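The bigram idea can be sketched with plain word sequences: the bag of features is augmented with pairs of words, which reintroduces some local word-order information into the otherwise order-free bag-of-words model. For simplicity this sketch pairs consecutive words; WEASEL pairs words from windows a fixed offset apart:

```python
from collections import Counter


def words_with_bigrams(words):
    """Augment a sequence of extracted words with bigrams (pairs of
    nearby words), so the feature set captures some word order."""
    unigrams = list(words)
    bigrams = [a + " " + b for a, b in zip(words, words[1:])]
    return Counter(unigrams + bigrams)


print(words_with_bigrams(["ab", "bc", "ab"]))
```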

Univariate#

[5]:
weasel = WEASEL(binning_strategy="equi-depth", anova=False, random_state=47)
weasel.fit(X_train, y_train)

weasel_preds = weasel.predict(X_test)
print("WEASEL Accuracy: " + str(metrics.accuracy_score(y_test, weasel_preds)))
WEASEL Accuracy: 0.96

Multivariate#

WEASEL+MUSE (Multivariate Unsupervised Symbols and dErivatives) is the multivariate extension of WEASEL.

[6]:
muse = MUSE()
muse.fit(X_train_mv, y_train_mv)

muse_preds = muse.predict(X_test_mv)
print("MUSE Accuracy: " + str(metrics.accuracy_score(y_test_mv, muse_preds)))
MUSE Accuracy: 1.0

6. Temporal Dictionary Ensemble (TDE)#

TDE aggregates the best components of three classifiers extending the original BOSS algorithm. The ensemble structure and improvements of cBOSS[3] are used; spatial pyramids are introduced from Spatial BOSS (S-BOSS)[6]; and from WEASEL[4], bigrams and Information Gain Binning (IGB), a replacement for the Multiple Coefficient Binning (MCB) used by SFA, are included. Two new parameters are added to the ensemble parameter search: the number of spatial pyramid levels \(h\), and whether to use IGB or MCB, \(b\). A Gaussian process regressor is used to select new parameter sets to evaluate for the ensemble, predicting the accuracy of a set of parameter values from past classifier performance.

Inheriting the cBOSS ensemble structure, the number of parameter samples \(k\), the time limit \(t\) and the max ensemble size \(s\) remain as parameters to be set, accounting for memory and time requirements.
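The Gaussian process guided search can be sketched with scikit-learn. The idea is to fit a regressor on (parameter set, observed accuracy) pairs from members evaluated so far, then evaluate next whichever candidate the model predicts to be most accurate. The parameter names and ranges below are illustrative only:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Parameter sets already evaluated (illustrative: window length,
# word length, alphabet size) and their observed accuracies
evaluated = np.array([[10, 8, 4], [20, 10, 4], [30, 12, 2]], dtype=float)
accuracies = np.array([0.80, 0.90, 0.70])

# Fit a GP regressor mapping parameters -> accuracy ...
gp = GaussianProcessRegressor().fit(evaluated, accuracies)

# ... then pick the unevaluated candidate with the highest predicted accuracy
candidates = rng.uniform([10, 8, 2], [40, 16, 6], size=(100, 3))
predicted = gp.predict(candidates)
best = candidates[np.argmax(predicted)]
print(best, predicted.max())
```

Compared with cBOSS's filtered random sampling, this spends each evaluation where past results suggest it is most likely to pay off.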

Univariate#

[7]:
# Recommended non-contract TDE parameters
tde_u = TemporalDictionaryEnsemble(
    n_parameter_samples=50,
    max_ensemble_size=50,
    randomly_selected_params=50,
    random_state=47,
)

# TDE with a 1 minute build time contract
# tde = TemporalDictionaryEnsemble(time_limit_in_minutes=1,
#                                 max_ensemble_size=50,
#                                 randomly_selected_params=50,
#                                 random_state=47)

tde_u.fit(X_train, y_train)

tde_u_preds = tde_u.predict(X_test)
print("TDE Accuracy: " + str(metrics.accuracy_score(y_test, tde_u_preds)))
TDE Accuracy: 1.0

Multivariate#

[8]:
# Recommended non-contract TDE parameters
tde_mv = TemporalDictionaryEnsemble(
    n_parameter_samples=50,
    max_ensemble_size=50,
    randomly_selected_params=50,
    random_state=47,
)

# TDE with a 1 minute build time contract
# tde_m = TemporalDictionaryEnsemble(time_limit_in_minutes=1,
#                                 max_ensemble_size=50,
#                                 randomly_selected_params=50,
#                                 random_state=47)

tde_mv.fit(X_train_mv, y_train_mv)

tde_mv_preds = tde_mv.predict(X_test_mv)
print("TDE Accuracy: " + str(metrics.accuracy_score(y_test_mv, tde_mv_preds)))
TDE Accuracy: 1.0
