
MiniRocket#

MiniRocket transforms input time series using a small, fixed set of convolutional kernels. MiniRocket uses PPV pooling to compute a single feature for each of the resulting feature maps (i.e., the proportion of positive values). The transformed features are used to train a linear classifier.

Dempster A, Schmidt DF, Webb GI (2020). MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification. arXiv:2012.08791
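
To make PPV pooling concrete, here is a minimal NumPy sketch; it is not sktime's implementation. It convolves one series with a single length-9 kernel whose weights are -1 and 2 (as described in the MiniRocket paper) and reduces the resulting feature map to one PPV feature. The helper names `dilated_convolution` and `ppv`, the particular kernel, dilation, and bias, the sign convention for the bias, and the absence of padding are all assumptions made for illustration.

```python
import numpy as np


def dilated_convolution(x, weights, dilation, bias):
    """Dilated convolution of a 1-D series with one kernel, without padding."""
    span = (len(weights) - 1) * dilation  # distance between first and last tap
    n_outputs = len(x) - span
    out = np.full(n_outputs, -bias, dtype=float)  # assumed convention: subtract bias
    for j, w in enumerate(weights):
        out += w * x[j * dilation : j * dilation + n_outputs]
    return out


def ppv(feature_map):
    """Proportion of positive values in a feature map."""
    return float(np.mean(feature_map > 0))


# Illustrative kernel: length 9, three weights of 2 and six weights of -1.
weights = np.array([-1, -1, 2, -1, -1, 2, -1, -1, 2], dtype=float)
x = np.sin(np.linspace(0, 4 * np.pi, 50))

feature_map = dilated_convolution(x, weights, dilation=2, bias=0.0)
feature = ppv(feature_map)  # one number in [0, 1] per (kernel, dilation, bias)
```

Each combination of kernel, dilation, and bias contributes one such number, so the transformed data has one column per combination.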

1 Univariate Time Series#

1.1 Imports#

Import example data, MiniRocket, RidgeClassifierCV (scikit-learn), and NumPy.

Note: MiniRocket and MiniRocketMultivariate are compiled by Numba on import. The compiled functions are cached, so this should only happen once (i.e., the first time you import MiniRocket or MiniRocketMultivariate).

[ ]:
# !pip install --upgrade numba
[ ]:
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from sktime.datasets import (
    load_arrow_head,  # univariate dataset
    load_basic_motions,  # multivariate dataset
    load_japanese_vowels,  # multivariate dataset with unequal length
)
from sktime.transformations.panel.rocket import (
    MiniRocket,
    MiniRocketMultivariate,
    MiniRocketMultivariateVariable,
)

1.2 Load the Training Data#

For more details on the data set, see the univariate time series classification notebook.

Note: Input time series must have a length of at least 9. Pad shorter time series using, e.g., PaddingTransformer (sktime.transformations.panel.padder).

[ ]:
X_train, y_train = load_arrow_head(split="train", return_X_y=True)
# visualize the first univariate time series
X_train.iloc[0, 0].plot()

1.3 Initialise MiniRocket and Transform the Training Data#

[ ]:
minirocket = MiniRocket()  # by default, MiniRocket uses ~10_000 kernels
minirocket.fit(X_train)
X_train_transform = minirocket.transform(X_train)
# test shape of transformed training data -> (n_instances, 9_996)
X_train_transform.shape
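
The 9_996 in the comment above can be reproduced with a short calculation. According to the MiniRocket paper, the transform is built from 84 base kernels (length 9, with three weights of 2 and six weights of -1), and the requested number of features is rounded down to a multiple of 84. The variable names below are illustrative only.

```python
from math import comb

kernel_length = 9
# Number of ways to place three weights of 2 among nine positions.
n_base_kernels = comb(kernel_length, 3)

requested = 10_000
# Round down to the nearest multiple of the number of base kernels.
n_features = (requested // n_base_kernels) * n_base_kernels
```

Here `n_base_kernels` is 84 and `n_features` is 119 * 84 = 9_996.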

1.4 Fit a Classifier#

We suggest using RidgeClassifierCV (scikit-learn) for smaller datasets (fewer than ~10,000 training examples), and using logistic regression trained using stochastic gradient descent for larger datasets.

Note: For larger datasets, this means integrating MiniRocket with stochastic gradient descent such that the transform is performed per minibatch, not simply substituting RidgeClassifierCV for, e.g., LogisticRegression.
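
The per-minibatch idea can be sketched as follows. This is a hedged NumPy illustration under stated assumptions, not the authors' training code: `transform_batch` is a hypothetical stand-in for a fitted MiniRocket transform, the data is synthetic, and the logistic regression is written out by hand with arbitrarily chosen hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)


def transform_batch(X_batch):
    """Hypothetical stand-in for a fitted MiniRocket transform.

    Returns two PPV-style features per series: the proportion of positive
    values and the proportion of positive first differences.
    """
    return np.column_stack(
        [(X_batch > 0).mean(axis=1), (np.diff(X_batch, axis=1) > 0).mean(axis=1)]
    )


# Synthetic binary problem: class 1 series are shifted upwards.
n_instances, series_length = 512, 64
y = rng.integers(0, 2, size=n_instances)
X = rng.normal(size=(n_instances, series_length)) + y[:, None]

weights = np.zeros(2)
bias = 0.0
learning_rate, batch_size, n_epochs = 0.5, 32, 20

for _ in range(n_epochs):
    order = rng.permutation(n_instances)
    for start in range(0, n_instances, batch_size):
        idx = order[start : start + batch_size]
        # The transform runs on the current minibatch only, so the full
        # transformed dataset is never held in memory at once.
        features = transform_batch(X[idx])
        probabilities = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
        error = probabilities - y[idx]
        weights -= learning_rate * features.T @ error / len(idx)
        bias -= learning_rate * error.mean()

predictions = (transform_batch(X) @ weights + bias) > 0
accuracy = float((predictions == y).mean())
```

The point of the design is that the transform sits inside the training loop, so memory use scales with the batch size rather than with the size of the dataset.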

Note: The time series passed to MiniRocket are left unscaled, but the features it outputs may need rescaling for the downstream model. For RidgeClassifierCV, for example, we scale the features using the scikit-learn StandardScaler.
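
As a rough picture of what scaling without centring does, the NumPy sketch below divides each feature by its training-set standard deviation and leaves the mean untouched. The random arrays are stand-ins for MiniRocket output, and the variable names are illustrative, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for transformed features: PPV values lie in [0, 1].
X_train_features = rng.uniform(0.0, 1.0, size=(20, 5))
X_test_features = rng.uniform(0.0, 1.0, size=(10, 5))

# Fit: one standard deviation per feature, estimated on the training data only.
scale = X_train_features.std(axis=0)
scale[scale == 0.0] = 1.0  # leave constant features unchanged

# Transform: divide by the training scale without subtracting the mean.
X_train_scaled = X_train_features / scale
X_test_scaled = X_test_features / scale
```

The test features are divided by the scale learned from the training features, which mirrors calling `fit_transform` on the training data and `transform` on the test data.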

[ ]:
scaler = StandardScaler(with_mean=False)
classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))

X_train_scaled_transform = scaler.fit_transform(X_train_transform)
classifier.fit(X_train_scaled_transform, y_train)

1.5 Load and Transform the Test Data#

[ ]:
X_test, y_test = load_arrow_head(split="test", return_X_y=True)
X_test_transform = minirocket.transform(X_test)

1.6 Classify the Test Data#

[ ]:
X_test_scaled_transform = scaler.transform(X_test_transform)
classifier.score(X_test_scaled_transform, y_test)

2 Multivariate Time Series#

We can use the multivariate version of MiniRocket for multivariate time series input.

2.1 Imports#

Import MiniRocketMultivariate.

Note: MiniRocketMultivariate compiles via Numba on import.

[ ]:
# (above)

2.2 Load the Training Data#

Note: Input time series must have a length of at least 9. Pad shorter time series using, e.g., PaddingTransformer (sktime.transformations.panel.padder).

[ ]:
X_train, y_train = load_basic_motions(split="train", return_X_y=True)

2.3 Initialise MiniRocket and Transform the Training Data#

[ ]:
minirocket_multi = MiniRocketMultivariate()
minirocket_multi.fit(X_train)
X_train_transform = minirocket_multi.transform(X_train)

2.4 Fit a Classifier#

[ ]:
scaler = StandardScaler(with_mean=False)
X_train_scaled_transform = scaler.fit_transform(X_train_transform)

classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
classifier.fit(X_train_scaled_transform, y_train)

2.5 Load and Transform the Test Data#

[ ]:
X_test, y_test = load_basic_motions(split="test", return_X_y=True)
X_test_transform = minirocket_multi.transform(X_test)

2.6 Classify the Test Data#

[ ]:
X_test_scaled_transform = scaler.transform(X_test_transform)
classifier.score(X_test_scaled_transform, y_test)

3 Pipeline Example#

We can use MiniRocket together with RidgeClassifierCV (or another classifier) in a pipeline. We can then use the pipeline like a self-contained classifier, with a single call to fit, and without having to separately transform the data, etc.

3.1 Imports#

[ ]:
# (above)

3.2 Initialise the Pipeline#

[ ]:
minirocket_pipeline = make_pipeline(
    MiniRocket(),
    StandardScaler(with_mean=False),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
)

3.3 Load and Fit the Training Data#

Note: Input time series must have a length of at least 9. Pad shorter time series using, e.g., PaddingTransformer (sktime.transformations.panel.padder).

[ ]:
X_train, y_train = load_arrow_head(split="train", return_X_y=True)

# it is necessary to pass y_train to the pipeline
# y_train is not used for the transform, but it is used by the classifier
minirocket_pipeline.fit(X_train, y_train)

3.4 Load and Classify the Test Data#

[ ]:
X_test, y_test = load_arrow_head(split="test", return_X_y=True)

minirocket_pipeline.score(X_test, y_test)

4 Pipeline Example with MiniRocketMultivariateVariable and Unequal Length Time Series Data#

As a further pipeline example, we use MiniRocketMultivariateVariable, an extension of MiniRocket for variable (unequal) length time series data. Following the code accompanying the original MiniRocket paper, we combine it with RidgeClassifierCV in a scikit-learn pipeline. We can then use the pipeline like a self-contained classifier, with a single call to fit, and without having to separately transform the data, etc.

4.1 Load japanese_vowels as an Unequal Length Dataset#

Japanese Vowels is a UCI Archive dataset. Nine male Japanese speakers were recorded saying the vowels ‘a’ and ‘e’. The raw recordings are preprocessed into a 12-dimensional (multivariate) classification problem. The series lengths range from 7 to 29.

[ ]:
X_train_jv, y_train_jv = load_japanese_vowels(split="train", return_X_y=True)
# let's inspect the first three voice recordings, each with dimensions 0-11

print("number of training samples: ", X_train_jv.shape[0])
print("series length of recording 0, dimension 5: ", X_train_jv.iloc[0, 5].shape)
print("series length of recording 1, dimension 5: ", X_train_jv.iloc[1, 5].shape)

X_train_jv.head(3)
[ ]:
# additional visualizations
number_example = 153
for i in range(12):
    X_train_jv.loc[number_example, f"dim_{i}"].plot()
print("Speaker ID: ", y_train_jv[number_example])

4.2 Create a Pipeline and Train It#

As before, we create a scikit-learn pipeline. MiniRocketMultivariateVariable requires a minimum series length of 9; shorter series are padded up to length 9 with the value -10.0 (pad_value_short_series). A scaler and a RidgeClassifierCV follow the transform.
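
The padding step can be pictured with the NumPy sketch below. It is an illustration under assumptions rather than sktime's internal code: the helper `pad_short_series` is hypothetical, each series is represented as a (n_channels, length) array, and the pad values are appended at the end of the series.

```python
import numpy as np

MIN_LENGTH = 9
PAD_VALUE = -10.0


def pad_short_series(series, min_length=MIN_LENGTH, pad_value=PAD_VALUE):
    """Pad a (n_channels, length) array to min_length; longer series pass through."""
    n_channels, length = series.shape
    if length >= min_length:
        return series
    # Assumption: padding is appended after the last observed time point.
    padding = np.full((n_channels, min_length - length), pad_value)
    return np.concatenate([series, padding], axis=1)


rng = np.random.default_rng(0)
# Three 12-channel series with lengths inside the 7 to 29 range of the dataset.
panel = [rng.normal(size=(12, length)) for length in (7, 9, 29)]
padded = [pad_short_series(s) for s in panel]
padded_lengths = [s.shape[1] for s in padded]
```

Only the series of length 7 is changed; series that already have at least 9 time points keep their original, unequal lengths.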

[ ]:
minirocket_mv_var_pipeline = make_pipeline(
    MiniRocketMultivariateVariable(
        pad_value_short_series=-10.0, random_state=42, max_dilations_per_kernel=16
    ),
    StandardScaler(with_mean=False),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
)
print(minirocket_mv_var_pipeline)

minirocket_mv_var_pipeline.fit(X_train_jv, y_train_jv)

4.3 Score the Pipeline on Japanese Vowels#

With MiniRocketMultivariateVariable, we can also process input series that are slightly longer than those seen at train time. Here the maximum series length is 27 in the training set and 29 in the test set.

[ ]:
X_test_jv, y_test_jv = load_japanese_vowels(split="test", return_X_y=True)

minirocket_mv_var_pipeline.score(X_test_jv, y_test_jv)
