
Overview of this notebook

  • motivating example with modular building blocks

    • connecting distances, aligners, classifiers

  • pairwise transformers - the “type” of time series distances and kernels

  • time series alignment and alignment distances, e.g., time warping

  • composition patterns for distances, kernels, aligners

[1]:
import warnings

warnings.filterwarnings("ignore")

6.1 Motivating example

Rich component relationships between object types!

  • many classifiers, regressors, clusterers use distances or kernels

  • distances and kernels are often composite, e.g., sum-of-distance, independent distance

  • TS distances are often based on scalar multivariate distances (e.g., Euclidean)

  • TS distances are often based on alignment, TS aligners are an estimator type!

  • aligners internally typically use scalar uni/multivariate distances

example:

  • 1-nn using sklearn nearest neighbors

  • with multivariate dynamic time warping distance, from dtw-python library

  • on multivariate "mahalanobis" distance from scipy

  • in sktime compatible interface, constructed from custom components

so, conceptually:

  • we build a sequence alignment algorithm (dtw-python) using scipy Mahalanobis dist

  • we get the distance matrix computation from alignment algorithm

  • we use that distance matrix in sklearn knn

  • together this is a time series classifier!

[2]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist

# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis")  # uses scipy distances

# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist)  # uses dtw-python

# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner)  # interface mutation to distance

# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist)  # uses sklearn knn
[3]:
clf.get_params()
[3]:
{'algorithm': 'brute',
 'distance': DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))),
 'distance_mtype': None,
 'distance_params': None,
 'leaf_size': 30,
 'n_jobs': None,
 'n_neighbors': 1,
 'pass_train_distances': False,
 'weights': 'uniform',
 'distance__aligner': AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis')),
 'distance__aligner__dist_trafo': ScipyDist(metric='mahalanobis'),
 'distance__aligner__open_begin': False,
 'distance__aligner__open_end': False,
 'distance__aligner__step_pattern': 'symmetric2',
 'distance__aligner__window_type': 'none',
 'distance__aligner__dist_trafo__colalign': 'intersect',
 'distance__aligner__dist_trafo__metric': 'mahalanobis',
 'distance__aligner__dist_trafo__metric_kwargs': None,
 'distance__aligner__dist_trafo__p': 2,
 'distance__aligner__dist_trafo__var_weights': None}

what are all the objects in this chain?

  • ScipyDist - pairwise distance between scalars - transformer-pairwise type

  • AlignerDTWfromDist - time series alignment algorithm - aligner type

  • DistFromAligner - pairwise distance between time series - transformer-pairwise-panel type

  • KNeighborsTimeSeriesClassifier - time series classifier

[4]:
from sktime.registry import scitype

scitype(mw_aligner)  # prints the type of estimator (as a string)
# same for other components
[4]:
'aligner'

let’s go through these - we’ve already seen classifiers.

6.2 Time series distances and kernels - pairwise panel transformers

6.2.1 Distances, kernels - general interface

pairwise panel transformers produce one distance per pair of series in the panel:

[5]:
from sktime.datasets import load_osuleaf

# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")

X1 = X[:3]
X2 = X[5:10]
[6]:
# constructing the transformer
from sktime.dists_kernels import FlatDist
from sktime.dists_kernels.scipy_dist import ScipyDist

# paired Euclidean distances, over time points
eucl_dist = FlatDist(ScipyDist())
[7]:
X1.shape
[7]:
(3, 1, 427)
[8]:
X2.shape
[8]:
(5, 1, 427)

X1 is a panel with 3 series

X2 is a panel with 5 series

so a matrix of pairwise distances from X1 to X2 should have shape (3, 5)

[9]:
distmat = eucl_dist(X1, X2)

# alternatively, via the transform method
distmat = eucl_dist.transform(X1, X2)
distmat
[9]:
array([[29.94033435, 30.69443315, 29.02704475, 30.49413394, 29.77534229],
       [28.86289916, 32.03165025, 29.6118973 , 32.95499251, 30.82017584],
       [29.52672336, 18.76259726, 30.55213501, 15.93324954, 27.89072122]])
[10]:
distmat.shape
[10]:
(3, 5)
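What such a flattened Euclidean distance computes can be sketched in plain numpy. This is an illustrative sketch only, not the sktime implementation: the helper name flat_euclidean and the random panels P1, P2 are assumptions made for the example, with shapes chosen to match X1 and X2 above.

```python
import numpy as np


def flat_euclidean(X1, X2=None):
    """Euclidean distance between flattened series (hypothetical helper).

    X1, X2 are 3D arrays of shape (n_series, n_variables, n_timepoints).
    If X2 is omitted, distances are computed from X1 to itself.
    """
    if X2 is None:
        X2 = X1
    # flatten each series to one long row vector
    A = X1.reshape(X1.shape[0], -1)
    B = X2.reshape(X2.shape[0], -1)
    # broadcast to all pairs (i, j), then reduce over the flattened axis
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


# random stand-in panels with the same shapes as X1 and X2 above
rng = np.random.default_rng(0)
P1 = rng.normal(size=(3, 1, 427))
P2 = rng.normal(size=(5, 1, 427))

D = flat_euclidean(P1, P2)  # one distance per pair, shape (3, 5)
D_self = flat_euclidean(P1)  # single argument: distances within P1
print(D.shape, D_self.shape)
```

With a single argument the result is square and symmetric with a zero diagonal, which is the behaviour described for single-argument calls in this section.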

calling the transformer, or its transform method, with a single argument is the same as passing that argument twice:

[11]:
distmat_symm = eucl_dist.transform(X1)
distmat_symm
[11]:
array([[ 0.        , 24.58470308, 33.83913255],
       [24.58470308,  0.        , 35.44109497],
       [33.83913255, 35.44109497,  0.        ]])

pairwise panel transformers are scikit-learn / scikit-base interface compatible and composable, like everything else in sktime:

[12]:
eucl_dist.get_params()
[12]:
{'transformer': ScipyDist(),
 'transformer__colalign': 'intersect',
 'transformer__metric': 'euclidean',
 'transformer__metric_kwargs': None,
 'transformer__p': 2,
 'transformer__var_weights': None}

6.2.2 Time series distances, kernels - composition

pairwise transformers can be composed in a number of ways:

  • arithmetics, e.g., addition, multiplication - use dunder +, * etc, or CombinedDistance

  • subset to one or multiple columns - use my_dist[colnames] dunder

  • sum or aggregate over univariate distance in multivariate panel, using IndepDist (also known as “independent distance”)

  • compose with series-to-series transformers - use * dunder or make_pipeline

[13]:
from sktime.datasets import load_basic_motions

# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape
[13]:
(3, 6, 100)
[14]:
# example 1: variable subsetting and arithmetic combinations

# we define *two* distances now
from sktime.dists_kernels import FlatDist, ScipyDist

# Euclidean distance (on flattened time series)
eucl_dist = FlatDist(ScipyDist())
# Cosine distance (on flattened time series)
cos_dist = FlatDist(ScipyDist(metric="cosine"))

# arithmetic product of:
# * the Euclidean distance on gyrometer 2 time series
# * the Cosine distance on accelerometer 3 time series
prod_dist_42 = eucl_dist[4] * cos_dist[2]
prod_dist_42
[14]:
CombinedDistance(operation='*',
                 pw_trafos=[PwTrafoPanelPipeline(pw_trafo=FlatDist(transformer=ScipyDist()),
                                                 transformers=[ColumnSelect(columns=4)]),
                            PwTrafoPanelPipeline(pw_trafo=FlatDist(transformer=ScipyDist(metric='cosine')),
                                                 transformers=[ColumnSelect(columns=2)])])
[15]:
prod_dist_42(X)
[15]:
array([[0.        , 1.87274896, 2.28712525],
       [1.87274896, 0.        , 2.62764453],
       [2.28712525, 2.62764453, 0.        ]])
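The column subsetting and product in example 1 amount to an elementwise product of two distance matrices, each computed on one variable. The numpy sketch below illustrates that idea under assumptions: flat_euclidean, flat_cosine and the random panel X_demo are hypothetical stand-ins, not sktime objects.

```python
import numpy as np


def flat_euclidean(X):
    """Euclidean distance matrix between flattened series in X."""
    F = X.reshape(X.shape[0], -1)
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def flat_cosine(X):
    """Cosine distance (1 - cosine similarity) between flattened series in X."""
    F = X.reshape(X.shape[0], -1)
    norms = np.linalg.norm(F, axis=1)
    similarity = (F @ F.T) / np.outer(norms, norms)
    return 1.0 - similarity


# random stand-in for a panel of 3 series, 6 variables, 100 time points
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(3, 6, 100))

# subset to one variable each (keeping the variable axis), then multiply
d_eucl_4 = flat_euclidean(X_demo[:, [4], :])
d_cos_2 = flat_cosine(X_demo[:, [2], :])
prod = d_eucl_4 * d_cos_2
print(prod.shape)
```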
[16]:
# example 2: independent dynamic time warping distance
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.indep import IndepDist

# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())

# independent distance - by default IndepDist sums over univariate distances
indep_dtw_dist = IndepDist(dtw_dist)

# that is, this distance is arithmetic sum of
# * DTW distance on accelerometer 1 time series
# * DTW distance on accelerometer 2 time series
# * DTW distance on accelerometer 3 time series
# * DTW distance on gyrometer 1 time series
# * DTW distance on gyrometer 2 time series
# * DTW distance on gyrometer 3 time series
[17]:
indep_dtw_dist(X)
[17]:
array([[ 0.        , 31.7765985 , 32.65822   ],
       [31.7765985 ,  0.        , 39.78652033],
       [32.65822   , 39.78652033,  0.        ]])
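The "independent distance" pattern can be sketched as: apply a distance to each variable separately, then sum the resulting matrices. In the sketch below, a flattened Euclidean distance stands in for DTW to keep the example short; indep_dist and euclidean_panel are hypothetical names, and the data is random.

```python
import numpy as np


def euclidean_panel(X):
    """Euclidean distance matrix between flattened series in X."""
    F = X.reshape(X.shape[0], -1)
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def indep_dist(dist_fn, X):
    """Sum dist_fn over variables, treating each variable as a univariate panel."""
    n_vars = X.shape[1]
    return sum(dist_fn(X[:, [j], :]) for j in range(n_vars))


rng = np.random.default_rng(2)
X_demo = rng.normal(size=(3, 6, 100))

D_indep = indep_dist(euclidean_panel, X_demo)
D_joint = euclidean_panel(X_demo)
print(D_indep.shape)
```

For the Euclidean stand-in, the independent distance is never smaller than the joint distance on all variables, since a sum of norms bounds the norm of the concatenation.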
[18]:
# example 3: dynamic time warping distance on first differences
from sktime.transformations.series.difference import Differencer

diff_dtw_distance = Differencer() * dtw_dist
[19]:
diff_dtw_distance(X)
[19]:
array([[ 0.      , 20.622806, 27.731956],
       [20.622806,  0.      , 30.487498],
       [27.731956, 30.487498,  0.      ]])

some combinations may be available as efficient numba based distances.

E.g., difference-then-dtw is available as the “fixed” sktime native implementation DtwDist(derivative=True) in sktime.dists_kernels.dtw.
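The pipeline pattern "transform each series, then compute the distance" can be sketched as plain function composition. This is an illustration under assumptions: np.diff stands in for a differencing transformer (it shortens each series by one point, which may differ from how Differencer treats the first index), a flattened Euclidean distance stands in for DTW, and all names are hypothetical.

```python
import numpy as np


def euclidean_panel(X):
    """Euclidean distance matrix between flattened series in X."""
    F = X.reshape(X.shape[0], -1)
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def first_differences(X):
    """First differences along the time axis (last axis)."""
    return np.diff(X, axis=-1)


def compose(transform, dist_fn):
    """Return a distance that applies transform to the panel, then dist_fn."""

    def composed(X):
        return dist_fn(transform(X))

    return composed


diff_dist = compose(first_differences, euclidean_panel)

rng = np.random.default_rng(3)
X_demo = rng.normal(size=(3, 6, 100))
# shift each series by a different constant level
offsets = np.array([10.0, -5.0, 2.0]).reshape(3, 1, 1)

D_plain = diff_dist(X_demo)
D_shifted = diff_dist(X_demo + offsets)
print(np.allclose(D_plain, D_shifted))
```

One reason to difference first: the composed distance ignores constant level shifts between series, whereas the plain distance does not.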

6.3 pairwise tabular transformers

6.3.1 pairwise tabular transformers - general interface

pairwise tabular transformers transform pairs of ordinary tabular data, e.g., plain pd.DataFrame

produce one distance per pair of rows

[20]:
from sktime.datatypes import get_examples

# we retrieve some DataFrame examples
X_tabular = get_examples("pd.DataFrame", "Series")[1]
X2_tabular = get_examples("pd.DataFrame", "Series")[1][0:3]
[21]:
# just an ordinary DataFrame, no time series
X_tabular
[21]:
     a         b
0  1.0  3.000000
1  4.0  7.000000
2  0.5  2.000000
3 -3.0 -0.428571
[22]:
X2_tabular
[22]:
     a    b
0  1.0  3.0
1  4.0  7.0
2  0.5  2.0

example: pairwise Euclidean distance between rows

[23]:
# constructing the transformer
from sktime.dists_kernels import ScipyDist

# pairwise Euclidean distance between rows
my_tabular_dist = ScipyDist(metric="euclidean")
[24]:
# obtain matrix of distances between each pair of rows in X_tabular, X2_tabular
my_tabular_dist(X_tabular, X2_tabular)
[24]:
array([[ 0.        ,  5.        ,  1.11803399],
       [ 5.        ,  0.        ,  6.10327781],
       [ 1.11803399,  6.10327781,  0.        ],
       [ 5.26831112, 10.20704039,  4.26004216]])
[25]:
# alternative call with transform:
my_tabular_dist.transform(X_tabular, X2_tabular)
[25]:
array([[ 0.        ,  5.        ,  1.11803399],
       [ 5.        ,  0.        ,  6.10327781],
       [ 1.11803399,  6.10327781,  0.        ],
       [ 5.26831112, 10.20704039,  4.26004216]])
[26]:
# as with pairwise panel transformers, one arg means second is the same
my_tabular_dist(X_tabular)
[26]:
array([[ 0.        ,  5.        ,  1.11803399,  5.26831112],
       [ 5.        ,  0.        ,  6.10327781, 10.20704039],
       [ 1.11803399,  6.10327781,  0.        ,  4.26004216],
       [ 5.26831112, 10.20704039,  4.26004216,  0.        ]])
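The distance matrices above can be checked by hand with a few lines of numpy. The sketch below re-enters the displayed values of X_tabular as a plain array (the last entry is the rounded value shown in the output, so agreement is only up to that rounding); pairwise_euclidean is a hypothetical helper, not an sktime function.

```python
import numpy as np

# values copied from the displayed X_tabular; X2_tab is its first three rows
X_tab = np.array([[1.0, 3.0], [4.0, 7.0], [0.5, 2.0], [-3.0, -0.428571]])
X2_tab = X_tab[:3]


def pairwise_euclidean(A, B=None):
    """Euclidean distance between every row of A and every row of B."""
    if B is None:
        B = A
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


D_tab = pairwise_euclidean(X_tab, X2_tab)  # shape (4, 3)
D_tab_self = pairwise_euclidean(X_tab)  # shape (4, 4)
print(np.round(D_tab, 4))
```

For instance, the entry for rows (1, 3) and (4, 7) is sqrt(3² + 4²) = 5, matching the matrix above.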

6.3.2 constructing pairwise time series transformers from tabular ones

“simple” time series distances can be obtained directly from tabular transformers:

  • flattening the time series to tabular, and then computing the distance - FlatDist

  • aggregating the tabular distance matrix, from two individual time series - AggrDist

these are important “baseline” distances!

Both can be used on sktime pairwise transformers and sklearn pairwise transformers.

the classes are called “dist” but all apply to kernels.
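The two constructions can be contrasted in a small numpy sketch. It is an illustration under assumptions, not the sktime code: flat_dist and aggr_dist are hypothetical helpers, the tabular distance is Euclidean, and the aggregation function is taken to be the mean.

```python
import numpy as np


def rowwise_euclidean(A, B):
    """Tabular distance: Euclidean distance between rows of A and rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def flat_dist(X):
    """Flatten each series to one row, then apply the tabular distance."""
    F = X.reshape(X.shape[0], -1)
    return rowwise_euclidean(F, F)


def aggr_dist(X, aggfunc=np.mean):
    """Aggregate the time-point-by-time-point distance matrix of each pair."""
    n = X.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # transpose so that rows are time points, columns are variables
            out[i, j] = aggfunc(rowwise_euclidean(X[i].T, X[j].T))
    return out


rng = np.random.default_rng(5)
X_demo = rng.normal(size=(3, 6, 100))

D_flat = flat_dist(X_demo)
D_aggr = aggr_dist(X_demo)
print(D_flat.shape, D_aggr.shape)
```

Note that with mean aggregation the diagonal is generally not zero, since a series is compared with itself at different time points as well.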

[27]:
from sktime.datasets import load_basic_motions

# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape
[27]:
(3, 6, 100)
[28]:
# example 1: flat Gaussian RBF kernel between time series
from sklearn.gaussian_process.kernels import RBF

from sktime.dists_kernels import FlatDist

flat_gaussian_tskernel = FlatDist(RBF(length_scale=10))
flat_gaussian_tskernel.get_params()
[28]:
{'transformer': RBF(length_scale=10),
 'transformer__length_scale': 10,
 'transformer__length_scale_bounds': (1e-05, 100000.0)}
[29]:
flat_gaussian_tskernel(X)
[29]:
array([[1.        , 0.02267939, 0.28034066],
       [0.02267939, 1.        , 0.05447445],
       [0.28034066, 0.05447445, 1.        ]])
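The flat Gaussian RBF kernel can be written out directly: flatten each series, then apply k(x, y) = exp(-||x - y||² / (2 · length_scale²)). The numpy sketch below uses a hypothetical helper flat_rbf_kernel and random data, so the values will not match the output above.

```python
import numpy as np


def flat_rbf_kernel(X, length_scale=10.0):
    """Gaussian RBF kernel matrix between flattened series in X."""
    F = X.reshape(X.shape[0], -1)
    sq_dist = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale**2))


rng = np.random.default_rng(6)
X_demo = rng.normal(size=(3, 6, 100))

K = flat_rbf_kernel(X_demo, length_scale=10.0)
print(np.round(K, 4))
```

Unlike a distance matrix, the kernel matrix has ones on the diagonal, and larger values mean more similar series.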
[30]:
# example 2: pairwise cosine distance - we've already seen FlatDist a couple times
from sktime.dists_kernels import FlatDist, ScipyDist

cos_tsdist = FlatDist(ScipyDist(metric="cosine"))
cos_tsdist.get_params()
[30]:
{'transformer': ScipyDist(metric='cosine'),
 'transformer__colalign': 'intersect',
 'transformer__metric': 'cosine',
 'transformer__metric_kwargs': None,
 'transformer__p': 2,
 'transformer__var_weights': None}
[31]:
cos_tsdist(X)
[31]:
array([[1.11022302e-16, 1.36699314e+00, 6.99338545e-01],
       [1.36699314e+00, 0.00000000e+00, 1.10061843e+00],
       [6.99338545e-01, 1.10061843e+00, 0.00000000e+00]])

6.4 alignment algorithms, aka aligners

  • “aligners” find a new index set for 2 or more time series so they become “similar”

  • new index set being a non-linear reparameterization of the old index sets

  • often, aligners also produce an overall distance between the two series
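To make the idea of an alignment concrete, here is a minimal textbook dynamic time warping sketch in numpy for two univariate sequences. It is an assumption-laden illustration, not the dtw-python algorithm: dtw_align is a hypothetical function, all steps have unit weight, and there are no windows or open ends.

```python
import numpy as np


def dtw_align(x, y):
    """Return (cost, path) of a basic DTW alignment of 1D sequences x and y.

    path is a list of index pairs (i, j), meaning x[i] is matched with y[j].
    """
    n, m = len(x), len(y)
    # acc[i, j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            acc[i, j] = d + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])

    # backtrack from the end to recover the alignment path
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]


x = [0, 1, 2, 1]
y = [0, 0, 1, 2, 2, 1]  # same shape as x, but stretched in two places
cost, path = dtw_align(x, y)
print(cost)
print(path)
```

Here y is a stretched copy of x, so the alignment repeats indices of x where y lingers and the accumulated cost is zero.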

6.4.1 aligners - general interface

aligner methods:

  • fit - computes alignment

  • get_alignment - returns reparametrized indices, also called “alignment path”

  • get_aligned - returns reparametrized series

  • get_distance - returns distance between the two aligned series - only available if the tag "capability:distance" is True

let’s try to align two leaf contours from OSUleaf!

OSUleaf is a panel dataset with flattened tree leaf contours

  • instance = leaf

  • index (“time”) = angle from barycenter

  • variable = contour distance from barycenter at that angle


[32]:
from sktime.datasets import load_osuleaf

# load an example time series panel in pd-multiindex mtype
X, _ = load_osuleaf(return_type="pd-multiindex")

X1 = X.loc[0]  # leaf 0
X2 = X.loc[1]  # leaf 1
[33]:
from sktime.utils.plotting import plot_series

plot_series(X1, X2, labels=["leaf_1", "leaf_2"])
[33]:
(<Figure size 1600x400 with 1 Axes>, <Axes: >)
[figure: line plot of the two series leaf_1 and leaf_2]
[34]:
from sktime.alignment.dtw_python import AlignerDTW

# use dtw-python package for aligning
# simple univariate alignment algorithm with default params
aligner = AlignerDTW()
[35]:
aligner.fit([X1, X2])  # series to align need to be passed as list
[35]:
AlignerDTW()
[36]:
# alignment path
aligner.get_alignment()

# this aligns, e.g.:
# from row "2": aligns index 0 in X1 with index 2 of X2
# from row "664": aligns index 424 in X1 with index 423 of X2
[36]:
     ind0  ind1
0       0     0
1       0     1
2       0     2
3       1     2
4       2     3
...   ...   ...
663   423   422
664   424   423
665   425   424
666   426   425
667   426   426

668 rows × 2 columns

[37]:
# obtain the aligned versions of the two series
X1_al, X2_al = aligner.get_aligned()
[38]:
from sktime.utils.plotting import plot_series

plot_series(
    X1_al.reset_index(drop=True),
    X2_al.reset_index(drop=True),
    labels=["leaf_1", "leaf_2"],
)
[38]:
(<Figure size 1600x400 with 1 Axes>, <Axes: >)
[figure: line plot of the aligned series leaf_1 and leaf_2]

the DTW aligner has a “distance” implemented

intuitively, it combines the distance between the series after aligning with the amount of stretch needed to align them:

[39]:
# the AlignerDTW class (based on dtw-python) doesn't just align
# it also produces a distance
aligner.get_tags()
[39]:
{'python_dependencies_alias': {'dtw-python': 'dtw'},
 'capability:multiple-alignment': False,
 'capability:distance': True,
 'capability:distance-matrix': True,
 'python_dependencies': 'dtw-python'}
[40]:
# this is the distance between the two time series we aligned
aligner.get_distance()
[40]:
113.73231668301005
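How an alignment path yields aligned series and a distance can be sketched with a tiny hand-written example. The series x, y and the path below are made up for illustration, and the distance is an unweighted sum of absolute differences along the path. The dtw-python default step pattern 'symmetric2' (visible in the get_params output earlier in this notebook) weights steps differently, as far as I know, so this sketch is not expected to reproduce the number above.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 1.0])

# hand-written alignment path: each row (i, j) matches x[i] with y[j]
path = np.array([(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)])

# "aligned" series: both series re-indexed along the path, now of equal length
x_al = x[path[:, 0]]
y_al = y[path[:, 1]]

# alignment distance: sum of pointwise distances between the aligned series
path_cost = np.abs(x_al - y_al).sum()

# for comparison: pointwise distance without aligning, on the common prefix
naive_cost = np.abs(x - y[: len(x)]).sum()
print(x_al, y_al, path_cost, naive_cost)
```

After aligning, the two series coincide and the distance is zero, while the unaligned pointwise comparison is not.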

6.4.2 alignment based time series distances

the DistFromAligner wrapper simply computes distance per pair of aligned series.

This turns any aligner into a time series distance:

[41]:
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner

# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())
[42]:
from sktime.datasets import load_osuleaf

# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")

X1 = X[:3]
X2 = X[5:10]
[43]:
dtw_distmat = dtw_dist(X1, X2)
dtw_distmat
[43]:
array([[165.25420136, 148.53521913, 159.93034065, 158.50379563,
        155.98824527],
       [153.5587322 , 151.52004769, 125.14570395, 183.97186106,
         93.55389512],
       [170.41354799, 154.24275848, 212.54601605,  66.59572457,
        295.32544676]])
[44]:
dtw_distmat.shape
[44]:
(3, 5)
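Conceptually, turning an aligner into a distance means looping over all pairs of series and recording the alignment distance of each pair. The sketch below illustrates this with a basic DTW cost. The names dtw_cost and dist_from_aligner are hypothetical, the series are univariate rows of random data, and this is not how sktime implements the wrapper.

```python
import numpy as np


def dtw_cost(x, y):
    """Accumulated cost of a basic DTW alignment (unit step weights)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            acc[i, j] = d + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, m]


def dist_from_aligner(align_cost, X1, X2):
    """Distance matrix with one alignment cost per pair of series."""
    return np.array([[align_cost(a, b) for b in X2] for a in X1])


# random stand-ins: 3 and 5 univariate series with 20 time points each
rng = np.random.default_rng(4)
S1 = rng.normal(size=(3, 20))
S2 = rng.normal(size=(5, 20))

D = dist_from_aligner(dtw_cost, S1, S2)
D_self = dist_from_aligner(dtw_cost, S1, S1)
print(D.shape)
```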

6.5 revisiting the initial example

[45]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist

# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis")  # uses scipy distances

# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist)  # uses dtw-python

# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner)  # interface mutation to distance

# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist)  # uses sklearn knn
[46]:
clf
[46]:
KNeighborsTimeSeriesClassifier(distance=DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))))
  • we build a sequence alignment algorithm (dtw-python) using scipy Mahalanobis dist

  • we get the distance matrix computation from alignment algorithm

  • we use that distance matrix in sklearn knn

  • together this is a time series classifier!
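The last step of the chain, going from a distance matrix to predictions, can be sketched in a few lines. The helper one_nn_predict, the distance values and the labels below are made up for illustration. The actual classifier delegates this step to sklearn.

```python
import numpy as np


def one_nn_predict(dist_test_train, y_train):
    """1-nearest-neighbour prediction from a precomputed distance matrix.

    dist_test_train has shape (n_test, n_train); entry (i, j) is the distance
    from test series i to training series j.
    """
    nearest = np.argmin(dist_test_train, axis=1)
    return np.asarray(y_train)[nearest]


# made-up distances from 2 test series to 3 training series
D = np.array([[0.2, 3.0, 2.5], [4.0, 0.1, 3.3]])
y_train = np.array(["walk", "run", "walk"])

pred = one_nn_predict(D, y_train)
print(pred)
```

Any time series distance that produces such a matrix, including the alignment-based one above, can be plugged in this way.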

6.6 Searching for distances, kernels, transformers

As with all sktime objects, we can use the registry.all_estimators utility to display all transformers in sktime.

The relevant scitypes are:

  • "transformer-pairwise" for all pairwise transformers on tabular data

  • "transformer-pairwise-panel" for all pairwise transformers on panel data

  • "aligner" for all time series aligners

  • "transformer" for all transformers, these can be composed with all the above

[47]:
from sktime.registry import all_estimators
[48]:
# listing all pairwise panel transformers - distances, kernels on time series
all_estimators("transformer-pairwise-panel", as_dataframe=True)
[48]:
name object
0 AggrDist <class 'sktime.dists_kernels.compose_tab_to_pa...
1 CombinedDistance <class 'sktime.dists_kernels.algebra.CombinedD...
2 ConstantPwTrafoPanel <class 'sktime.dists_kernels.dummy.ConstantPwT...
3 DistFromAligner <class 'sktime.dists_kernels.compose_from_alig...
4 DistFromKernel <class 'sktime.dists_kernels.dist_to_kern.Dist...
5 DtwDist <class 'sktime.dists_kernels.dtw.DtwDist'>
6 EditDist <class 'sktime.dists_kernels.edit_dist.EditDist'>
7 FlatDist <class 'sktime.dists_kernels.compose_tab_to_pa...
8 IndepDist <class 'sktime.dists_kernels.indep.IndepDist'>
9 KernelFromDist <class 'sktime.dists_kernels.dist_to_kern.Kern...
10 PwTrafoPanelPipeline <class 'sktime.dists_kernels.compose.PwTrafoPa...
11 SignatureKernel <class 'sktime.dists_kernels.signature_kernel....
[49]:
# listing all pairwise (tabular) transformers - distances, kernels on vectors/df-rows
all_estimators("transformer-pairwise", as_dataframe=True)
[49]:
name object
0 ScipyDist <class 'sktime.dists_kernels.scipy_dist.ScipyD...
[50]:
# listing all alignment algorithms that can produce distances
all_estimators("aligner", as_dataframe=True, filter_tags={"capability:distance": True})
[50]:
name object
0 AlignerDTW <class 'sktime.alignment.dtw_python.AlignerDTW'>
1 AlignerDTWfromDist <class 'sktime.alignment.dtw_python.AlignerDTW...
2 AlignerDtwNumba <class 'sktime.alignment.dtw_numba.AlignerDtwN...

6.7 Outlook, roadmap - panel tasks

  • implementing estimators - distances, classifiers, etc

  • backend optimizations - numba, distributed/parallel

  • sequence-to-sequence regression, classification

  • further maturing the time series alignment module

join and contribute!

6.8 Summary

  • sktime - modular framework for learning with time series

  • panel data = collections of time series - tasks classification, regression, clustering

  • build flexible pipelines with transformers, tune via grid search etc

  • panel estimators typically rely on time series distances, kernels, aligners

  • TS distances, kernels, aligners can also be constructed in modular, flexible way

  • all objects above are first-class citizens with sklearn-like interface!


Credits: notebook 6 - time series distances, kernels, alignment

notebook creation: fkiraly

