Overview of this notebook#
motivating example with modular building blocks
connecting distances, aligners, classifiers
pairwise transformers - the “type” of time series distances and kernels
time series alignment and alignment distances, e.g., time warping
composition patterns for distances, kernels, aligners
[1]:
import warnings
warnings.filterwarnings("ignore")
6.1 Motivating example#
Rich component relationships between object types!
many classifiers, regressors, clusterers use distances or kernels
distances and kernels are often composite, e.g., sum-of-distance, independent distance
TS distances are often based on scalar multivariate distances (e.g., Euclidean)
TS distances are often based on alignment, TS aligners are an estimator type!
aligners internally typically use scalar uni/multivariate distances
example:
1-nn using sklearn nearest neighbors
with multivariate dynamic time warping distance, from the dtw-python library
on multivariate "mahalanobis" distance from scipy
in an sktime compatible interface, constructed from custom components
so, conceptually:
we build a sequence alignment algorithm (dtw-python) using the scipy Mahalanobis distance
we get the distance matrix computation from the alignment algorithm
we use that distance matrix in sklearn knn
together this is a time series classifier!
[2]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist
# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis") # uses scipy distances
# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist) # uses dtw-python
# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner) # interface mutation to distance
# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist) # uses sklearn knn
[3]:
clf.get_params()
[3]:
{'algorithm': 'brute',
'distance': DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))),
'distance_mtype': None,
'distance_params': None,
'leaf_size': 30,
'n_jobs': None,
'n_neighbors': 1,
'pass_train_distances': False,
'weights': 'uniform',
'distance__aligner': AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis')),
'distance__aligner__dist_trafo': ScipyDist(metric='mahalanobis'),
'distance__aligner__open_begin': False,
'distance__aligner__open_end': False,
'distance__aligner__step_pattern': 'symmetric2',
'distance__aligner__window_type': 'none',
'distance__aligner__dist_trafo__colalign': 'intersect',
'distance__aligner__dist_trafo__metric': 'mahalanobis',
'distance__aligner__dist_trafo__metric_kwargs': None,
'distance__aligner__dist_trafo__p': 2,
'distance__aligner__dist_trafo__var_weights': None}
what are all the objects in this chain?
ScipyDist - pairwise distance between scalars - transformer-pairwise type
AlignerDTWfromDist - time series alignment algorithm - aligner type
DistFromAligner - pairwise distance between time series - transformer-pairwise-panel type
KNeighborsTimeSeriesClassifier - time series classifier
[4]:
from sktime.registry import scitype
scitype(mw_aligner) # prints the type of estimator (as a string)
# same for other components
[4]:
'aligner'
let’s go through these - we’ve already seen classifiers.
6.2 Time series distances and kernels - pairwise panel transformers#
6.2.1 Distances, kernels - general interface#
pairwise panel transformers produce one distance per pair of series in the panel:
[5]:
from sktime.datasets import load_osuleaf
# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")
X1 = X[:3]
X2 = X[5:10]
[6]:
# constructing the transformer
from sktime.dists_kernels import FlatDist
from sktime.dists_kernels.scipy_dist import ScipyDist
# paired Euclidean distances, over time points
eucl_dist = FlatDist(ScipyDist())
[7]:
X1.shape
[7]:
(3, 1, 427)
[8]:
X2.shape
[8]:
(5, 1, 427)
X1 is a panel with 3 series, X2 is a panel with 5 series
so a matrix of pairwise distances from X1 to X2 should have shape (3, 5)
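To make the shape rule concrete, here is a minimal plain-Python sketch of a flattened Euclidean distance between two panels. It is an illustration only, not how sktime computes it: the function name `flat_euclidean` and the toy panels are hypothetical, and panels are assumed to be nested lists of shape (instances, variables, time points).

```python
import math


def flat_euclidean(X1, X2):
    """Distance matrix between two panels given as nested lists.

    Each panel has shape (n_instances, n_variables, n_timepoints).
    Every series is flattened to one long vector before the distance is taken.
    """

    def flatten(series):
        return [value for variable in series for value in variable]

    flat1 = [flatten(s) for s in X1]
    flat2 = [flatten(s) for s in X2]
    # one row per series in X1, one column per series in X2
    return [[math.dist(a, b) for b in flat2] for a in flat1]


# toy panels (hypothetical data): 3 and 5 univariate series, 4 time points each
panel_a = [[[float(i + t) for t in range(4)]] for i in range(3)]
panel_b = [[[float(2 * i - t) for t in range(4)]] for i in range(5)]

toy_distmat = flat_euclidean(panel_a, panel_b)
print(len(toy_distmat), len(toy_distmat[0]))  # number of rows, number of columns
```

The number of rows follows the first panel and the number of columns follows the second, which is the same rule that gives (3, 5) for X1 and X2.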
[9]:
distmat = eucl_dist(X1, X2)
# alternatively, via the transform method
distmat = eucl_dist.transform(X1, X2)
distmat
[9]:
array([[29.94033435, 30.69443315, 29.02704475, 30.49413394, 29.77534229],
[28.86289916, 32.03165025, 29.6118973 , 32.95499251, 30.82017584],
[29.52672336, 18.76259726, 30.55213501, 15.93324954, 27.89072122]])
[10]:
distmat.shape
[10]:
(3, 5)
calling the transformer, or using transform, with a single argument is the same as passing that argument twice:
[11]:
distmat_symm = eucl_dist.transform(X1)
distmat_symm
[11]:
array([[ 0. , 24.58470308, 33.83913255],
[24.58470308, 0. , 35.44109497],
[33.83913255, 35.44109497, 0. ]])
pairwise panel transformers are scikit-learn / scikit-base interface compatible and composable, like everything else in sktime:
[12]:
eucl_dist.get_params()
[12]:
{'transformer': ScipyDist(),
'transformer__colalign': 'intersect',
'transformer__metric': 'euclidean',
'transformer__metric_kwargs': None,
'transformer__p': 2,
'transformer__var_weights': None}
6.2.2 Time series distances, kernels - composition#
pairwise transformers can be composed in a number of ways:
arithmetics, e.g., addition, multiplication - use dunder +, * etc, or CombinedDistance
subset to one or multiple columns - use my_dist[colnames] dunder
sum or aggregate over univariate distance in multivariate panel, using IndepDist (also known as “independent distance”)
compose with series-to-series transformers - use * dunder or make_pipeline
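The composition patterns above rely on operator overloading. The following is a minimal sketch of that idea in plain Python, assuming series are nested lists of shape (variables, time points). The class `ToyDist` and the helper functions are hypothetical stand-ins written for this illustration, not sktime classes, and Manhattan distance is used in place of cosine to keep the arithmetic easy to follow.

```python
import math


class ToyDist:
    """Hypothetical stand-in for a pairwise panel transformer."""

    def __init__(self, pair_fn):
        self.pair_fn = pair_fn  # maps two series (lists of variables) to a float

    def __call__(self, X, X2=None):
        # a single argument means "distance of the panel to itself"
        X2 = X if X2 is None else X2
        return [[self.pair_fn(a, b) for b in X2] for a in X]

    def __getitem__(self, col):
        # restrict both series to one variable before computing the distance
        return ToyDist(lambda a, b: self.pair_fn([a[col]], [b[col]]))

    def __mul__(self, other):
        return ToyDist(lambda a, b: self.pair_fn(a, b) * other.pair_fn(a, b))

    def __add__(self, other):
        return ToyDist(lambda a, b: self.pair_fn(a, b) + other.pair_fn(a, b))


def _flatten(series):
    return [v for variable in series for v in variable]


def flat_euclid(a, b):
    return math.dist(_flatten(a), _flatten(b))


def flat_manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(_flatten(a), _flatten(b)))


eucl = ToyDist(flat_euclid)
manh = ToyDist(flat_manhattan)

# hypothetical panel: two series with two variables and three time points each
toy_panel = [
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[3.0, 4.0, 0.0], [1.0, 2.0, 3.0]],
]

# Euclidean distance on variable 0, times Manhattan distance on variable 1
prod_dist = eucl[0] * manh[1]
prod_matrix = prod_dist(toy_panel)
print(prod_matrix)
```

Each operator returns a new distance object, so subsetting and arithmetic can be chained freely.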
[13]:
from sktime.datasets import load_basic_motions
# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape
[13]:
(3, 6, 100)
[14]:
# example 1: variable subsetting and arithmetic combinations
# we define *two* distances now
from sktime.dists_kernels import FlatDist, ScipyDist
# Euclidean distance (on flattened time series)
eucl_dist = FlatDist(ScipyDist())
# Cosine distance (on flattened time series)
cos_dist = FlatDist(ScipyDist(metric="cosine"))
# arithmetic product of:
# * the Euclidean distance on gyrometer 2 time series
# * the Cosine distance on accelerometer 3 time series
prod_dist_42 = eucl_dist[4] * cos_dist[2]
prod_dist_42
[14]:
CombinedDistance(operation='*', pw_trafos=[PwTrafoPanelPipeline(pw_trafo=FlatDist(transformer=ScipyDist()), transformers=[ColumnSelect(columns=4)]), PwTrafoPanelPipeline(pw_trafo=FlatDist(transformer=ScipyDist(metric='cosine')), transformers=[ColumnSelect(columns=2)])])
[15]:
prod_dist_42(X)
[15]:
array([[0. , 1.87274896, 2.28712525],
[1.87274896, 0. , 2.62764453],
[2.28712525, 2.62764453, 0. ]])
[16]:
# example 2: independent dynamic time warping distance
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.indep import IndepDist
# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())
# independent distance - by default IndepDist sums over univariate distances
indep_dtw_dist = IndepDist(dtw_dist)
# that is, this distance is arithmetic sum of
# * DTW distance on accelerometer 1 time series
# * DTW distance on accelerometer 2 time series
# * DTW distance on accelerometer 3 time series
# * DTW distance on gyrometer 1 time series
# * DTW distance on gyrometer 2 time series
# * DTW distance on gyrometer 3 time series
[17]:
indep_dtw_dist(X)
[17]:
array([[ 0. , 31.7765985 , 32.65822 ],
[31.7765985 , 0. , 39.78652033],
[32.65822 , 39.78652033, 0. ]])
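The "independent distance" idea can be sketched in a few lines: apply a univariate distance to each variable separately, then sum. The function name `indep_dist` and the toy data below are hypothetical, and plain Euclidean distance stands in for DTW to keep the numbers easy to check.

```python
import math


def indep_dist(univariate_dist, series_a, series_b):
    """Sum a univariate distance over matching variables of two series."""
    return sum(univariate_dist(va, vb) for va, vb in zip(series_a, series_b))


# two hypothetical multivariate series: 2 variables, 3 time points each
s1 = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
s2 = [[3.0, 4.0, 0.0], [1.0, 1.0, 2.0]]

# variable 0 contributes 5.0, variable 1 contributes 1.0
total = indep_dist(math.dist, s1, s2)
print(total)
```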
[18]:
# example 3: dynamic time warping distance on first differences
from sktime.transformations.series.difference import Differencer
diff_dtw_distance = Differencer() * dtw_dist
[19]:
diff_dtw_distance(X)
[19]:
array([[ 0. , 20.622806, 27.731956],
[20.622806, 0. , 30.487498],
[27.731956, 30.487498, 0. ]])
some combinations may be available as efficient numba based distances.
E.g., difference-then-dtw is available as the “fixed” sktime native implementation DtwDist(derivative=True) in sktime.dists_kernels.dtw.
6.3 pairwise tabular transformers#
6.3.1 pairwise tabular transformers - general interface#
pairwise tabular transformers transform pairs of ordinary tabular data, e.g., plain pd.DataFrame
produce one distance per pair of rows
[20]:
from sktime.datatypes import get_examples
# we retrieve some DataFrame examples
X_tabular = get_examples("pd.DataFrame", "Series")[1]
X2_tabular = get_examples("pd.DataFrame", "Series")[1][0:3]
[21]:
# just an ordinary DataFrame, no time series
X_tabular
[21]:
 | a | b |
---|---|---|
0 | 1.0 | 3.000000 |
1 | 4.0 | 7.000000 |
2 | 0.5 | 2.000000 |
3 | -3.0 | -0.428571 |
[22]:
X2_tabular
[22]:
 | a | b |
---|---|---|
0 | 1.0 | 3.0 |
1 | 4.0 | 7.0 |
2 | 0.5 | 2.0 |
example: pairwise Euclidean distance between rows
[23]:
# constructing the transformer
from sktime.dists_kernels import ScipyDist
# Euclidean distance between pairs of rows
my_tabular_dist = ScipyDist(metric="euclidean")
[24]:
# obtain matrix of distances between each pair of rows in X_tabular, X2_tabular
my_tabular_dist(X_tabular, X2_tabular)
[24]:
array([[ 0. , 5. , 1.11803399],
[ 5. , 0. , 6.10327781],
[ 1.11803399, 6.10327781, 0. ],
[ 5.26831112, 10.20704039, 4.26004216]])
[25]:
# alternative call with transform:
my_tabular_dist.transform(X_tabular, X2_tabular)
[25]:
array([[ 0. , 5. , 1.11803399],
[ 5. , 0. , 6.10327781],
[ 1.11803399, 6.10327781, 0. ],
[ 5.26831112, 10.20704039, 4.26004216]])
[26]:
# as with pairwise panel transformers, one arg means second is the same
my_tabular_dist(X_tabular)
[26]:
array([[ 0. , 5. , 1.11803399, 5.26831112],
[ 5. , 0. , 6.10327781, 10.20704039],
[ 1.11803399, 6.10327781, 0. , 4.26004216],
[ 5.26831112, 10.20704039, 4.26004216, 0. ]])
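The distance matrices above can be checked by hand. This sketch recomputes them with the standard library only; the row values are copied from the displayed X_tabular, so -0.428571 is the rounded display value and the last digits may differ slightly. The helper name `pairwise_rows` is hypothetical.

```python
import math

# rows of X_tabular as displayed above (columns a, b)
rows = [(1.0, 3.0), (4.0, 7.0), (0.5, 2.0), (-3.0, -0.428571)]


def pairwise_rows(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return [[math.dist(a, b) for b in B] for a in A]


dist_full = pairwise_rows(rows, rows)      # 4 x 4, as for a single argument
dist_part = pairwise_rows(rows, rows[:3])  # 4 x 3, as for X_tabular vs X2_tabular

for row in dist_part:
    print([round(value, 6) for value in row])
```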
6.3.2 constructing pairwise time series transformers from tabular ones#
“simple” time series distances can be obtained directly from tabular transformers:
flattening the time series to tabular, and then computing the distance - FlatDist
aggregating the tabular distance matrix, from two individual time series - AggrDist
these are important “baseline” distances!
Both can be used on sktime pairwise transformers and sklearn pairwise transformers.
the classes are called “dist” but all apply to kernels.
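The difference between the two constructions can be sketched in plain Python. This is an illustration under stated assumptions, not sktime's implementation: series are nested lists of shape (variables, time points), the function names are hypothetical, and the mean is assumed as the aggregation function.

```python
import math
from statistics import mean


def time_points(series):
    """Turn a series stored as variables x time into a list of per-time-point rows."""
    return [list(row) for row in zip(*series)]


def flat_dist(a, b):
    """Flatten each series into one vector, then apply the tabular distance once."""
    flat_a = [v for variable in a for v in variable]
    flat_b = [v for variable in b for v in variable]
    return math.dist(flat_a, flat_b)


def aggr_dist(a, b, aggfunc=mean):
    """Apply the tabular distance to every pair of time points, then aggregate."""
    rows_a, rows_b = time_points(a), time_points(b)
    return aggfunc(math.dist(p, q) for p in rows_a for q in rows_b)


# hypothetical series: 2 variables, 2 time points each
series_a = [[0.0, 0.0], [0.0, 0.0]]
series_b = [[3.0, 3.0], [4.0, 4.0]]

print(flat_dist(series_a, series_b))  # one distance between 4-dimensional vectors
print(aggr_dist(series_a, series_b))  # mean over the 2 x 2 time point distances
```

Flattening requires series of equal length, while the aggregated version only needs the same number of variables, since it compares time points pair by pair.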
[27]:
from sktime.datasets import load_basic_motions
# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape
[27]:
(3, 6, 100)
[28]:
# example 1: flat Gaussian RBF kernel between time series
from sklearn.gaussian_process.kernels import RBF
from sktime.dists_kernels import FlatDist
flat_gaussian_tskernel = FlatDist(RBF(length_scale=10))
flat_gaussian_tskernel.get_params()
[28]:
{'transformer': RBF(length_scale=10),
'transformer__length_scale': 10,
'transformer__length_scale_bounds': (1e-05, 100000.0)}
[29]:
flat_gaussian_tskernel(X)
[29]:
array([[1. , 0.02267939, 0.28034066],
[0.02267939, 1. , 0.05447445],
[0.28034066, 0.05447445, 1. ]])
[30]:
# example 2: pairwise cosine distance - we've already seen FlatDist a couple times
from sktime.dists_kernels import FlatDist, ScipyDist
cos_tsdist = FlatDist(ScipyDist(metric="cosine"))
cos_tsdist.get_params()
[30]:
{'transformer': ScipyDist(metric='cosine'),
'transformer__colalign': 'intersect',
'transformer__metric': 'cosine',
'transformer__metric_kwargs': None,
'transformer__p': 2,
'transformer__var_weights': None}
[31]:
cos_tsdist(X)
[31]:
array([[1.11022302e-16, 1.36699314e+00, 6.99338545e-01],
[1.36699314e+00, 0.00000000e+00, 1.10061843e+00],
[6.99338545e-01, 1.10061843e+00, 0.00000000e+00]])
6.4 alignment algorithms, aka aligners#
“aligners” find a new index set for 2 or more time series so they become “similar”
new index set being a non-linear reparameterization of the old index sets
often, aligners also produce an overall distance between the two series
6.4.1 aligners - general interface#
aligner methods:
fit - computes alignment
get_alignment - returns reparametrized indices, also called “alignment path”
get_aligned - returns reparametrized series
get_distance - returns distance between the two aligned series - only available if the tag "capability:distance" is True
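To illustrate what an aligner computes, here is a minimal textbook dynamic time warping sketch for two univariate series. It returns both the distance and the alignment path, mirroring get_distance and get_alignment. This is not the dtw-python implementation: the function name `dtw_align` is hypothetical, and the sketch uses unweighted steps, whereas the parameter listing earlier shows 'symmetric2' as the default step pattern, so numbers from this sketch are not expected to match AlignerDTW.

```python
def dtw_align(x, y):
    """Textbook DTW for two univariate series, returning (distance, path)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # backtrack from the end to recover the alignment path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance in both series
            (cost[i - 1][j], i - 1, j),          # advance in x only
            (cost[i][j - 1], i, j - 1),          # advance in y only
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


# the second series repeats its first value once, otherwise they are identical
distance, path = dtw_align([0.0, 1.0, 2.0, 1.0], [0.0, 0.0, 1.0, 2.0, 1.0])
print(distance)
print(path)
```

Each pair in the path plays the role of one row (ind0, ind1) in an alignment table: index 0 of the first series is matched to both index 0 and index 1 of the second.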
let’s try to align two leaf contours from OSUleaf!
OSUleaf is a panel dataset with flattened tree leaf contours
instance = leaf
index (“time”) = angle from barycenter
variable = contour distance from barycenter at that angle
[32]:
from sktime.datasets import load_osuleaf
# load an example time series panel in pd-multiindex mtype
X, _ = load_osuleaf(return_type="pd-multiindex")
X1 = X.loc[0] # leaf 0
X2 = X.loc[1] # leaf 1
[33]:
from sktime.utils.plotting import plot_series
plot_series(X1, X2, labels=["leaf_1", "leaf_2"])
[33]:
(<Figure size 1600x400 with 1 Axes>, <Axes: >)
[34]:
from sktime.alignment.dtw_python import AlignerDTW
# use dtw-python package for aligning
# simple univariate alignment algorithm with default params
aligner = AlignerDTW()
[35]:
aligner.fit([X1, X2]) # series to align need to be passed as list
[35]:
AlignerDTW()
[36]:
# alignment path
aligner.get_alignment()
# this aligns, e.g.:
# from row "2": aligns index 0 in X1 with index 2 of X2
# from row "664": aligns index 424 in X1 with index 423 of X2
[36]:
 | ind0 | ind1 |
---|---|---|
0 | 0 | 0 |
1 | 0 | 1 |
2 | 0 | 2 |
3 | 1 | 2 |
4 | 2 | 3 |
... | ... | ... |
663 | 423 | 422 |
664 | 424 | 423 |
665 | 425 | 424 |
666 | 426 | 425 |
667 | 426 | 426 |
668 rows × 2 columns
[37]:
# obtain the aligned versions of the two series
X1_al, X2_al = aligner.get_aligned()
[38]:
from sktime.utils.plotting import plot_series
plot_series(
X1_al.reset_index(drop=True),
X2_al.reset_index(drop=True),
labels=["leaf_1", "leaf_2"],
)
[38]:
(<Figure size 1600x400 with 1 Axes>, <Axes: >)
the DTW aligner has a “distance” implemented
intuitively, it is a distance that sums distance after aligning, and the amount of stretch:
[39]:
# the AlignerDTW class (based on dtw-python) doesn't just align
# it also produces a distance
aligner.get_tags()
[39]:
{'python_dependencies_alias': {'dtw-python': 'dtw'},
'capability:multiple-alignment': False,
'capability:distance': True,
'capability:distance-matrix': True,
'python_dependencies': 'dtw-python'}
[40]:
# this is the distance between the two time series we aligned
aligner.get_distance()
[40]:
113.73231668301005
6.4.2 alignment based time series distances#
the DistFromAligner wrapper simply computes distance per pair of aligned series.
This turns any aligner that can produce a distance into a time series distance:
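The wrapper pattern itself is small. The sketch below shows the general idea in plain Python: a function that works on one pair of series is lifted to a function that returns a distance matrix for two panels. The names `dist_from_pair_fn` and `dtw_cost` are hypothetical, and the DTW cost uses unweighted textbook steps, so it is a stand-in for an aligner's distance, not a reimplementation of AlignerDTW.

```python
def dist_from_pair_fn(pair_fn):
    """Wrap a function on two series into a function on two panels."""

    def panel_dist(X, X2=None):
        X2 = X if X2 is None else X2
        return [[pair_fn(a, b) for b in X2] for a in X]

    return panel_dist


def dtw_cost(x, y):
    """Cumulative DTW cost of two univariate series, absolute difference as local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(y)  # cost row for the empty prefix of x
    for xi in x:
        curr = [inf]
        for j, yj in enumerate(y, start=1):
            curr.append(abs(xi - yj) + min(prev[j - 1], prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


toy_dtw_dist = dist_from_pair_fn(dtw_cost)

# hypothetical panels of univariate series, lengths may differ
panel_1 = [[0.0, 1.0, 2.0], [2.0, 2.0, 2.0]]
panel_2 = [[0.0, 0.0, 1.0, 2.0], [1.0, 1.0], [5.0]]

toy_matrix = toy_dtw_dist(panel_1, panel_2)
for row in toy_matrix:
    print(row)
```

Because the pairwise function handles unequal lengths, the resulting panel distance does as well, which flattened distances cannot do.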
[41]:
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner
# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())
[42]:
from sktime.datasets import load_osuleaf
# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")
X1 = X[:3]
X2 = X[5:10]
[43]:
dtw_distmat = dtw_dist(X1, X2)
dtw_distmat
[43]:
array([[165.25420136, 148.53521913, 159.93034065, 158.50379563,
155.98824527],
[153.5587322 , 151.52004769, 125.14570395, 183.97186106,
93.55389512],
[170.41354799, 154.24275848, 212.54601605, 66.59572457,
295.32544676]])
[44]:
dtw_distmat.shape
[44]:
(3, 5)
6.5 revisiting the initial example#
[45]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist
# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis") # uses scipy distances
# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist) # uses dtw-python
# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner) # interface mutation to distance
# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist) # uses sklearn knn
[46]:
clf
[46]:
KNeighborsTimeSeriesClassifier(distance=DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))))
we build a sequence alignment algorithm (dtw-python) using the scipy Mahalanobis distance
we get the distance matrix computation from the alignment algorithm
we use that distance matrix in sklearn knn
together this is a time series classifier!
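The last link of the chain, nearest neighbour prediction from a distance matrix, can be sketched without any library. This is an illustration of the 1-nn rule only, not the sklearn implementation: the function name `predict_1nn`, the distance values, and the labels are all hypothetical.

```python
def predict_1nn(dist_test_to_train, y_train):
    """Label each test series with the label of its closest training series."""
    predictions = []
    for row in dist_test_to_train:
        # index of the training series with the smallest distance
        nearest = min(range(len(row)), key=row.__getitem__)
        predictions.append(y_train[nearest])
    return predictions


# hypothetical distances from 2 test series (rows) to 3 training series (columns)
toy_distances = [
    [4.2, 0.7, 3.1],
    [0.3, 5.0, 2.2],
]
toy_labels = ["walking", "running", "walking"]

print(predict_1nn(toy_distances, toy_labels))
```

The classifier never looks at the series themselves, only at the distance matrix, which is why any time series distance can be plugged in.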
6.6 Searching for distances, kernels, transformers#
As with all sktime objects, we can use the registry.all_estimators utility to display all transformers in sktime.
The relevant scitypes are:
"transformer-pairwise" for all pairwise transformers on tabular data
"transformer-pairwise-panel" for all pairwise transformers on panel data
"aligner" for all time series aligners
"transformer" for all transformers, these can be composed with all the above
[47]:
from sktime.registry import all_estimators
[48]:
# listing all pairwise panel transformers - distances, kernels on time series
all_estimators("transformer-pairwise-panel", as_dataframe=True)
[48]:
 | name | object |
---|---|---|
0 | AggrDist | <class 'sktime.dists_kernels.compose_tab_to_pa... |
1 | CombinedDistance | <class 'sktime.dists_kernels.algebra.CombinedD... |
2 | ConstantPwTrafoPanel | <class 'sktime.dists_kernels.dummy.ConstantPwT... |
3 | DistFromAligner | <class 'sktime.dists_kernels.compose_from_alig... |
4 | DistFromKernel | <class 'sktime.dists_kernels.dist_to_kern.Dist... |
5 | DtwDist | <class 'sktime.dists_kernels.dtw.DtwDist'> |
6 | EditDist | <class 'sktime.dists_kernels.edit_dist.EditDist'> |
7 | FlatDist | <class 'sktime.dists_kernels.compose_tab_to_pa... |
8 | IndepDist | <class 'sktime.dists_kernels.indep.IndepDist'> |
9 | KernelFromDist | <class 'sktime.dists_kernels.dist_to_kern.Kern... |
10 | PwTrafoPanelPipeline | <class 'sktime.dists_kernels.compose.PwTrafoPa... |
11 | SignatureKernel | <class 'sktime.dists_kernels.signature_kernel.... |
[49]:
# listing all pairwise (tabular) transformers - distances, kernels on vectors/df-rows
all_estimators("transformer-pairwise", as_dataframe=True)
[49]:
 | name | object |
---|---|---|
0 | ScipyDist | <class 'sktime.dists_kernels.scipy_dist.ScipyD... |
[50]:
# listing all alignment algorithms that can produce distances
all_estimators("aligner", as_dataframe=True, filter_tags={"capability:distance": True})
[50]:
 | name | object |
---|---|---|
0 | AlignerDTW | <class 'sktime.alignment.dtw_python.AlignerDTW'> |
1 | AlignerDTWfromDist | <class 'sktime.alignment.dtw_python.AlignerDTW... |
2 | AlignerDtwNumba | <class 'sktime.alignment.dtw_numba.AlignerDtwN... |
6.7 Outlook, roadmap - panel tasks#
implementing estimators - distances, classifiers, etc
backend optimizations - numba, distributed/parallel
sequence-to-sequence regression, classification
further maturing the time series alignment module
join and contribute!
6.8 Summary#
sktime - modular framework for learning with time series
panel data = collections of time series - tasks classification, regression, clustering
build flexible pipelines with transformers, tune via grid search etc
panel estimators typically rely on time series distances, kernels, aligners
TS distances, kernels, aligners can also be constructed in modular, flexible way
all objects above are first-class citizens with sklearn-like interface!
Credits: notebook 6 - time series distances, kernels, alignment#
notebook creation: fkiraly