dtw_distance(x: ndarray, y: ndarray, window: float | None = None, itakura_max_slope: float | None = None, bounding_matrix: ndarray = None, **kwargs: Any) float[source]#

Compute the dynamic time warping (DTW) distance between two time series.

Originally proposed in [1], DTW is an elastic distance measure, i.e., it is a distance computed after realigning (warping) two time series to best match each other via time axis distortions [2].

This function computes time warping distances only for: * sequences, time index is ignored * two time series of equal length * the Euclidean pairwise distance

For unequal length time series, use sktime.dists_kernels.DistFromAligner with a time warping aligner such as sktime.aligners.AlignerDTW. To use arbitrary pairwise distances, use sktime.aligners.AlignerDTWfromDist.

Mathematically, for two sequences :math:’mathbf{a}={a_1,a_2,ldots,a_m}’ and :math:’mathbf{b}={b_1,b_2,ldots, b_n}’, (assumed equal length for simplicity), DTW first calculates the pairwise distance matrix :math:’M( mathbf{a},mathbf{b})’, the :math:’m times n’, between series :math:’mathbf{a}’ and :math:’mathbf{b}’, where :math:’M_{i,j} = d(a_i, b_j)’, for a chosen distance measure \(d: \mathbb{R}^h \times \mathbb{R}^h \rightarrow \mathbb{R}\). In this estimator, the squared Euclidean distance is used, i.e., \(d(x, y):= (x-y)^2\). A warping path .. math:: P=((i_1, j_1), (i_2, j_2), ldots, (i_s, j_s)) is an ordered tuple of indices \(i_k \in \{1, \dots, m\}, j_k \in \{1, \dots, n\}\) which define a traversal path of matrix :math:’M’. This implementation assumes for warping paths that: * closed paths: \(i_1 = j_1 = 1\); \(i_s = m, j_s = n\) * monotonous paths: \(i_k \le i_{k+1}, j_k \le j_{k+1}\) for all \(k\) * strictly monotonous paths: \((i_k, j_k) \neq (i_{k+1}, j_{k+1})\) for all \(k\) The DTW path between sequences is the path through :math:’M’ that minimizes the total distance, over all valid paths (satisfying the above assumptions), given the sequences. Formally: The distance for a warping path :math:’P’ of length :math:’s’ is .. math:: D_P(mathbf{a},mathbf{b}) = sum_{k=1}^s M_{i_k,j_k}. If :math:’mathcal{P}’ is the set of all possible paths, the DTW path :math:’P^*’ is the path that has the minimum distance amongst those: .. math:: P^* = argmin_{Pin mathcal{P}} D_{P}(mathbf{a},mathbf{b}). The DTW distance between the two sequences :math:’mathbf{a},mathbf{b}’ is the minimum warping path distance: .. math:: d_{dtw}(mathbf{a}, mathbf{b}) = min_{Pin mathcal{P}} D_{P}(mathbf{a},mathbf{b}) = D_{P^*}(mathbf{a},mathbf{b}). The optimal warping path $P^*$ can be found exactly through dynamic programming. This can be a time consuming operation, and it is common to put a restriction on the amount of warping allowed. This is implemented through the bounding_matrix structure, that restricts allowable warpings by a mask. Common bounding strategies include the Sakoe-Chiba band [R93974dace1e5-3] and the Itakura parallelogram [4_]. The Sakoe-Chiba band creates a warping path window that has the same width along the diagonal of :math:’M’. The Itakura paralleogram allows for less warping at the start or end of the sequence than in the middle.

If the function is called with multivariate time series, note that the matrix :math:’M’ is computed with the multivariate squared Euclidean distance, \(d(x, y):= (x-y)^2\) = sum_{i=1}^h (x_i - y_i)^2` This is sometimes called the “dependent” version of DTW, DTW_D, see [R93974dace1e5-5].

x: np.ndarray (1d or 2d array)

First time series.

y: np.ndarray (1d or 2d array)

Second time series.

window: float, defaults = None

Float that is the radius of the sakoe chiba window (if using Sakoe-Chiba lower bounding). Value must be between 0. and 1.

itakura_max_slope: float, defaults = None

Gradient of the slope for itakura parallelogram (if using Itakura Parallelogram lower bounding). Value must be between 0. and 1.

bounding_matrix: np.ndarray (2d of size mxn where m is len(x) and n is len(y)),

defaults = None

Custom bounding matrix to use. If defined then other lower_bounding params are ignored. The matrix should be structure so that indexes considered in bound should be the value 0. and indexes outside the bounding matrix should be infinity.

kwargs: Any

Extra kwargs.


Dtw distance between x and y.


If the sakoe_chiba_window_radius is not a float. If the itakura_max_slope is not a float. If the value of x or y provided is not a numpy array. If the value of x or y has more than 2 dimensions. If a metric string provided, and is not a defined valid string. If a metric object (instance of class) is provided and doesn’t inherit from NumbaDistance. If a resolved metric is not no_python compiled. If the metric type cannot be determined If both window and itakura_max_slope are set



H. Sakoe, S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26(1), pp. 43–49, 1978.


Ratanamahatana C and Keogh E.: Three myths about dynamic time warping data

mining Proceedings of 5th SIAM International Conference on Data Mining, 2005 .. [R93974dace1e5-3] Sakoe H. and Chiba S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26(1):43–49, 1978. .. [R93974dace1e5-4] Itakura F: Minimum prediction residual principle applied to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 23( 1):67–72, 1975. .. [R93974dace1e5-5] Shokoohi-Yekta M et al.: Generalizing DTW to the multi-dimensional case requires an adaptive approach. Data Mining and Knowledge Discovery, 31, 1–31 (2017).


>>> import numpy as np
>>> from sktime.distances import dtw_distance
>>> x_1d = np.array([1, 2, 3, 4])  # 1d array
>>> y_1d = np.array([5, 6, 7, 8])  # 1d array
>>> dtw_distance(x_1d, y_1d)
>>> x_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # 2d array
>>> y_2d = np.array([[9, 10, 11, 12], [13, 14, 15, 16]])  # 2d array
>>> dtw_distance(x_2d, y_2d)