edr_distance

edr_distance(x: ndarray, y: ndarray, window: float | None = None, itakura_max_slope: float | None = None, bounding_matrix: ndarray | None = None, epsilon: float | None = None, **kwargs: Any) -> float

Compute the Edit distance for real sequences (EDR) between two series.

EDR computes the minimum number of elements, expressed as a fraction, that must be removed from x and y so that the sum of the distances between the remaining elements lies within the tolerance (epsilon). EDR was originally proposed in [1].

The value returned for a pair of time series lies between 0 and 1. It represents the fraction of elements that must be removed for the time series to be an exact match.
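The description above can be illustrated with a minimal sketch of the textbook EDR recurrence from [1] for 1d series. The function name `edr_sketch`, the use of an absolute difference with `<=` as the match test, and the normalisation by the longer series length are assumptions made for illustration; the library's own implementation may differ in boundary handling, in the point-wise distance, and in how the window parameters are applied.

```python
import numpy as np


def edr_sketch(x, y, epsilon):
    """Illustrative EDR for two non-empty 1d series (not the library code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # cost[i, j] = edit cost between the first i points of x and first j of y.
    cost = np.zeros((n + 1, m + 1))
    cost[:, 0] = np.arange(n + 1)  # delete all i points of x
    cost[0, :] = np.arange(m + 1)  # delete all j points of y

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Two points "match" (cost 0) when they lie within epsilon.
            subcost = 0.0 if abs(x[i - 1] - y[j - 1]) <= epsilon else 1.0
            cost[i, j] = min(
                cost[i - 1, j - 1] + subcost,  # match or substitute
                cost[i - 1, j] + 1.0,          # remove a point from x
                cost[i, j - 1] + 1.0,          # remove a point from y
            )

    # Assumed normalisation so that the result lies in [0, 1].
    return cost[n, m] / max(n, m)


identical = edr_sketch([1, 2, 3, 4], [1, 2, 3, 4], epsilon=0.5)
one_off = edr_sketch([1, 2, 3, 4], [1, 2, 9, 4], epsilon=0.5)
disjoint = edr_sketch([1, 2, 3, 4], [5, 6, 7, 8], epsilon=0.5)
```

In this sketch, identical series give 0.0, series with no points within epsilon of each other give 1.0, and a single mismatched point out of four gives 0.25.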

Parameters:
x: np.ndarray (1d or 2d array)

First time series.

y: np.ndarray (1d or 2d array)

Second time series.

window: float, defaults = None

Radius of the Sakoe-Chiba window (if using Sakoe-Chiba lower bounding). Value must be between 0.0 and 1.0.

itakura_max_slope: float, defaults = None

Gradient of the slope for the Itakura parallelogram (if using Itakura parallelogram lower bounding). Value must be between 0.0 and 1.0.

bounding_matrix: np.ndarray (2d array), defaults = None

Custom bounding matrix to use. If defined, the other lower bounding parameters are ignored. The matrix should be structured so that indices inside the bound have the value 0.0 and indices outside the bound have the value infinity.

epsilon: float, defaults = None

Matching threshold that determines whether two subsequences are close enough to be considered ‘common’. If not specified, epsilon is set to a quarter of the maximum standard deviation, as in the original paper.

kwargs: Any

Extra kwargs.
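To make the bounding_matrix layout concrete, the following is a minimal sketch that builds a matrix with 0.0 inside a diagonal band and infinity outside it. The helper name `band_bounding_matrix_sketch`, the (len(x), len(y)) shape, and the rule that turns a fractional window into a band radius are all assumptions for illustration; they are not taken from the library, whose own Sakoe-Chiba construction may differ.

```python
import numpy as np


def band_bounding_matrix_sketch(x_size, y_size, window):
    """Hypothetical helper: 0.0 inside a diagonal band, inf outside."""
    # Assumed rule: the band radius is a fraction of the longer series.
    radius = int(np.floor(window * max(x_size, y_size)))

    # Start with every cell out of bounds, then open up the band row by row.
    matrix = np.full((x_size, y_size), np.inf)
    for i in range(x_size):
        lo = max(0, i - radius)
        hi = min(y_size, i + radius + 1)
        matrix[i, lo:hi] = 0.0
    return matrix


bounding = band_bounding_matrix_sketch(4, 4, window=0.25)
```

With these inputs the band radius is 1, so each row allows the diagonal cell and its immediate neighbours. A matrix of this form is the kind of value the bounding_matrix parameter describes.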

Returns:
float

EDR distance between x and y. The value will be between 0.0 and 1.0, where 0.0 indicates an exact match between the time series (i.e. they are the same) and 1.0 indicates that there are no matching subsequences.

Raises:
ValueError

If the sakoe_chiba_window_radius is not a float.
If the itakura_max_slope is not a float.
If the value of x or y provided is not a numpy array.
If the value of x or y has more than 3 dimensions.
If a metric string is provided and is not a defined valid string.
If a metric object (instance of class) is provided and doesn’t inherit from NumbaDistance.
If the metric type cannot be determined.
If both window and itakura_max_slope are set.

References

[1]

Lei Chen, M. Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data (SIGMOD ‘05). Association for Computing Machinery, New York, NY, USA, 491–502. DOI: https://doi.org/10.1145/1066157.1066213

Examples

>>> import numpy as np
>>> from sktime.distances import edr_distance
>>> x_1d = np.array([1, 2, 3, 4])  # 1d array
>>> y_1d = np.array([5, 6, 7, 8])  # 1d array
>>> edr_distance(x_1d, y_1d)
1.0
>>> x_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # 2d array
>>> y_2d = np.array([[9, 10, 11, 12], [13, 14, 15, 16]])  # 2d array
>>> edr_distance(x_2d, y_2d)
1.0
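The default epsilon described under Parameters can be sketched as follows. This assumes that "maximum standard deviation" means the larger of the standard deviations of x and y; the helper name `default_epsilon_sketch` is hypothetical and not part of the library.

```python
import numpy as np


def default_epsilon_sketch(x, y):
    """Hypothetical helper: a quarter of the larger standard deviation."""
    # Assumption: the standard deviation is taken over all values of each series.
    return max(np.std(x), np.std(y)) / 4.0


x_1d = np.array([1, 2, 3, 4])
y_1d = np.array([5, 6, 7, 8])
eps = default_epsilon_sketch(x_1d, y_1d)
```

A value computed this way could be passed explicitly through the epsilon parameter when a fixed, reproducible threshold is preferred over the default.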