RandIndex

RandIndex(use_loc=True)

Segmentation Rand Index metric.

The Segmentation Rand Index (SRI) for comparing two segmentations is the integral (continuous) version of the clustering Rand Index (RI): counts over pairs of elements are replaced by an integral over pairs of time points. It measures the similarity between two segmentations of the same time series, where each segmentation is a sequence of non-overlapping segments.

It takes values between 0 and 1, where 1 indicates identical segmentations.

Qualitatively, the SRI is obtained from the clustering Rand Index (RI) by:

  • Interpreting segments as clusters along an index or time axis.

  • Counting pairwise “agreements” (same vs different) along that axis.

A mathematical definition follows.

For clarity, we define the clustering Rand Index (RI) first:

The Rand Index (RI) between two clusterings of n elements is defined as:

\[\mathrm{RI} = \frac{a + b}{a + b + c + d},\]

where:

  • a = #pairs in the same cluster in both clusterings

  • b = #pairs in different clusters in both clusterings

  • c = #pairs in the same cluster in the first clustering but in different clusters in the second

  • d = #pairs in different clusters in the first clustering but in the same cluster in the second
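The pair counts above can be sketched in a few lines of plain Python. This is only an illustration of the definition, not this library's implementation; the function name `rand_index` and the example label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_1, labels_2):
    """Clustering Rand Index of two labelings of the same n elements."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both clusterings must label the same elements")
    if len(labels_1) < 2:
        raise ValueError("at least two elements are needed to form a pair")

    a = b = c = d = 0
    # Visit every unordered pair of elements exactly once.
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]
        same_2 = labels_2[i] == labels_2[j]
        if same_1 and same_2:
            a += 1  # same cluster in both
        elif not same_1 and not same_2:
            b += 1  # different clusters in both
        elif same_1:
            c += 1  # same in the first, different in the second
        else:
            d += 1  # different in the first, same in the second
    return (a + b) / (a + b + c + d)


first = [0, 0, 0, 1, 1, 1]
second = [0, 0, 1, 1, 2, 2]
# Here a = 2, b = 8, c = 4, d = 1, so 10 of the 15 pairs agree.
print(rand_index(first, second))
```

Only the grouping matters, not the label values: relabeling the clusters of either input leaves the result unchanged.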

The segmentation Rand Index (SRI) is defined as follows:

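For a time series starting at time index \(S \in \mathbb{R}\) and ending at time index \(T \in \mathbb{R}\), let \(a: [S, T] \rightarrow \mathbb{N}\) be the first segmentation and \(b: [S, T] \rightarrow \mathbb{N}\) be the second segmentation, each mapping a time point to the label of the segment that contains it. Note that \(a\) and \(b\) here are functions and are unrelated to the pair counts a and b in the RI definition.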

Let \(\mathbb{I}(s_1, s_2, t_1, t_2)\) be the indicator function that is 1 iff at least one of the following holds, and 0 otherwise:

  • \(s_1 = s_2\) and \(t_1 = t_2\)

  • \(s_1 \neq s_2\) and \(t_1 \neq t_2\)

Then the SRI is defined as:

\[\mathrm{SRI} = \frac{1}{(T-S)^2} \int_S^T \int_S^T \mathbb{I}(a(s), a(t), b(s), b(t)) \, ds \, dt\]
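For piecewise-constant segmentations the double integral reduces to a finite sum, which the following sketch illustrates. It is not this library's implementation: the function names and the `(start, end, label)` tuple representation are assumptions made for the example, and the result is normalized by \((T-S)^2\), the area of the integration domain, so that it lies between 0 and 1.

```python
def label_at(segmentation, t):
    """Return the label of the segment [start, end) that contains t."""
    for start, end, label in segmentation:
        if start <= t < end:
            return label
    raise ValueError(f"time {t} is not covered by the segmentation")


def segmentation_rand_index(seg_a, seg_b):
    """SRI of two segmentations given as lists of (start, end, label)."""
    # Merge the breakpoints of both segmentations; between two consecutive
    # breakpoints both labelings are constant.
    cuts = sorted({p for seg in (seg_a, seg_b) for s, e, _ in seg for p in (s, e)})
    cells = []
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2
        cells.append((right - left, label_at(seg_a, mid), label_at(seg_b, mid)))

    total = cuts[-1] - cuts[0]  # this is T - S
    agree = 0.0
    # The double integral becomes a sum over ordered pairs of cells,
    # each weighted by the product of the two cell lengths.
    for w_s, a_s, b_s in cells:
        for w_t, a_t, b_t in cells:
            # Indicator: both segmentations say "same", or both say "different".
            if (a_s == a_t) == (b_s == b_t):
                agree += w_s * w_t
    return agree / total**2


seg_a = [(0, 5, 0), (5, 10, 1)]  # change point at 5
seg_b = [(0, 4, 0), (4, 10, 1)]  # change point at 4
print(segmentation_rand_index(seg_a, seg_b))  # 0.82
print(segmentation_rand_index(seg_a, seg_a))  # 1.0
```

In the first call the only disagreements come from pairing the short interval [4, 5) with either [0, 4) or [5, 10), which accounts for 18 of the 100 units of area.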

By default (use_loc=True), if X is provided, this metric computes distances in loc-based units by looking up the corresponding labels in X.index. If X is not provided, or if use_loc=False, it uses iloc-based (positional) units instead.

If a “label” column is present in y_true/y_pred, its labels are used directly for matching; otherwise, segments are assumed to be numbered consecutively, starting from 0.