GreedyGaussianSegmentation#

class GreedyGaussianSegmentation(k_max: int = 10, lamb: float = 1.0, max_shuffles: int = 250, verbose: bool = False, random_state: Optional[int] = None)[source]#

Greedy Gaussian Segmentation Estimator.

The method approximates solutions to the problem of breaking a multivariate time series into segments, where the data in each segment can be modeled as independent samples from a multivariate Gaussian distribution. It uses a dynamic programming search algorithm with a heuristic that finds an approximate solution in linear time with respect to the data length and always makes a locally optimal choice.

Greedy Gaussian Segmentation (GGS) fits a segmented Gaussian model (SGM) to the data by computing an approximate solution to the combinatorial problem of maximizing the covariance-regularized log-likelihood for a fixed number of change points and a given regularization strength. It follows an iterative procedure in which a new breakpoint is added and then all breakpoints are adjusted to (approximately) maximize the objective. It is similar to the top-down search used in other change point detection problems.
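The add-then-adjust procedure described above can be sketched on a univariate series. This is a simplified illustration, not the estimator's implementation: the score `-0.5 * n * log(var + lam / n)` is an assumed univariate stand-in for the covariance-regularized log-likelihood of [1], the helper names (`segment_score`, `best_split`, `greedy_segment`, `max_passes`) are hypothetical, and the sketch recomputes statistics from scratch, so it does not run in linear time.

```python
import math


def segment_score(x, lam):
    """Assumed univariate stand-in for a regularized Gaussian log-likelihood."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    # lam / n keeps the argument of log positive when the variance is zero
    return -0.5 * n * math.log(var + lam / n)


def best_split(x, start, stop, lam):
    """Return (gain, index) of the best single breakpoint inside x[start:stop]."""
    base = segment_score(x[start:stop], lam)
    best_gain, best_idx = 0.0, None
    for t in range(start + 1, stop):
        split = segment_score(x[start:t], lam) + segment_score(x[t:stop], lam)
        if split - base > best_gain:
            best_gain, best_idx = split - base, t
    return best_gain, best_idx


def greedy_segment(x, k_max, lam=1.0, max_passes=10):
    """Greedily add up to k_max breakpoints, adjusting all of them after each add."""
    bps = [0, len(x)]  # identity segmentation: first index and last index + 1
    for _ in range(k_max):
        # Add step: take the split with the largest gain over all current segments.
        candidates = [best_split(x, a, b, lam) for a, b in zip(bps, bps[1:])]
        gain, idx = max(candidates, key=lambda c: c[0])
        if idx is None:  # no split improves the score
            break
        bps = sorted(bps + [idx])
        # Adjust step: move each interior breakpoint to its best position
        # between its two neighbours until nothing changes.
        for _ in range(max_passes):
            changed = False
            for i in range(1, len(bps) - 1):
                _, moved = best_split(x, bps[i - 1], bps[i + 1], lam)
                if moved is not None and moved != bps[i]:
                    bps[i] = moved
                    changed = True
            if not changed:
                break
    return bps


signal = [0.0] * 20 + [10.0] * 20
breakpoints = greedy_segment(signal, k_max=1)  # [0, 20, 40]
```

With `lam > 0` the score stays finite even for constant segments, which mirrors the role the regularization parameter plays in the description above.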

Parameters

k_max: int, default=10: Maximum number of change points to find. With k change points, the number of segments is k+1.
lamb: float, default=1.0: Regularization parameter lambda (>= 0), which controls the amount of (inverse) covariance regularization, see Eq (1) in [1]. Regularization is introduced to reduce issues for high-dimensional problems. Setting lamb to zero disables regularization, whereas large values of lambda favour simpler models.
max_shuffles: int, default=250: Maximum number of shuffles, i.e. passes over the breakpoints during the adjustment step.
verbose: bool, default=False: If True verbose output is enabled.
random_state: int or np.random.RandomState, default=None: Either random seed or an instance of np.random.RandomState

Attributes

change_points_: array_like, default=[]: Locations of change points as integer indexes. By convention, the change points include the identity segmentation, i.e. the first index and the last index + 1.
_intermediate_change_points: List[List[int]], default=[]: Intermediate values of change points for each value of k = 1…k_max
_intermediate_ll: List[float], default=[]: Intermediate values for log-likelihood for each value of k = 1…k_max
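To make the change point convention concrete, the sketch below expands a list of change points that includes the identity segmentation into one label per data point. The helper name `labels_from_change_points` is hypothetical, and numbering the segments consecutively from 0 is an assumption made for illustration, not a documented guarantee of the estimator.

```python
def labels_from_change_points(change_points):
    """Expand change points [b0, b1, ..., bK+1] into one segment label per sample."""
    labels = []
    # Each consecutive pair (start, stop) delimits one half-open segment.
    for seg_id, (start, stop) in enumerate(zip(change_points, change_points[1:])):
        labels.extend([seg_id] * (stop - start))
    return labels


# Two interior change points (3 and 7) in a series of length 10 give 3 segments.
labels = labels_from_change_points([0, 3, 7, 10])
# labels == [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
```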

Notes

Based on the work from [1].

source code adapted based on: https://github.com/cvxgrp/GGS
paper available at: https://stanford.edu/~boyd/papers/pdf/ggs.pdf

References

[1] Hallac, D., Nystrup, P. & Boyd, S., "Greedy Gaussian segmentation of multivariate time series", Adv Data Anal Classif 13, 727–751 (2019). https://doi.org/10.1007/s11634-018-0335-0

Methods

`check_is_fitted`()	Check if the estimator has been fitted.
`clone`()	Obtain a clone of the object with same hyper-parameters.
`clone_tags`(estimator[, tag_names])	Clone/mirror tags from another estimator as dynamic override.
`create_test_instance`([parameter_set])	Construct Estimator instance if possible.
`create_test_instances_and_names`([parameter_set])	Create list of all test instances and a list of names for them.
`fit`(X[, y])	Fit method for compatibility with sklearn-type estimator interface.
`fit_predict`(X[, y])	Perform segmentation.
`get_class_tag`(tag_name[, tag_value_default])	Get tag value from estimator class (only class tags).
`get_class_tags`()	Get class tags from estimator class and all its parent classes.
`get_fitted_params`()	Get fitted parameters.
`get_param_defaults`()	Get parameter defaults for the object.
`get_param_names`()	Get parameter names for the object.
`get_params`([deep])	Return initialization parameters.
`get_tag`(tag_name[, tag_value_default, …])	Get tag value from estimator class and dynamic tag overrides.
`get_tags`()	Get tags from estimator class and dynamic tag overrides.
`get_test_params`([parameter_set])	Return testing parameter settings for the estimator.
`is_composite`()	Check if the object is composite.
`load_from_path`(serial)	Load object from file location.
`load_from_serial`(serial)	Load object from serialized memory container.
`predict`(X[, y])	Perform segmentation.
`reset`()	Reset the object to a clean post-init state.
`save`([path])	Save serialized self to bytes-like object or to (.zip) file.
`set_params`(**parameters)	Set the parameters of this object.
`set_tags`(**tag_dict)	Set dynamic tags to given values.

fit(X: numpy.typing.ArrayLike, y: Optional[numpy.typing.ArrayLike] = None)[source]#

Fit method for compatibility with sklearn-type estimator interface.

It sets the internal state of the estimator and returns the initialized instance.

Parameters

X: array_like: 2D array_like representing time series with sequence index along the first dimension and value series as columns.
y: array_like: Placeholder for compatibility with sklearn-api, not used, default=None.

predict(X: numpy.typing.ArrayLike, y: Optional[numpy.typing.ArrayLike] = None) → numpy.typing.ArrayLike[source]#

Perform segmentation.

Parameters

X: array_like: 2D array_like representing time series with sequence index along the first dimension and value series as columns.
y: array_like: Placeholder for compatibility with sklearn-api, not used, default=None.

Returns

y_pred: array_like: 1D array with the predicted segmentation, of the same size as the first dimension of X. The numerical values represent distinct segment labels for each of the data points.

fit_predict(X: numpy.typing.ArrayLike, y: Optional[numpy.typing.ArrayLike] = None) → numpy.typing.ArrayLike[source]#

Perform segmentation.

Parameters

X: array_like: 2D array_like representing time series with sequence index along the first dimension and value series as columns.
y: array_like: Placeholder for compatibility with sklearn-api, not used, default=None.

Returns

y_pred: array_like: 1D array with the predicted segmentation, of the same size as the first dimension of X. The numerical values represent distinct segment labels for each of the data points.

get_params(deep: bool = True) → Dict[source]#

Return initialization parameters.

Parameters

deep: bool: Dummy argument for compatibility with sklearn-api, not used.

Returns

params: dict: Dictionary with the estimator’s initialization parameters, with keys being argument names and values being argument values.

set_params(**parameters)[source]#

Set the parameters of this object.

Parameters

parameters: dict: Initialization parameters for the estimator.

Returns

self: reference to self (after parameters have been set)

check_is_fitted()[source]#

Check if the estimator has been fitted.

Raises

NotFittedError: If the estimator has not been fitted yet.

clone()[source]#

Obtain a clone of the object with same hyper-parameters.

A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self. Equal in value to type(self)(**self.get_params(deep=False)).

Returns

instance of type(self), clone of self (see above)
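The clone contract stated above can be illustrated with a toy class. `ToyEstimator` is a hypothetical stand-in written for this sketch, not the library's base class; it only shows that a clone is rebuilt from the hyper-parameters and therefore carries no fitted state.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator with two hyper-parameters."""

    def __init__(self, k_max=10, lamb=1.0):
        self.k_max = k_max
        self.lamb = lamb

    def get_params(self, deep=True):
        # deep is accepted for API compatibility and not used here
        return {"k_max": self.k_max, "lamb": self.lamb}

    def clone(self):
        # A new object in post-init state with the same hyper-parameters
        return type(self)(**self.get_params(deep=False))


original = ToyEstimator(k_max=3, lamb=0.5)
original.change_points_ = [0, 5, 10]  # pretend the estimator was fitted
copy = original.clone()
```

Here `copy` has the same hyper-parameters as `original` but no `change_points_` attribute.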

clone_tags(estimator, tag_names=None)[source]#

Clone/mirror tags from another estimator as dynamic override.

Parameters

estimator: estimator inheriting from BaseEstimator
tag_names: str or list of str, default=None: Names of tags to clone. If None then all tags in estimator are used as tag_names.

Returns

Self: Reference to self.

Notes

Changes object state by setting the tag values listed in tag_names from estimator as dynamic tags in self.

classmethod create_test_instance(parameter_set='default')[source]#

Construct Estimator instance if possible.

Parameters

parameter_set: str, default="default": Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.

Returns

instance: instance of the class with default parameters

Notes

get_test_params can return dict or list of dict. This function takes first or single dict that get_test_params returns, and constructs the object with that.

classmethod create_test_instances_and_names(parameter_set='default')[source]#

Create list of all test instances and a list of names for them.

Parameters

parameter_set: str, default="default": Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.

Returns

objs: list of instances of cls: i-th instance is cls(**cls.get_test_params()[i])
names: list of str, same length as objs: i-th element is the name of the i-th instance of obj in tests. The convention is {cls.__name__}-{i} if there is more than one instance, otherwise {cls.__name__}.

classmethod get_class_tag(tag_name, tag_value_default=None)[source]#

Get tag value from estimator class (only class tags).

Parameters

tag_name: str: Name of tag value.
tag_value_default: any type: Default/fallback value if tag is not found.

Returns

tag_value: Value of the tag_name tag in self. If not found, returns tag_value_default.

classmethod get_class_tags()[source]#

Get class tags from estimator class and all its parent classes.

Returns

collected_tags: dict: Dictionary of tag name : tag value pairs. Collected from the _tags class attribute via nested inheritance. NOT overridden by dynamic tags set by set_tags or clone_tags.

get_fitted_params()[source]#

Get fitted parameters.

State required: Requires state to be "fitted".

Returns

fitted_params: dict of fitted parameters, keys are str names of parameters: parameters of components are indexed as [componentname]__[paramname]

classmethod get_param_defaults()[source]#

Get parameter defaults for the object.

Returns

default_dict: dict with str keys: keys are all parameters of cls that have a default defined in __init__; values are the defaults, as defined in __init__.

classmethod get_param_names()[source]#

Get parameter names for the object.

Returns

param_names: list of str, alphabetically sorted list of parameter names of cls

get_tag(tag_name, tag_value_default=None, raise_error=True)[source]#

Get tag value from estimator class and dynamic tag overrides.

Parameters

tag_name: str: Name of tag to be retrieved
tag_value_default: any type, optional; default=None: Default/fallback value if tag is not found
raise_error: bool: whether a ValueError is raised when the tag is not found

Returns

tag_value: Value of the tag_name tag in self. If not found, raises an error if raise_error is True, otherwise returns tag_value_default.

Raises

ValueError: If raise_error is True and tag_name is not in self.get_tags().keys().

get_tags()[source]#

Get tags from estimator class and dynamic tag overrides.

Returns

collected_tags: dict: Dictionary of tag name : tag value pairs. Collected from the _tags class attribute via nested inheritance and then any overrides and new tags from the _tags_dynamic object attribute.
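The lookup order described for get_class_tags, get_tags and get_tag can be sketched with plain classes. This is an illustration under assumptions, not the library's code: the classes `Base` and `Child`, the free functions, and the tag names prefixed with `toy:` are all hypothetical; only the attribute names `_tags` and `_tags_dynamic` come from the text above.

```python
class Base:
    _tags = {"toy:needs_fit": True, "toy:multivariate": False}


class Child(Base):
    _tags = {"toy:multivariate": True}  # overrides the parent's value


def get_class_tags(cls):
    """Collect _tags along the inheritance chain; subclasses override parents."""
    collected = {}
    for klass in reversed(cls.__mro__):
        collected.update(vars(klass).get("_tags", {}))
    return collected


def get_tags(obj):
    """Class tags first, then per-instance dynamic overrides."""
    tags = get_class_tags(type(obj))
    tags.update(getattr(obj, "_tags_dynamic", {}))
    return tags


def get_tag(obj, tag_name, tag_value_default=None, raise_error=True):
    """Return one tag value, or fall back / raise if it is missing."""
    tags = get_tags(obj)
    if tag_name in tags:
        return tags[tag_name]
    if raise_error:
        raise ValueError(f"Tag {tag_name!r} not found")
    return tag_value_default


est = Child()
est._tags_dynamic = {"toy:multivariate": False}  # what set_tags would record
class_value = get_class_tags(Child)["toy:multivariate"]   # True
instance_value = get_tag(est, "toy:multivariate")         # False
```

The dynamic override changes what the instance reports while the class-level value stays untouched.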

classmethod get_test_params(parameter_set='default')[source]#

Return testing parameter settings for the estimator.

Parameters

parameter_set: str, default="default": Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.

Returns

params: dict or list of dict, default={}: Parameters to create testing instances of the class. Each dict contains parameters to construct an "interesting" test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.

is_composite()[source]#

Check if the object is composite.

A composite object is an object which contains objects, as parameters. Called on an instance, since this may differ by instance.

Returns

composite: bool, whether self contains a parameter which is BaseObject

property is_fitted[source]#: Whether fit has been called.

classmethod load_from_path(serial)[source]#

Load object from file location.

Parameters

serial: result of ZipFile(path).open("object")

Returns

deserialized self resulting in output at path, of cls.save(path)

classmethod load_from_serial(serial)[source]#

Load object from serialized memory container.

Parameters

serial: 1st element of output of cls.save(None)

Returns

deserialized self resulting in output serial, of cls.save(None)

reset()[source]#

Reset the object to a clean post-init state.

Equivalent to sklearn.clone but overwrites self. After self.reset() call, self is equal in value to type(self)(**self.get_params(deep=False))

Detailed behaviour: removes any object attributes, except:

hyper-parameters = arguments of __init__
object attributes containing double-underscores, i.e., the string "__"

Runs __init__ with current values of hyper-parameters (result of get_params).

Not affected by the reset are:

object attributes containing double-underscores
class and object methods, class attributes
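The reset rules listed above can be sketched with a toy class. `ToyEstimator` and the attribute `keep__me` are hypothetical names chosen for this illustration; the sketch follows the documented rules and is not the library's implementation.

```python
class ToyEstimator:
    """Hypothetical stand-in with a single hyper-parameter."""

    def __init__(self, k_max=10):
        self.k_max = k_max

    def get_params(self, deep=True):
        return {"k_max": self.k_max}

    def reset(self):
        params = self.get_params(deep=False)
        for name in list(vars(self)):
            # Keep hyper-parameters and attributes containing "__"
            if name in params or "__" in name:
                continue
            delattr(self, name)
        # Re-run __init__ with the current hyper-parameter values
        self.__init__(**params)
        return self


est = ToyEstimator(k_max=3)
est.change_points_ = [0, 4, 9]   # fitted state, removed by reset
est.keep__me = "survives"        # contains "__", kept by reset
est.reset()
```

After the call, `est.k_max` is still 3, `est.keep__me` is unchanged, and `est.change_points_` no longer exists.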

save(path=None)[source]#

Save serialized self to bytes-like object or to (.zip) file.

Behaviour:

if path is None, returns an in-memory serialized self
if path is a file location, stores self at that location as a zip file

Saved files are zip files with the following contents:

_metadata - contains class of self, i.e., type(self)
_obj - serialized self. This class uses the default serialization (pickle).

Parameters

path: None or file location (str or Path): if None, self is saved to an in-memory object; if file location, self is saved to that file location. If:

path="estimator" then a zip file estimator.zip will be made at cwd.
path="/home/stored/estimator" then a zip file estimator.zip will be stored in /home/stored/.

Returns

if path is None - in-memory serialized self
if path is file location - ZipFile with reference to the file
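The zip layout described above (a `_metadata` entry holding the class and an `_obj` entry holding the pickled object) can be sketched with the standard library. This is an assumed illustration of the layout only: the helper names `save_to_zip_bytes` and `load_from_zip_bytes` are hypothetical, a plain dict stands in for the estimator, and an in-memory buffer is used purely to avoid writing to disk.

```python
import io
import pickle
import zipfile


def save_to_zip_bytes(obj):
    """Write a zip archive with _metadata (the class) and _obj (the pickled object)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("_metadata", pickle.dumps(type(obj)))
        archive.writestr("_obj", pickle.dumps(obj))
    return buffer.getvalue()


def load_from_zip_bytes(data):
    """Read the archive back and check the object against the stored class."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        cls = pickle.loads(archive.read("_metadata"))
        obj = pickle.loads(archive.read("_obj"))
    if not isinstance(obj, cls):
        raise TypeError("stored object does not match stored class")
    return obj


# A dict stands in for a fitted estimator in this sketch.
state = {"k_max": 3, "change_points_": [0, 20, 40]}
restored = load_from_zip_bytes(save_to_zip_bytes(state))
```

Because the payload is a pickle, archives of this kind should only be loaded from trusted sources.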

set_tags(**tag_dict)[source]#

Set dynamic tags to given values.

Parameters

tag_dict: dict: Dictionary of tag name : tag value pairs.

Returns

Self: Reference to self.

Notes

Changes object state by setting tag values in tag_dict as dynamic tags in self.