ScipyDist
- class ScipyDist(metric='euclidean', p=2, colalign='intersect', var_weights=None, metric_kwargs=None)
Interface to scipy distances.
- computes pairwise distances using scipy.spatial.distance.cdist
- includes Euclidean distance and p-norm (Minkowski) distance
note: weighted distances are not supported
- Parameters:
- metric: string or function, as in cdist; default = "euclidean"
- if string, one of: "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "jensenshannon", "kulsinski" (scipy < 1.11) or "kulczynski1" (scipy 1.11 and later), "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"
- if function, it should have signature 1D np.array x 1D np.array -> float
- p: if metric = "minkowski", the p in the p-norm; otherwise ignored; default = 2
- colalign: string, one of "intersect" (default), "force-align", "none"
Controls column alignment if X, X2 passed in fit are pd.DataFrame; in that case, columns of X and X2 are aligned via column names.
- if "intersect", distance is computed on columns occurring in both X and X2, other columns are discarded; column ordering in X2 is copied from X
- if "force-align", raises an error if the sets of columns in X and X2 differ; column ordering in X2 is copied from X
- if "none", X and X2 are passed through unmodified (no columns are aligned). Note: this can pair up "non-matching" columns, i.e., columns with different names at the same position.
- var_weights: 1D np.array of float or None, default=None
Weight/scaling vector applied to the variables in X/X2 before they are passed to cdist: the i-th column of X/X2 is multiplied by var_weights[i]. If None, equivalent to an all-ones vector.
- metric_kwargs: dict, optional, default=None
Any additional kwargs passed to the metric, i.e., to the function cdist. Common kwargs:
- w: array-like, same length as X.columns, weights for the metric
Refer to the scipy.spatial.distance.cdist documentation for other extra kwargs.
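To make the three colalign modes concrete, here is a minimal sketch that applies the documented rules to lists of column names only. The helper align_columns is hypothetical and not part of sktime; the real class works on pd.DataFrame inputs before calling cdist.

```python
def align_columns(cols_x, cols_x2, colalign="intersect"):
    """Return the column lists used for X and X2 under a colalign mode.

    Hypothetical helper illustrating the documented semantics only.
    """
    if colalign == "intersect":
        # keep only columns present in both inputs, ordered as in X
        in_x2 = set(cols_x2)
        shared = [c for c in cols_x if c in in_x2]
        return shared, shared
    if colalign == "force-align":
        # the column sets must match exactly; X2 is reordered like X
        if set(cols_x) != set(cols_x2):
            raise ValueError("columns of X and X2 differ")
        return list(cols_x), list(cols_x)
    if colalign == "none":
        # no alignment: columns are paired purely by position
        return list(cols_x), list(cols_x2)
    raise ValueError(f"unknown colalign: {colalign!r}")


x_cols = ["a", "b", "c"]
x2_cols = ["c", "a", "d"]
print(align_columns(x_cols, x2_cols, "intersect"))  # (['a', 'c'], ['a', 'c'])
print(align_columns(x_cols, x2_cols, "none"))       # (['a', 'b', 'c'], ['c', 'a', 'd'])
```

With colalign="none", column "a" of X ends up compared against column "c" of X2 in this example, which is the "non-matching" pairing the note above warns about.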
- Attributes:
- is_fitted: Whether fit has been called.
Methods
- __call__(X[, X2]): Compute distance/kernel matrix, call shorthand.
- check_is_fitted([method_name]): Check if the estimator has been fitted.
- clone(): Obtain a clone of the object with same hyper-parameters and config.
- clone_tags(estimator[, tag_names]): Clone tags from another object as dynamic override.
- create_test_instance([parameter_set]): Construct an instance of the class, using first test parameter set.
- create_test_instances_and_names([parameter_set]): Create list of all test instances and a list of names for them.
- fit([X, X2]): Fit method for interface compatibility (no logic inside).
- get_class_tag(tag_name[, tag_value_default]): Get class tag value from class, with tag level inheritance from parents.
- get_class_tags(): Get class tags from class, with tag level inheritance from parent classes.
- get_config(): Get config flags for self.
- get_fitted_params([deep]): Get fitted parameters.
- get_param_defaults(): Get object's parameter defaults.
- get_param_names([sort]): Get object's parameter names.
- get_params([deep]): Get a dict of parameter values for this object.
- get_tag(tag_name[, tag_value_default, ...]): Get tag value from instance, with tag level inheritance and overrides.
- get_tags(): Get tags from instance, with tag level inheritance and overrides.
- get_test_params([parameter_set]): Return testing parameter settings for the estimator.
- is_composite(): Check if the object is composed of other BaseObjects.
- load_from_path(serial): Load object from file location.
- load_from_serial(serial): Load object from serialized memory container.
- reset(): Reset the object to a clean post-init state.
- save([path, serialization_format]): Save serialized self to bytes-like object or to (.zip) file.
- set_config(**config_dict): Set config flags to given values.
- set_params(**params): Set the parameters of this object.
- set_random_state([random_state, deep, ...]): Set random_state pseudo-random seed parameters for self.
- set_tags(**tag_dict): Set instance level tag overrides to given values.
- transform(X[, X2]): Compute distance/kernel matrix.
- classmethod get_test_params(parameter_set='default')
Return testing parameter settings for the estimator.
- Parameters:
- parameter_set: str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set. There are currently no reserved values for distance/kernel transformers.
- Returns:
- params: dict or list of dict, default = {}
Parameters to create testing instances of the class. Each dict contains parameters to construct an "interesting" test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.
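The contract above can be illustrated with a hypothetical class MyDist (not sktime code): get_test_params returns a list of dicts, each of which must construct a valid instance, and create_test_instance uses the first one. The parameter values are illustrative assumptions, not ScipyDist's actual test settings.

```python
class MyDist:
    """Hypothetical stand-in used only to illustrate get_test_params."""

    def __init__(self, metric="euclidean", p=2):
        self.metric = metric
        self.p = p

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        # a list of dicts; each dict must construct a valid instance
        return [{}, {"metric": "minkowski", "p": 3}]

    @classmethod
    def create_test_instance(cls, parameter_set="default"):
        params = cls.get_test_params(parameter_set=parameter_set)
        # use the first dict if a list is returned, else the dict itself
        if isinstance(params, list):
            params = params[0]
        return cls(**params)


inst = MyDist.create_test_instance()
print(inst.metric, inst.p)  # euclidean 2
```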
- check_is_fitted(method_name=None)
Check if the estimator has been fitted.
Checks if the _is_fitted attribute is present and True. The is_fitted attribute should be set to True in calls to an object's fit method. If not, raises a NotFittedError.
- Parameters:
- method_name: str, optional
Name of the method that called this function. If provided, the error message will include this information.
- Raises:
- NotFittedError
If the estimator has not been fitted yet.
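A minimal sketch of the fitted-state bookkeeping described above, assuming a plain Python class: _is_fitted starts as False, fit sets it to True, the is_fitted property reads it, and check_is_fitted raises otherwise. NotFittedError here is a local stand-in, not an import from sktime or scikit-base.

```python
class NotFittedError(ValueError, AttributeError):
    """Local stand-in for the library's NotFittedError (hypothetical)."""


class MinimalEstimator:
    def __init__(self):
        self._is_fitted = False  # initialized to False at construction

    @property
    def is_fitted(self):
        return self._is_fitted

    def fit(self, X=None, X2=None):
        self._is_fitted = True  # fit sets the flag
        return self

    def check_is_fitted(self, method_name=None):
        if not getattr(self, "_is_fitted", False):
            where = f" before calling {method_name}" if method_name else ""
            raise NotFittedError(
                f"{type(self).__name__} has not been fitted yet{where}."
            )
```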
- clone()
Obtain a clone of the object with same hyper-parameters and config.
A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self.
Equivalent to constructing a new instance of type(self), with parameters of self, that is, type(self)(**self.get_params(deep=False)).
If configs were set on self, the clone will also have the same configs as the original, equivalent to calling cloned_self.set_config(**self.get_config()).
Also equivalent in value to a call of self.reset, with the exception that clone returns a new object, instead of mutating self like reset.
- Raises:
- RuntimeError if the clone is non-conforming, due to faulty __init__.
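The equivalence stated above, type(self)(**self.get_params(deep=False)) followed by copying configs, can be sketched on a hypothetical minimal class. This is an illustration of the described behaviour, not scikit-base's actual clone implementation, which also performs conformance checks.

```python
import copy


class MinimalObject:
    """Hypothetical object with two hyper-parameters and dynamic configs."""

    def __init__(self, metric="euclidean", p=2):
        self.metric = metric
        self.p = p
        self._config_dynamic = {}  # configs set via set_config

    def get_params(self, deep=False):
        return {"metric": self.metric, "p": self.p}

    def get_config(self):
        return dict(self._config_dynamic)

    def set_config(self, **config_dict):
        self._config_dynamic.update(copy.deepcopy(config_dict))
        return self

    def clone(self):
        # fresh instance from deep-copied hyper-parameters: no shared references
        params = copy.deepcopy(self.get_params(deep=False))
        cloned = type(self)(**params)
        # configs carry over to the clone
        cloned.set_config(**self.get_config())
        return cloned
```

Attributes set after construction (for example, fitted state) are not carried over, since the clone is built from hyper-parameters only.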
- clone_tags(estimator, tag_names=None)
Clone tags from another object as dynamic override.
Every scikit-base compatible object has a dictionary of tags. Tags may be used to store metadata about the object, or to control behaviour of the object.
Tags are key-value pairs specific to an instance self; they are static flags that are not changed after construction of the object.
clone_tags sets dynamic tag overrides from another object, estimator.
The clone_tags method should be called only in the __init__ method of an object, during construction, or directly after construction via __init__.
The dynamic tags are set to the values of the tags in estimator, with the names specified in tag_names.
The default of tag_names writes all tags from estimator to self.
Current tag values can be inspected by get_tags or get_tag.
- Parameters:
- estimator: an instance of BaseObject or derived class
- tag_names: str or list of str, default = None
Names of tags to clone. The default (None) clones all tags from estimator.
- Returns:
- self
Reference to self.
- classmethod create_test_instance(parameter_set='default')
Construct an instance of the class, using first test parameter set.
- Parameters:
- parameter_set: str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.
- Returns:
- instance: instance of the class with default parameters
- classmethod create_test_instances_and_names(parameter_set='default')
Create list of all test instances and a list of names for them.
- Parameters:
- parameter_set: str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return the "default" set.
- Returns:
- objs: list of instances of cls
i-th instance is cls(**cls.get_test_params()[i])
- names: list of str, same length as objs
i-th element is the name of the i-th instance of obj in tests. The naming convention is {cls.__name__}-{i} if there is more than one instance, otherwise {cls.__name__}.
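The naming convention above can be sketched with a hypothetical helper that takes the class and its list of test parameter dicts directly (the real classmethod obtains them from get_test_params).

```python
def make_test_instances_and_names(cls, param_list):
    """Hypothetical helper mirroring the documented naming convention."""
    objs = [cls(**params) for params in param_list]
    name = cls.__name__
    if len(objs) > 1:
        # "{cls.__name__}-{i}", with i the index into the parameter list
        names = [f"{name}-{i}" for i in range(len(objs))]
    else:
        names = [name]
    return objs, names
```

For example, with cls=dict and two parameter dicts, the names are "dict-0" and "dict-1".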
- classmethod get_class_tag(tag_name, tag_value_default=None)
Get class tag value from class, with tag level inheritance from parents.
Every scikit-base compatible object has a dictionary of tags. Tags may be used to store metadata about the object, or to control behaviour of the object.
Tags are key-value pairs specific to an instance self; they are static flags that are not changed after construction of the object.
The get_class_tag method is a class method, and retrieves the value of a tag taking into account only class-level tag values and overrides.
It returns the value of the tag with name tag_name from the object, taking into account tag overrides, in the following order of descending priority:
- Tags set in the _tags attribute of the class.
- Tags set in the _tags attribute of parent classes, in order of inheritance.
Does not take into account dynamic tag overrides on instances, set via set_tags or clone_tags.
To retrieve tag values with potential instance overrides, use the get_tag method instead.
- Parameters:
- tag_name: str
Name of tag value.
- tag_value_default: any type
Default/fallback value if tag is not found.
- Returns:
- tag_value
Value of the tag_name tag in self. If not found, returns tag_value_default.
- classmethod get_class_tags()
Get class tags from class, with tag level inheritance from parent classes.
Every scikit-base compatible object has a dictionary of tags. Tags may be used to store metadata about the object, or to control behaviour of the object.
Tags are key-value pairs specific to an instance self; they are static flags that are not changed after construction of the object.
The get_class_tags method is a class method, and retrieves the value of a tag taking into account only class-level tag values and overrides.
It returns a dictionary with keys being keys of any attribute of _tags set in the class or any of its parent classes.
Values are the corresponding tag values, with overrides in the following order of descending priority:
- Tags set in the _tags attribute of the class.
- Tags set in the _tags attribute of parent classes, in order of inheritance.
Instances can override these tags depending on hyper-parameters.
Does not take into account dynamic tag overrides on instances, set via set_tags or clone_tags. To retrieve tags including these instance overrides, use the get_tags method instead.
- Returns:
- collected_tags: dict
Dictionary of tag name : tag value pairs. Collected from the _tags class attribute via nested inheritance. NOT overridden by dynamic tags set by set_tags or clone_tags.
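The class-level resolution order described for get_class_tag and get_class_tags can be sketched by walking the method resolution order from the most distant ancestor down to the class itself, so closer classes override their parents. The classes and tag values below are illustrative assumptions, not sktime's actual tag sets.

```python
class BaseTagged:
    """Hypothetical base class holding class-level tags."""

    _tags = {"object_type": "object", "symmetric": False}

    @classmethod
    def get_class_tags(cls):
        collected = {}
        # walk from the most distant ancestor to cls, so that tags defined
        # closer to cls override tags defined further up the hierarchy
        for klass in reversed(cls.__mro__):
            collected.update(vars(klass).get("_tags", {}))
        return collected

    @classmethod
    def get_class_tag(cls, tag_name, tag_value_default=None):
        return cls.get_class_tags().get(tag_name, tag_value_default)


class DistLike(BaseTagged):
    _tags = {"object_type": "transformer-pairwise", "symmetric": True}
```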
- get_config()
Get config flags for self.
Configs are key-value pairs of self, typically used as transient flags for controlling behaviour.
get_config returns dynamic configs, which override the default configs.
Default configs are set in the class attribute _config of the class or its parent classes, and are overridden by dynamic configs set via set_config.
Configs are retained under clone or reset calls.
- Returns:
- config_dict: dict
Dictionary of config name : config value pairs. Collected from the _config class attribute via nested inheritance and then any overrides and new configs from the _config_dynamic object attribute.
- get_fitted_params(deep=True)
Get fitted parameters.
- State required:
Requires state to be "fitted".
- Parameters:
- deep: bool, default=True
Whether to return fitted parameters of components.
If True, will return a dict of parameter name : value for this object, including fitted parameters of fittable components (= BaseEstimator-valued parameters).
If False, will return a dict of parameter name : value for this object, but not include fitted parameters of components.
- Returns:
- fitted_params: dict with str-valued keys
Dictionary of fitted parameters, paramname : paramvalue key-value pairs include:
- always: all fitted parameters of this object, as via get_param_names; values are the fitted parameter values for that key, of this object
- if deep=True, also contains key/value pairs of component parameters; parameters of components are indexed as [componentname]__[paramname]; all parameters of componentname appear as paramname with its value
- if deep=True, also contains arbitrary levels of component recursion, e.g., [componentname]__[componentcomponentname]__[paramname], etc.
- classmethod get_param_defaults()
Get object's parameter defaults.
- Returns:
- default_dict: dict[str, Any]
Keys are all parameters of cls that have a default defined in __init__. Values are the defaults, as defined in __init__.
- classmethod get_param_names(sort=True)
Get object's parameter names.
- Parameters:
- sort: bool, default=True
Whether to return the parameter names sorted in alphabetical order (True), or in the order they appear in the class __init__ (False).
- Returns:
- param_names: list[str]
List of parameter names of cls. If sort=False, in the same order as they appear in the class __init__. If sort=True, alphabetically ordered.
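Both get_param_names and get_param_defaults describe information that lives in the __init__ signature. A minimal sketch using the standard library's inspect.signature on a stand-in class that copies ScipyDist's documented signature; the free functions are hypothetical and not scikit-base's implementation.

```python
import inspect


class ScipyDistSignature:
    """Stand-in copying only ScipyDist's documented __init__ signature."""

    def __init__(self, metric="euclidean", p=2, colalign="intersect",
                 var_weights=None, metric_kwargs=None):
        pass


def get_param_names(cls, sort=True):
    skip_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    names = [
        name
        for name, prm in inspect.signature(cls.__init__).parameters.items()
        if name != "self" and prm.kind not in skip_kinds
    ]
    return sorted(names) if sort else names


def get_param_defaults(cls):
    sig = inspect.signature(cls.__init__)
    # only parameters that actually declare a default in __init__
    return {
        name: prm.default
        for name, prm in sig.parameters.items()
        if name != "self" and prm.default is not inspect.Parameter.empty
    }
```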
- get_params(deep=True)
Get a dict of parameter values for this object.
- Parameters:
- deep: bool, default=True
Whether to return parameters of components.
If True, will return a dict of parameter name : value for this object, including parameters of components (= BaseObject-valued parameters).
If False, will return a dict of parameter name : value for this object, but not include parameters of components.
- Returns:
- params: dict with str-valued keys
Dictionary of parameters, paramname : paramvalue key-value pairs include:
- always: all parameters of this object, as via get_param_names; values are the parameter values for that key, of this object; values are always identical to the values passed at construction
- if deep=True, also contains key/value pairs of component parameters; parameters of components are indexed as [componentname]__[paramname]; all parameters of componentname appear as paramname with its value
- if deep=True, also contains arbitrary levels of component recursion, e.g., [componentname]__[componentcomponentname]__[paramname], etc.
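The [componentname]__[paramname] key scheme can be sketched with a hypothetical Obj class whose parameters may themselves be Obj instances; recursion produces the arbitrarily deep keys described above. This illustrates the key layout only, not scikit-base's code.

```python
class Obj:
    """Hypothetical object; parameters that are Obj act as components."""

    def __init__(self, **params):
        self._params = params

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                if isinstance(value, Obj):  # component-valued parameter
                    # recursion yields keys at arbitrary nesting depth
                    for sub, sub_val in value.get_params(deep=True).items():
                        out[f"{name}__{sub}"] = sub_val
        return out


inner = Obj(metric="cosine")
outer = Obj(dist=inner, n=3)
print(sorted(outer.get_params()))  # ['dist', 'dist__metric', 'n']
```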
- get_tag(tag_name, tag_value_default=None, raise_error=True)
Get tag value from instance, with tag level inheritance and overrides.
Every scikit-base compatible object has a dictionary of tags. Tags may be used to store metadata about the object, or to control behaviour of the object.
Tags are key-value pairs specific to an instance self; they are static flags that are not changed after construction of the object.
The get_tag method retrieves the value of a single tag with name tag_name from the instance, taking into account tag overrides, in the following order of descending priority:
- Tags set via set_tags or clone_tags on the instance, at construction of the instance.
- Tags set in the _tags attribute of the class.
- Tags set in the _tags attribute of parent classes, in order of inheritance.
- Parameters:
- tag_name: str
Name of tag to be retrieved.
- tag_value_default: any type, optional; default=None
Default/fallback value if tag is not found.
- raise_error: bool
Whether a ValueError is raised when the tag is not found.
- Returns:
- tag_value: Any
Value of the tag_name tag in self. If not found, raises an error if raise_error is True, otherwise returns tag_value_default.
- Raises:
- ValueError, if raise_error is True. The ValueError is then raised if tag_name is not in self.get_tags().keys().
- get_tags()
Get tags from instance, with tag level inheritance and overrides.
Every scikit-base compatible object has a dictionary of tags. Tags may be used to store metadata about the object, or to control behaviour of the object.
Tags are key-value pairs specific to an instance self; they are static flags that are not changed after construction of the object.
The get_tags method returns a dictionary of tags, with keys being keys of any attribute of _tags set in the class or any of its parent classes, or tags set via set_tags or clone_tags.
Values are the corresponding tag values, with overrides in the following order of descending priority:
- Tags set via set_tags or clone_tags on the instance, at construction of the instance.
- Tags set in the _tags attribute of the class.
- Tags set in the _tags attribute of parent classes, in order of inheritance.
- Returns:
- collected_tags: dict
Dictionary of tag name : tag value pairs. Collected from the _tags class attribute via nested inheritance and then any overrides and new tags from the _tags_dynamic object attribute.
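Instance-level tag resolution, as described for get_tags, get_tag, set_tags and clone_tags, can be sketched as class tags overlaid with a per-instance _tags_dynamic dict. For brevity this hypothetical class reads only its own _tags (no parent-class merging), and the tag names are illustrative assumptions.

```python
class TaggedObject:
    """Hypothetical object with class tags plus dynamic instance overrides."""

    _tags = {"symmetric": True, "pwtrafo_type": "distance"}

    def __init__(self):
        self._tags_dynamic = {}  # instance-level overrides

    def set_tags(self, **tag_dict):
        self._tags_dynamic.update(tag_dict)
        return self

    def clone_tags(self, estimator, tag_names=None):
        tags = estimator.get_tags()
        if tag_names is None:
            tag_names = list(tags)  # default: clone all tags
        elif isinstance(tag_names, str):
            tag_names = [tag_names]
        return self.set_tags(**{k: tags[k] for k in tag_names})

    def get_tags(self):
        # class-level tags first; dynamic overrides take priority
        collected = dict(type(self)._tags)
        collected.update(self._tags_dynamic)
        return collected

    def get_tag(self, tag_name, tag_value_default=None, raise_error=True):
        tags = self.get_tags()
        if tag_name not in tags and raise_error:
            raise ValueError(f"Tag with name {tag_name} could not be found.")
        return tags.get(tag_name, tag_value_default)
```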
- is_composite()
Check if the object is composed of other BaseObjects.
A composite object is an object which contains objects as parameters. Called on an instance, since this may differ by instance.
- Returns:
- composite: bool
Whether an object has any parameters whose values are BaseObject descendant instances.
- property is_fitted
Whether fit has been called.
Inspects the object's _is_fitted attribute, which should initialize to False during object construction, and be set to True in calls to an object's fit method.
- Returns:
- bool
Whether the estimator has been fit.
- classmethod load_from_path(serial)
Load object from file location.
- Parameters:
- serial: result of ZipFile(path).open("object")
- Returns:
- deserialized self resulting in output at path, of cls.save(path)
- classmethod load_from_serial(serial)
Load object from serialized memory container.
- Parameters:
- serial: 1st element of output of cls.save(None)
- Returns:
- deserialized self resulting in output serial, of cls.save(None)
- reset()
Reset the object to a clean post-init state.
Results in setting self to the state it had directly after the constructor call, with the same hyper-parameters. Config values set by set_config are also retained.
A reset call deletes any object attributes, except:
- hyper-parameters = arguments of __init__ written to self, e.g., self.paramname where paramname is an argument of __init__
- object attributes containing double-underscores, i.e., the string "__". For instance, an attribute named "__myattr" is retained.
- config attributes: configs are retained without change. That is, results of get_config before and after reset are equal.
Class and object methods, and class attributes are also unaffected.
Equivalent to clone, with the exception that reset mutates self instead of returning a new object.
After a self.reset() call, self is equal in value and state to the object obtained after a constructor call type(self)(**self.get_params(deep=False)).
- Returns:
- self
Instance of class reset to a clean post-init state but retaining the current hyper-parameter values.
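The retention rules above (keep hyper-parameters, attributes containing "__", and configs; drop everything else) can be sketched on a hypothetical class. The hard-coded keep_names set is a simplification; the real method derives hyper-parameter names from __init__.

```python
class ResettableObject:
    """Hypothetical object illustrating the documented reset rules."""

    def __init__(self, metric="euclidean"):
        self.metric = metric
        self._config_dynamic = {}

    def reset(self):
        keep_names = {"metric"}  # hyper-parameters, i.e., __init__ args
        kept = {
            k: v for k, v in vars(self).items()
            if k in keep_names or "__" in k or k == "_config_dynamic"
        }
        # drop every instance attribute, then re-run __init__ with same params
        for name in list(vars(self)):
            delattr(self, name)
        self.__init__(**{k: kept[k] for k in keep_names})
        # restore retained attributes (configs, "__" attributes)
        for k, v in kept.items():
            setattr(self, k, v)
        return self
```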
- save(path=None, serialization_format='pickle')
Save serialized self to bytes-like object or to (.zip) file.
Behaviour: if path is None, returns an in-memory serialized self; if path is a file location, stores self at that location as a zip file.
Saved files are zip files with the following contents:
- _metadata: contains the class of self, i.e., type(self)
- _obj: serialized self. This class uses the default serialization (pickle).
- Parameters:
- path: None or file location (str or Path)
If None, self is saved to an in-memory object; if a file location, self is saved to that file location. For example:
- path="estimator": a zip file estimator.zip will be made in the current working directory.
- path="/home/stored/estimator": a zip file estimator.zip will be stored in /home/stored/.
- serialization_format: str, default = "pickle"
Module to use for serialization. The available options are "pickle" and "cloudpickle". Note that non-default formats might require installation of other soft dependencies.
- Returns:
- if path is None: in-memory serialized self
- if path is a file location: ZipFile with reference to the file
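A stdlib-only sketch of the documented zip layout (_metadata holding type(self), _obj holding the pickled object), with a matching loader. The helpers save_sketch and load_sketch are hypothetical; in particular, returning (type(obj), pickled_bytes) for path=None is an assumption about the in-memory format, not a confirmed detail of the library.

```python
import pickle
import zipfile
from pathlib import Path


def save_sketch(obj, path=None):
    """Hypothetical helper mirroring the documented save behaviour."""
    if path is None:
        # in-memory serialization (tuple layout is an assumption)
        return type(obj), pickle.dumps(obj)
    # "estimator" -> "estimator.zip", "/home/stored/estimator" -> ".../estimator.zip"
    path = Path(path).with_suffix(".zip")
    with zipfile.ZipFile(path, mode="w") as zf:
        zf.writestr("_metadata", pickle.dumps(type(obj)))
        zf.writestr("_obj", pickle.dumps(obj))
    return zipfile.ZipFile(path)  # reference to the written file


def load_sketch(path):
    """Read back the _obj member written by save_sketch."""
    with zipfile.ZipFile(Path(path).with_suffix(".zip")) as zf:
        return pickle.loads(zf.read("_obj"))
```

The caller is responsible for closing the ZipFile returned by save_sketch.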
- set_config(**config_dict)
Set config flags to given values.
- Parameters:
- config_dict: dict
Dictionary of config name : config value pairs. Valid configs, values, and their meaning are listed below:
- display: str, "diagram" (default), or "text"
How jupyter kernels display instances of self:
"diagram" = html box diagram representation
"text" = string printout
- print_changed_only: bool, default=True
Whether printing of self lists only self-parameters that differ from defaults (True), or all parameter names and values (False). Does not nest, i.e., only affects self and not component estimators.
- warnings: str, "on" (default), or "off"
Whether to raise warnings; affects warnings from sktime only:
"on" = will raise warnings from sktime
"off" = will not raise warnings from sktime
- backend:parallel: str, optional, default="None"
Backend to use for parallelization when broadcasting/vectorizing, one of:
"None": executes loop sequentially, simple list comprehension
"loky", "multiprocessing" and "threading": uses joblib.Parallel
"joblib": custom and 3rd party joblib backends, e.g., spark
"dask": uses dask, requires the dask package in the environment
- backend:parallel:params: dict, optional, default={} (no parameters passed)
Additional parameters passed to the parallelization backend as config. Valid keys depend on the value of backend:parallel:
"None": no additional parameters, backend:parallel:params is ignored
"loky", "multiprocessing" and "threading": default joblib backends. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by backend:parallel. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
"joblib": custom and 3rd party joblib backends, e.g., spark. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs; backend must be passed as a key of backend:parallel:params in this case. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
"dask": any valid keys for dask.compute can be passed, e.g., scheduler
- Returns:
- self: reference to self.
Notes
Changes object state, copies configs in config_dict to self._config_dynamic.
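The interplay between set_config and get_config (class-level _config defaults, overridden by values copied into _config_dynamic) can be sketched on a hypothetical class. Only the display, print_changed_only and warnings defaults documented above are included; the backend keys are omitted.

```python
class ConfigurableObject:
    """Hypothetical object illustrating config defaults and overrides."""

    # class-level defaults, as documented above
    _config = {"display": "diagram", "print_changed_only": True, "warnings": "on"}

    def __init__(self):
        self._config_dynamic = {}

    def set_config(self, **config_dict):
        # copies configs into self._config_dynamic, as stated in the Notes
        self._config_dynamic.update(config_dict)
        return self

    def get_config(self):
        # defaults first; dynamic overrides take priority
        config = dict(type(self)._config)
        config.update(self._config_dynamic)
        return config
```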
- set_params(**params)
Set the parameters of this object.
The method works on simple skbase objects as well as on composite objects. Parameter key strings <component>__<parameter> can be used for composites, i.e., objects that contain other objects, to access <parameter> in the component <component>. The string <parameter>, without <component>__, can also be used if this makes the reference unambiguous, e.g., there are no two parameters of components with the name <parameter>.
- Parameters:
- **params: dict
BaseObject parameters; keys must be <component>__<parameter> strings. __ suffixes can alias full strings, if unique among get_params keys.
- Returns:
- self: reference to self (after parameters have been set)
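The alias rule above (a bare <parameter> is accepted only when it identifies exactly one key) can be sketched as a key-resolution step over the get_params keys. The helper resolve_param_keys is hypothetical and only illustrates the rule.

```python
def resolve_param_keys(valid_keys, params):
    """Map each key in params to a full key among valid_keys (hypothetical).

    Full keys such as "dist__metric" are used as-is; a bare "metric" is
    accepted only if exactly one valid key ends with "__metric".
    """
    resolved = {}
    for key, value in params.items():
        if key in valid_keys:
            resolved[key] = value
            continue
        matches = [k for k in valid_keys if k.endswith(f"__{key}")]
        if len(matches) != 1:
            raise ValueError(f"{key!r} is not a unique alias: {matches}")
        resolved[matches[0]] = value
    return resolved


keys = ["n", "dist__metric", "dist__p", "other__p"]
print(resolve_param_keys(keys, {"metric": "cosine"}))  # {'dist__metric': 'cosine'}
```

Here a bare "p" would be rejected, since both "dist__p" and "other__p" match.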
- set_random_state(random_state=None, deep=True, self_policy='copy')
Set random_state pseudo-random seed parameters for self.
Finds random_state named parameters via self.get_params, and sets them to integers derived from random_state via set_params. These integers are sampled from chain hashing via sample_dependent_seed, and guarantee pseudo-random independence of seeded random generators.
Applies to random_state parameters in self, depending on self_policy, and remaining component objects if and only if deep=True.
Note: calls set_params even if self does not have a random_state, or none of the components have a random_state parameter. Therefore, set_random_state will reset any scikit-base object, even those without a random_state parameter.
- Parameters:
- random_state: int, RandomState instance or None, default=None
Pseudo-random number generator to control the generation of the random integers. Pass int for reproducible output across multiple function calls.
- deep: bool, default=True
Whether to set the random state in skbase object valued parameters, i.e., component estimators.
If False, will set only self's random_state parameter, if it exists.
If True, will set random_state parameters in component objects as well.
- self_policy: str, one of {"copy", "keep", "new"}, default="copy"
"copy": self.random_state is set to input random_state
"keep": self.random_state is kept as is
"new": self.random_state is set to a new random state, derived from input random_state, and in general different from it
- Returns:
- self: reference to self
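A sketch of how the self_policy options translate into set_params updates, assuming an int random_state. The seed derivation uses SHA-256 of (random_state, parameter name) purely as a hypothetical stand-in for sample_dependent_seed, whose actual hashing scheme is not reproduced here; both helper names are hypothetical.

```python
import hashlib


def derive_seeds(random_state, param_names):
    """Hypothetical stand-in for sample_dependent_seed: one 32-bit seed per
    parameter name, deterministic for a given int random_state."""
    seeds = {}
    for name in param_names:
        digest = hashlib.sha256(f"{random_state}:{name}".encode()).digest()
        seeds[name] = int.from_bytes(digest[:4], "little")
    return seeds


def set_random_state_sketch(params, random_state, self_policy="copy"):
    """params: dict as from get_params(deep=True); returns set_params updates."""
    keys = [k for k in params if k == "random_state" or k.endswith("__random_state")]
    seeds = derive_seeds(random_state, keys)
    # component random_state parameters always get derived seeds
    updates = {k: seeds[k] for k in keys if k != "random_state"}
    if "random_state" in keys:
        if self_policy == "copy":
            updates["random_state"] = random_state
        elif self_policy == "new":
            updates["random_state"] = seeds["random_state"]
        # "keep": self.random_state is left unchanged
    return updates
```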
- set_tags(**tag_dict)
Set instance level tag overrides to given values.
Every scikit-base compatible object has a dictionary of tags. Tags may be used to store metadata about the object, or to control behaviour of the object.
Tags are key-value pairs specific to an instance self; they are static flags that are not changed after construction of the object.
set_tags sets dynamic tag overrides to the values as specified in tag_dict, with keys being the tag name, and dict values being the value to set the tag to.
The set_tags method should be called only in the __init__ method of an object, during construction, or directly after construction via __init__.
Current tag values can be inspected by get_tags or get_tag.
- Parameters:
- **tag_dict: dict
Dictionary of tag name: tag value pairs.
- Returns:
- self
Reference to self.
- transform(X, X2=None)
Compute distance/kernel matrix.
- Behaviour: returns pairwise distance/kernel matrix
between samples in X and X2 (equal to X if not passed)
- Parameters:
- X: pd.DataFrame of length n, or 2D np.array with n rows
- X2: pd.DataFrame of length m, or 2D np.array with m rows, optional
default X2 = X
- Returns:
- distmat: np.array of shape [n, m]
(i,j)-th entry contains distance/kernel between X.iloc[i] and X2.iloc[j]
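To illustrate what the returned matrix contains for metric="minkowski", here is a dependency-free sketch that follows the documented steps: X2 defaults to X, columns are scaled by var_weights, then the p-norm distance is computed for every (row of X, row of X2) pair. The real class delegates this to scipy.spatial.distance.cdist; the function pairwise_minkowski and its list-of-rows inputs are hypothetical.

```python
def pairwise_minkowski(X, X2=None, p=2, var_weights=None):
    """Return an n x m list-of-lists distance matrix (p=2: Euclidean).

    X, X2: lists of equal-length numeric rows, stand-ins for 2D arrays.
    """
    if X2 is None:
        X2 = X  # default X2 = X
    n_cols = len(X[0])
    weights = var_weights if var_weights is not None else [1.0] * n_cols

    def scale(row):
        # i-th column multiplied by var_weights[i]
        return [w * v for w, v in zip(weights, row)]

    Xs = [scale(r) for r in X]
    X2s = [scale(r) for r in X2]
    # (i, j)-th entry: p-norm distance between row i of X and row j of X2
    return [
        [sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p) for y in X2s]
        for x in Xs
    ]


print(pairwise_minkowski([[0, 0], [3, 4]]))  # [[0.0, 5.0], [5.0, 0.0]]
```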