ColumnTransformer#
- class ColumnTransformer(transformers, remainder='drop', sparse_threshold=0.3, n_jobs=1, transformer_weights=None, preserve_dataframe=True)[source]#
Column-wise application of transformers.
Applies transformations to columns of an array or pandas DataFrame. This class wraps the column transformer from sklearn and adds the capability to handle pandas DataFrames.
This estimator allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.
- Parameters:
- transformers : list of tuples
List of (name, transformer, column(s)) tuples specifying the transformer objects to be applied to subsets of the data.
name : string
Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.
transformer : estimator or {“passthrough”, “drop”}
Estimator must support fit and transform. Special-cased strings “drop” and “passthrough” are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.
column(s) : str or int, array-like of string or int, slice, boolean mask array or callable
Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above.
- remainder : {“drop”, “passthrough”} or estimator, default “drop”
By default, only the specified columns in transformers are transformed and combined in the output, and the non-specified columns are dropped (default of "drop"). By specifying remainder="passthrough", all remaining columns that were not specified in transformers will be automatically passed through. This subset of columns is concatenated with the output of the transformations. By setting remainder to be an estimator, the remaining non-specified columns will use the remainder estimator. The estimator must support fit and transform.
- sparse_threshold : float, default = 0.3
If the output of the different transformations contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use sparse_threshold=0 to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.
- n_jobs : int or None, optional (default=None)
Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- transformer_weights : dict, optional
Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.
- preserve_dataframe : boolean
If True, pandas dataframe is returned. If False, numpy array is returned.
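The following is a minimal sketch of how the transformers and remainder parameters interact. Since the description above says this class builds on the column transformer from sklearn, the sketch uses sklearn.compose.ColumnTransformer directly as a stand-in; it does not exercise preserve_dataframe, which is specific to this class. The toy array and the transformer name "scale" are assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy data: three rows, three numeric columns (illustrative values only).
X = np.array([[1.0, 10.0, 7.0],
              [2.0, 20.0, 8.0],
              [3.0, 30.0, 9.0]])

# Standardize columns 0 and 1; pass column 2 through untransformed.
ct = ColumnTransformer(
    transformers=[("scale", StandardScaler(), [0, 1])],
    remainder="passthrough",
)
Xt = ct.fit_transform(X)

# Transformed columns come first, followed by the passthrough remainder.
print(Xt.shape)
# The fitted list gains one extra entry for the remainder.
print(len(ct.transformers_))
```

With remainder="drop" (the default), the third column would be omitted from the output instead.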
- Attributes:
- transformers_ : list
The collection of fitted transformations as tuples of (name, fitted_transformer, column). fitted_transformer can be an estimator, “drop”, or “passthrough”. In case there were no columns selected, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form (“remainder”, transformer, remaining_columns) corresponding to the remainder parameter. If there are remaining columns, then len(transformers_)==len(transformers)+1, otherwise len(transformers_)==len(transformers).
- named_transformers_ : Bunch object, a dictionary with attribute access
Access the fitted transformer by name.
- sparse_output_ : bool
Boolean flag indicating whether the output of transform is a sparse matrix or a dense numpy array, which depends on the output of the individual transformations and the sparse_threshold keyword.
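The dependence of the sparse/dense decision on sparse_threshold can be sketched in plain Python. This is an illustration of the rule as described above, not the library's implementation: it assumes that density means the fraction of non-zero entries across all transformer outputs, and the helper names overall_density and stack_as_sparse are hypothetical.

```python
def overall_density(blocks):
    """Fraction of non-zero entries across all blocks (each a list of rows)."""
    nonzero = sum(1 for block in blocks for row in block for v in row if v != 0)
    total = sum(len(row) for block in blocks for row in block)
    return nonzero / total if total else 0.0


def stack_as_sparse(blocks, any_sparse, sparse_threshold=0.3):
    """Return True if the stacked result would be kept sparse."""
    # All-dense outputs stay dense; the threshold is ignored in that case.
    if not any_sparse:
        return False
    return overall_density(blocks) < sparse_threshold


# Two transformer outputs: 2 non-zero entries out of 12 in total.
blocks = [
    [[1, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0], [0, 2]],
]
print(stack_as_sparse(blocks, any_sparse=True))
print(stack_as_sparse(blocks, any_sparse=True, sparse_threshold=0))
```

Setting sparse_threshold=0 makes the comparison fail for any density, which matches the note that this value always returns dense output.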
Methods
check_is_fitted(): Check if the estimator has been fitted.
clone(): Obtain a clone of the object with same hyper-parameters.
clone_tags(estimator[, tag_names]): Clone tags from another estimator as dynamic override.
create_test_instance([parameter_set]): Construct Estimator instance if possible.
create_test_instances_and_names([parameter_set]): Create list of all test instances and a list of names for them.
fit(X[, y]): Fit the transformer.
fit_transform(X[, y]): Fit and transform, shorthand.
get_class_tag(tag_name[, tag_value_default]): Get a class tag's value.
get_class_tags(): Get class tags from the class and all its parent classes.
get_config(): Get config flags for self.
get_feature_names_out([input_features]): Get output feature names for transformation.
get_fitted_params([deep]): Get fitted parameters.
get_metadata_routing(): Get metadata routing of this object.
get_param_defaults(): Get object's parameter defaults.
get_param_names([sort]): Get object's parameter names.
get_params([deep]): Get parameters for this estimator.
get_tag(tag_name[, tag_value_default, ...]): Get tag value from estimator class and dynamic tag overrides.
get_tags(): Get tags from estimator class and dynamic tag overrides.
get_test_params(): Return testing parameter settings for the estimator.
inverse_transform(X[, y]): Inverse transform X and return an inverse transformed version.
is_composite(): Check if the object is composed of other BaseObjects.
load_from_path(serial): Load object from file location.
load_from_serial(serial): Load object from serialized memory container.
reset(): Reset the object to a clean post-init state.
save([path, serialization_format]): Save serialized self to bytes-like object or to (.zip) file.
set_config(**config_dict): Set config flags to given values.
set_output(*[, transform]): Set the output container when "transform" and "fit_transform" are called.
set_params(**kwargs): Set the parameters of this estimator.
set_random_state([random_state, deep, ...]): Set random_state pseudo-random seed parameters for self.
set_tags(**tag_dict): Set dynamic tags to given values.
transform(X[, y]): Transform the data.
update(X[, y, update_params]): Update transformer with X, optionally y.
- classmethod get_test_params()[source]#
Return testing parameter settings for the estimator.
- Returns:
- params : dict or list of dict, default = {}
Parameters to create testing instances of the class. Each dict contains parameters to construct an “interesting” test instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a valid test instance. create_test_instance uses the first (or only) dictionary in params.
- check_is_fitted()[source]#
Check if the estimator has been fitted.
- Raises:
- NotFittedError
If the estimator has not been fitted yet.
- clone()[source]#
Obtain a clone of the object with same hyper-parameters.
A clone is a different object without shared references, in post-init state. This function is equivalent to returning sklearn.clone of self.
- Raises:
- RuntimeError if the clone is non-conforming, due to faulty __init__.
Notes
If successful, equal in value to type(self)(**self.get_params(deep=False)).
- clone_tags(estimator, tag_names=None)[source]#
Clone tags from another estimator as dynamic override.
- Parameters:
- estimator : estimator inheriting from BaseEstimator
- tag_names : str or list of str, default = None
Names of tags to clone. If None then all tags in estimator are used as tag_names.
- Returns:
- Self
Reference to self.
Notes
Changes object state by setting tag values in tag_set from estimator as dynamic tags in self.
- classmethod create_test_instance(parameter_set='default')[source]#
Construct Estimator instance if possible.
- Parameters:
- parameter_set : str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return “default” set.
- Returns:
- instance : instance of the class with default parameters
Notes
get_test_params can return dict or list of dict. This function takes first or single dict that get_test_params returns, and constructs the object with that.
- classmethod create_test_instances_and_names(parameter_set='default')[source]#
Create list of all test instances and a list of names for them.
- Parameters:
- parameter_set : str, default="default"
Name of the set of test parameters to return, for use in tests. If no special parameters are defined for a value, will return “default” set.
- Returns:
- objs : list of instances of cls
i-th instance is cls(**cls.get_test_params()[i])
- names : list of str, same length as objs
i-th element is the name of the i-th instance of obj in tests. The convention is {cls.__name__}-{i} if there is more than one instance, otherwise {cls.__name__}.
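The naming convention for test instances can be sketched as follows. This is an illustration only: the helper instance_names and the class MyTransformer are hypothetical, and the index i is assumed to be zero-based, matching the list index into get_test_params().

```python
def instance_names(cls, n_instances):
    """Build test instance names following the convention described above."""
    if n_instances > 1:
        # More than one instance: suffix each name with its index.
        return [f"{cls.__name__}-{i}" for i in range(n_instances)]
    # A single instance uses the bare class name.
    return [cls.__name__] * n_instances


class MyTransformer:  # hypothetical class, for illustration only
    pass


print(instance_names(MyTransformer, 3))
print(instance_names(MyTransformer, 1))
```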
- classmethod get_class_tag(tag_name, tag_value_default=None)[source]#
Get a class tag’s value.
Does not return information from dynamic tags (set via set_tags or clone_tags) that are defined on instances.
- Parameters:
- tag_name : str
Name of tag value.
- tag_value_default : any
Default/fallback value if tag is not found.
- Returns:
- tag_value
Value of the tag_name tag in self. If not found, returns tag_value_default.
- classmethod get_class_tags()[source]#
Get class tags from the class and all its parent classes.
Retrieves tag: value pairs from _tags class attribute. Does not return information from dynamic tags (set via set_tags or clone_tags) that are defined on instances.
- Returns:
- collected_tags : dict
Dictionary of class tag name: tag value pairs. Collected from _tags class attribute via nested inheritance.
- get_config()[source]#
Get config flags for self.
- Returns:
- config_dict : dict
Dictionary of config name : config value pairs. Collected from _config class attribute via nested inheritance and then any overrides and new tags from _config_dynamic object attribute.
- get_feature_names_out(input_features=None)[source]#
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Transformed feature names.
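The rules for resolving input_features can be sketched in plain Python. The function resolve_input_features is a hypothetical helper written for illustration; it mirrors the three cases described above and is not the library's implementation.

```python
def resolve_input_features(input_features=None, feature_names_in=None,
                           n_features_in=0):
    """Resolve input feature names following the rules described above."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit are reused.
            return list(feature_names_in)
        # No names available: generate x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]
    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        raise ValueError("input_features must match feature_names_in_")
    return input_features


print(resolve_input_features(n_features_in=3))
print(resolve_input_features(feature_names_in=["age", "height"]))
```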
- get_fitted_params(deep=True)[source]#
Get fitted parameters.
- State required:
Requires state to be “fitted”.
- Parameters:
- deep : bool, default=True
Whether to return fitted parameters of components.
If True, will return a dict of parameter name : value for this object, including fitted parameters of fittable components (= BaseEstimator-valued parameters).
If False, will return a dict of parameter name : value for this object, but not include fitted parameters of components.
- Returns:
- fitted_params : dict with str-valued keys
Dictionary of fitted parameters, paramname : paramvalue key-value pairs include:
always: all fitted parameters of this object, as via get_param_names. Values are the fitted parameter values for that key, of this object.
if deep=True, also contains key/value pairs of component parameters. Parameters of components are indexed as [componentname]__[paramname]. All parameters of componentname appear as paramname with its value.
if deep=True, also contains arbitrary levels of component recursion, e.g., [componentname]__[componentcomponentname]__[paramname], etc.
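The double-underscore key convention for nested components can be illustrated with a small sketch. Here nested dicts stand in for component estimators, and flatten_params and all parameter names are hypothetical; the real method operates on estimator objects rather than dicts.

```python
def flatten_params(params, prefix=""):
    """Flatten nested dicts into [componentname]__[paramname] keys."""
    flat = {}
    for name, value in params.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            # A nested dict plays the role of a component: recurse.
            flat.update(flatten_params(value, prefix=f"{key}__"))
        else:
            flat[key] = value
    return flat


# "scale" is a component with its own component "inner" (made-up names).
nested = {"n_features": 3, "scale": {"mean": 2.0, "inner": {"offset": 1}}}
flat = flatten_params(nested)
print(flat)
```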
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
Added in version 1.4.
- Returns:
- routing : MetadataRouter
A MetadataRouter encapsulating routing information.
- classmethod get_param_defaults()[source]#
Get object’s parameter defaults.
- Returns:
- default_dict: dict[str, Any]
Keys are all parameters of cls that have a default defined in __init__. Values are the defaults, as defined in __init__.
- classmethod get_param_names(sort=True)[source]#
Get object’s parameter names.
- Parameters:
- sort : bool, default=True
Whether to return the parameter names sorted in alphabetical order (True), or in the order they appear in the class __init__ (False).
- Returns:
- param_names: list[str]
List of parameter names of cls. If sort=False, in same order as they appear in the class __init__. If sort=True, alphabetically ordered.
- get_params(deep=True)[source]#
Get parameters for this estimator.
Returns the parameters given in the constructor as well as the estimators contained within the transformers of the ColumnTransformer.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- get_tag(tag_name, tag_value_default=None, raise_error=True)[source]#
Get tag value from estimator class and dynamic tag overrides.
- Parameters:
- tag_name : str
Name of tag to be retrieved
- tag_value_default : any type, optional; default=None
Default/fallback value if tag is not found
- raise_error : bool
Whether a ValueError is raised when the tag is not found
- Returns:
- tag_value : Any
Value of the tag_name tag in self. If not found, raises an error if raise_error is True, otherwise it returns tag_value_default.
- Raises:
- ValueError if raise_error is True, i.e., if tag_name is not in self.get_tags().keys()
- get_tags()[source]#
Get tags from estimator class and dynamic tag overrides.
- Returns:
- collected_tags : dict
Dictionary of tag name : tag value pairs. Collected from _tags class attribute via nested inheritance and then any overrides and new tags from _tags_dynamic object attribute.
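The collection order described above (class-level _tags via nested inheritance, then dynamic overrides) can be sketched with two toy classes. BaseSketch, ChildSketch, and the tag names are hypothetical and serve only to illustrate the override order; this is not the library's implementation.

```python
class BaseSketch:
    # Class-level tags (hypothetical names and values).
    _tags = {"tag_a": "base", "tag_b": "base"}

    def __init__(self):
        # Per-instance overrides, filled by set_tags.
        self._tags_dynamic = {}

    def set_tags(self, **tag_dict):
        self._tags_dynamic.update(tag_dict)
        return self

    def get_tags(self):
        collected = {}
        # Walk the inheritance chain from most general to most specific,
        # so that subclass tags override parent tags.
        for klass in reversed(type(self).__mro__):
            collected.update(klass.__dict__.get("_tags", {}))
        # Dynamic (instance-level) tags are applied last.
        collected.update(self._tags_dynamic)
        return collected


class ChildSketch(BaseSketch):
    _tags = {"tag_b": "child"}


obj = ChildSketch().set_tags(tag_c="dynamic")
tags = obj.get_tags()
print(tags)
```

A fresh ChildSketch() without set_tags returns only the class-level values, which corresponds to what get_class_tags reports.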
- inverse_transform(X, y=None)[source]#
Inverse transform X and return an inverse transformed version.
Currently it is assumed that only transformers with tags “scitype:transform-input”=“Series”, “scitype:transform-output”=“Series”, have an inverse_transform.
- State required:
Requires state to be “fitted”.
Accesses in self:
Fitted model attributes ending in “_”.
self.is_fitted, must be True
- Parameters:
- X : time series in sktime compatible data container format
Data to fit transform to.
Individual data formats in sktime are so-called mtype specifications, each mtype implements an abstract scitype.
Series scitype = individual time series. pd.DataFrame, pd.Series, or np.ndarray (1D or 2D)
Panel scitype = collection of time series. pd.DataFrame with 2-level row MultiIndex (instance, time), 3D np.ndarray (instance, variable, time), list of Series typed pd.DataFrame
Hierarchical scitype = hierarchical collection of time series. pd.DataFrame with 3 or more level row MultiIndex (hierarchy_1, ..., hierarchy_n, time)
For further details on data format, see glossary on mtype. For usage, see transformer tutorial examples/03_transformers.ipynb
- y : optional, data in sktime compatible data format, default=None
Additional data, e.g., labels for transformation. Some transformers require this, see class docstring for details.
- Returns:
- inverse transformed version of X
of the same type as X, and conforming to mtype format specifications
- is_composite()[source]#
Check if the object is composed of other BaseObjects.
A composite object is an object which contains objects, as parameters. Called on an instance, since this may differ by instance.
- Returns:
- composite: bool
Whether an object has any parameters whose values are BaseObjects.
- classmethod load_from_path(serial)[source]#
Load object from file location.
- Parameters:
- serial : result of ZipFile(path).open(“object”)
- Returns:
- deserialized self resulting in output at path, of cls.save(path)
- classmethod load_from_serial(serial)[source]#
Load object from serialized memory container.
- Parameters:
- serial : 1st element of output of cls.save(None)
- Returns:
- deserialized self resulting in output serial, of cls.save(None)
- property named_transformers_[source]#
Access the fitted transformer by name.
Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.
- reset()[source]#
Reset the object to a clean post-init state.
Calling reset runs __init__ with the current values of hyper-parameters (result of get_params). This removes any object attributes, except:
hyper-parameters = arguments of __init__
object attributes containing double-underscores, i.e., the string “__”
Class and object methods, and class attributes are also unaffected.
- Returns:
- self
Instance of class reset to a clean post-init state but retaining the current hyper-parameter values.
Notes
Equivalent to sklearn.clone but overwrites self. After self.reset() call, self is equal in value to type(self)(**self.get_params(deep=False))
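The effect of reset can be sketched with a toy class. TinyEstimator and its parameters are hypothetical, and the reset shown here is a simplified illustration of the behaviour described above (re-run __init__ with current hyper-parameters, drop fitted attributes, keep attributes containing a double underscore), not the library's implementation.

```python
class TinyEstimator:
    """Hypothetical estimator used only to illustrate reset."""

    def __init__(self, alpha=1.0, shift=0):
        self.alpha = alpha
        self.shift = shift

    def get_params(self, deep=False):
        return {"alpha": self.alpha, "shift": self.shift}

    def fit(self, X):
        # Fitted attributes end in an underscore.
        self.mean_ = sum(X) / len(X)
        return self

    def reset(self):
        params = self.get_params(deep=False)
        # Attributes whose names contain "__" are retained.
        keep = {k: v for k, v in vars(self).items() if "__" in k}
        self.__dict__.clear()
        self.__init__(**params)
        self.__dict__.update(keep)
        return self


est = TinyEstimator(alpha=2.0).fit([1.0, 2.0, 3.0])
had_fitted_attribute = hasattr(est, "mean_")
est.reset()
print(had_fitted_attribute, hasattr(est, "mean_"), est.alpha)
```

After the call, the instance holds the same values as a freshly constructed TinyEstimator(**est.get_params(deep=False)).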
- save(path=None, serialization_format='pickle')[source]#
Save serialized self to bytes-like object or to (.zip) file.
Behaviour: if path is None, returns an in-memory serialized self. If path is a file location, stores self at that location as a zip file.
Saved files are zip files with following contents: _metadata - contains class of self, i.e., type(self). _obj - serialized self. This class uses the default serialization (pickle).
- Parameters:
- path : None or file location (str or Path)
If None, self is saved to an in-memory object. If file location, self is saved to that file location. If:
path=“estimator” then a zip file estimator.zip will be made at cwd.
path=“/home/stored/estimator” then a zip file estimator.zip will be stored in /home/stored/.
- serialization_format: str, default = “pickle”
Module to use for serialization. The available options are “pickle” and “cloudpickle”. Note that non-default formats might require installation of other soft dependencies.
- Returns:
- if path is None - in-memory serialized self
- if path is file location - ZipFile with reference to the file
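The zip layout and the path-to-file-name mapping described above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's implementation: the helpers save_to_zip and load_from_zip are hypothetical, and storing the class in _metadata as a pickle is an assumption about the entry's encoding. A plain dict stands in for an estimator, and a temporary directory is used instead of cwd.

```python
import pickle
import tempfile
import zipfile
from pathlib import Path


def save_to_zip(obj, path):
    """Write obj to <path>.zip with _metadata and _obj entries."""
    zip_path = Path(str(path) + ".zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Assumption: the class of the object is stored as a pickle.
        zf.writestr("_metadata", pickle.dumps(type(obj)))
        zf.writestr("_obj", pickle.dumps(obj))
    return zip_path


def load_from_zip(zip_path):
    """Read the serialized object back from the _obj entry."""
    with zipfile.ZipFile(zip_path) as zf:
        return pickle.loads(zf.read("_obj"))


with tempfile.TemporaryDirectory() as tmp:
    # path ends in "estimator", so the file written is estimator.zip.
    saved = save_to_zip({"alpha": 2.0}, Path(tmp) / "estimator")
    saved_name = saved.name
    with zipfile.ZipFile(saved) as zf:
        entry_names = sorted(zf.namelist())
    restored = load_from_zip(saved)

print(saved_name, entry_names, restored)
```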
- set_config(**config_dict)[source]#
Set config flags to given values.
- Parameters:
- config_dict : dict
Dictionary of config name : config value pairs. Valid configs, values, and their meaning is listed below:
- display : str, “diagram” (default), or “text”
how jupyter kernels display instances of self
“diagram” = html box diagram representation
“text” = string printout
- print_changed_only : bool, default=True
whether printing of self lists only self-parameters that differ from defaults (True), or all parameter names and values (False). Does not nest, i.e., only affects self and not component estimators.
- warnings : str, “on” (default), or “off”
whether to raise warnings, affects warnings from sktime only
“on” = will raise warnings from sktime
“off” = will not raise warnings from sktime
- backend:parallel : str, optional, default=“None”
backend to use for parallelization when broadcasting/vectorizing, one of
“None”: executes loop sequentially, simple list comprehension
“loky”, “multiprocessing” and “threading”: uses joblib.Parallel
“joblib”: custom and 3rd party joblib backends, e.g., spark
“dask”: uses dask, requires dask package in environment
- backend:parallel:params : dict, optional, default={} (no parameters passed)
additional parameters passed to the parallelization backend as config. Valid keys depend on the value of backend:parallel:
“None”: no additional parameters, backend_params is ignored
“loky”, “multiprocessing” and “threading”: default joblib backends. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend which is directly controlled by backend. If n_jobs is not passed, it will default to -1, other parameters will default to joblib defaults.
“joblib”: custom and 3rd party joblib backends, e.g., spark. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs. backend must be passed as a key of backend_params in this case. If n_jobs is not passed, it will default to -1, other parameters will default to joblib defaults.
“dask”: any valid keys for dask.compute can be passed, e.g., scheduler
- input_conversion : str, one of “on” (default), “off”, or valid mtype string
controls input checks and conversions, for _fit, _transform, _inverse_transform, _update
"on" - input check and conversion is carried out
"off" - input check and conversion are not carried out before passing data to inner methods
valid mtype string - input is assumed to be of the specified mtype, conversion is carried out but no check
- output_conversion : str, one of “on”, “off”, valid mtype string
controls output conversion for _transform, _inverse_transform
"on" - if input_conversion is “on”, output conversion is carried out
"off" - output of _transform, _inverse_transform is directly returned
valid mtype string - output is converted to specified mtype
- Returns:
- self : reference to self.
Notes
Changes object state, copies configs in config_dict to self._config_dynamic.
- set_output(*, transform=None)[source]#
Set the output container when “transform” and “fit_transform” are called.
Calling set_output will set the output of all estimators in transformers and transformers_.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**kwargs)[source]#
Set the parameters of this estimator.
Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in transformers of ColumnTransformer.
- Parameters:
- **kwargs : dict
Estimator parameters.
- Returns:
- self : ColumnTransformer
This estimator.
- set_random_state(random_state=None, deep=True, self_policy='copy')[source]#
Set random_state pseudo-random seed parameters for self.
Finds random_state named parameters via estimator.get_params, and sets them to integers derived from random_state via set_params. These integers are sampled from chain hashing via sample_dependent_seed, and guarantee pseudo-random independence of seeded random generators.
Applies to random_state parameters in estimator depending on self_policy, and remaining component estimators if and only if deep=True.
Note: calls set_params even if self does not have a random_state, or none of the components have a random_state parameter. Therefore, set_random_state will reset any scikit-base estimator, even those without a random_state parameter.
- Parameters:
- random_state : int, RandomState instance or None, default=None
Pseudo-random number generator to control the generation of the random integers. Pass int for reproducible output across multiple function calls.
- deep : bool, default=True
Whether to set the random state in sub-estimators. If False, will set only self’s random_state parameter, if exists. If True, will set random_state parameters in sub-estimators as well.
- self_policy : str, one of {“copy”, “keep”, “new”}, default=“copy”
“copy” : estimator.random_state is set to input random_state
“keep” : estimator.random_state is kept as is
“new” : estimator.random_state is set to a new random state, derived from input random_state, and in general different from it
- Returns:
- self : reference to self
- set_tags(**tag_dict)[source]#
Set dynamic tags to given values.
- Parameters:
- **tag_dict : dict
Dictionary of tag name: tag value pairs.
- Returns:
- Self
Reference to self.
Notes
Changes object state by setting tag values in tag_dict as dynamic tags in self.
- update(X, y=None, update_params=True)[source]#
Update transformer with X, optionally y.
- State required:
Requires state to be “fitted”.
Accesses in self:
Fitted model attributes ending in “_”.
self.is_fitted, must be True
Writes to self:
Fitted model attributes ending in “_”.
if remember_data tag is True, writes to self._X, updated by values in X, via update_data.
- Parameters:
- X : time series in sktime compatible data container format
Data to update transformation with
Individual data formats in sktime are so-called mtype specifications, each mtype implements an abstract scitype.
Series scitype = individual time series. pd.DataFrame, pd.Series, or np.ndarray (1D or 2D)
Panel scitype = collection of time series. pd.DataFrame with 2-level row MultiIndex (instance, time), 3D np.ndarray (instance, variable, time), list of Series typed pd.DataFrame
Hierarchical scitype = hierarchical collection of time series. pd.DataFrame with 3 or more level row MultiIndex (hierarchy_1, ..., hierarchy_n, time)
For further details on data format, see glossary on mtype. For usage, see transformer tutorial examples/03_transformers.ipynb
- y : optional, data in sktime compatible data format, default=None
Additional data, e.g., labels for transformation. Some transformers require this, see class docstring for details.
- Returns:
- self : a fitted instance of the estimator