This page describes how to implement sktime compatible estimators, and how to ensure and test compatibility.
There are additional steps for estimators that are contributed to sktime directly.
Implementing an sktime compatible estimator#
The high-level steps to implement sktime compatible estimators are as follows:
1. identify the type of the estimator: forecaster, classifier, etc.
2. copy the extension template for that kind of estimator to its intended location
3. complete the extension template
4. run the sktime test suite and/or the check_estimator utility (see the section on testing interface conformance below)
5. if the test suite highlights bugs or issues, fix them and go to 4
For more guidance on how to implement your own estimator, see the sktime tutorial at PyData on testing interface conformance.
What is my learning task?#
sktime is structured along modules encompassing specific learning tasks,
e.g., forecasting or time series classification.
For brevity, we define an estimator’s scientific type or “scitype” by the formal learning task that it solves.
For example, the scitype of an estimator that solves the forecasting task is “forecaster”.
The scitype of an estimator that solves the time series classification task is “time series classifier”.
Estimators for a given scitype should be located in the respective module.
The estimator scitypes also map onto the different extension templates found in the sktime repository.
Usually, the scitype of a given estimator is directly determined by what the estimator does. This is often also explicitly signposted in publications related to the estimator. For instance, most textbooks mention ARIMA in the context of forecasting, so in that hypothetical situation it makes sense to consider the “forecaster” template. Then, inspect the template and check whether the methods of the class map clearly onto routines of the estimator. If not, another template might be more appropriate.
The most common point of confusion here is between transformers and other estimator types, since transformers are often used as parts of algorithms of other types.
If unsure, feel free to post your question on one of
sktime’s social channels.
Don’t panic - it is not uncommon for academic publications to be unclear about the type of an estimator,
and correct categorization may be difficult even for experts.
What are sktime extension templates?#
Extension templates are convenient “fill-in” templates for implementers of new estimators.
They fit into sktime’s unified interface as follows:
for each scitype, there is a public user interface, defined by the respective base class. For instance, BaseForecaster defines the fit and predict interfaces for forecasters. All forecasters will implement fit and predict the same way, by inheritance from BaseForecaster. The public interface follows the “strategy” object orientation pattern.
for each scitype, there is a private extender interface, defined by the extension contract in the extension template. For instance, the forecaster.py extension template for forecasters explains what to fill in for a concrete forecaster inheriting from BaseForecaster. In most extension templates, users should implement private methods (“inner” methods), e.g., _fit and _predict for forecasters. Boilerplate code rests within the public part of the interface, in fit and predict. The extender interface follows the “template” object orientation pattern.
Extenders familiar with scikit-learn extension should note the following difference to scikit-learn:
the public interface, e.g., predict, is never overridden in sktime (concrete) estimators.
Implementation happens in the private, extender-side interface, e.g., _predict.
This avoids boilerplate replication, such as check_X etc. in scikit-learn.
This also allows richer boilerplate, such as automated vectorization functionality or input conversion.
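To make the strategy/template split concrete, here is a plain-Python sketch. MiniBaseForecaster and MeanForecaster are hypothetical and much simpler than sktime’s BaseForecaster: the public fit and predict hold the boilerplate (state checks, input coercion) and are never overridden, while the concrete class only fills in the inner _fit and _predict.

```python
class MiniBaseForecaster:
    """Hypothetical, heavily simplified stand-in for a base class like BaseForecaster."""

    def __init__(self):
        self._is_fitted = False

    # public interface: boilerplate lives here and is never overridden
    def fit(self, y):
        y = [float(v) for v in y]  # input coercion boilerplate
        if not y:
            raise ValueError("y must be non-empty")
        self._fit(y)
        self._is_fitted = True
        return self

    def predict(self, fh):
        if not self._is_fitted:
            raise RuntimeError("call fit before predict")
        steps = list(fh) if hasattr(fh, "__iter__") else [fh]  # coerce fh to a list
        return self._predict(steps)

    # private extender interface: concrete estimators fill these in
    def _fit(self, y):
        raise NotImplementedError

    def _predict(self, fh):
        raise NotImplementedError


class MeanForecaster(MiniBaseForecaster):
    """Hypothetical concrete forecaster: implements only the inner methods."""

    def __init__(self, window=3):
        self.window = window  # parameter written to self unchanged
        super().__init__()

    def _fit(self, y):
        last = y[-self.window:]
        self.mean_ = sum(last) / len(last)

    def _predict(self, fh):
        # one constant forecast per requested step
        return [self.mean_ for _ in fh]
```

Every concrete class gets the same checks for free; sktime’s real base classes put much more (e.g., input conversion, vectorization) in the same public layer.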
How to use sktime extension templates#
To use the sktime extension templates, copy them to the intended location of the estimator.
Inside the extension templates, necessary actions are marked with todo.
The typical workflow goes through the extension template by searching for todo, and carrying out the action described next to the todo.
Extension templates typically have the following todo items:
choosing name and parameters for the estimator
filling in the __init__: writing parameters to self
filling in docstrings of the module and the estimator. This is recommended as soon as parameters have been settled on, since the docstring tends to be useful as a specification to follow in implementation.
filling in the tags for the estimator. Some tags are “capabilities”, i.e., what the estimator can do, e.g., dealing with nans. Other tags determine the format of inputs seen in the “inner” methods _fit etc.; these tags are usually called X_inner_mtype or similar. This is useful in case the inner functionality assumes pandas.DataFrame, and helps avoid conversion boilerplate. The type strings can be found in datatypes.MTYPE_REGISTER. For a tutorial on data type conventions, see the data types tutorial in the sktime documentation.
filling in the “inner” methods, e.g., _fit and _predict. The docstrings and comments in the extension template should be followed here. The docstrings also describe the guarantees on the inputs to the “inner” methods, which are typically stronger than the guarantees on inputs to the public methods, and are determined by the values of tags that have been set. For instance, setting the inner mtype tag to pd.DataFrame for a forecaster guarantees that the data seen by _fit will be a pandas.DataFrame, complying with additional data container specifications in sktime (e.g., index types).
filling in testing parameters in get_test_params. The selection of parameters should cover major estimator internal case distinctions to achieve good coverage.
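A skeleton of the todo items above, as a hypothetical class (MyForecaster and its parameters window and strategy are made up). To stay self-contained it does not inherit from BaseForecaster; in a real extension it would inherit from the base class and call super().__init__() at the end of __init__, as the extension template does. get_test_params is sketched as a classmethod returning a list of parameter dicts, one per major internal case distinction.

```python
class MyForecaster:  # hypothetical; in sktime this would inherit from BaseForecaster
    """Forecast by a summary of the last values (hypothetical example).

    Parameters
    ----------
    window : int, default=3
        number of most recent values to summarize
    strategy : str, one of "mean", "last", default="mean"
        how the window is summarized
    """

    # tags: the inner methods would see X as a pandas.DataFrame
    _tags = {
        "X_inner_mtype": "pd.DataFrame",
    }

    def __init__(self, window=3, strategy="mean"):
        # write parameters to self unchanged
        self.window = window
        self.strategy = strategy
        # in sktime: super().__init__() would be called here

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        # one parameter set per internal case distinction (both strategies)
        return [
            {"window": 3, "strategy": "mean"},
            {"window": 1, "strategy": "last"},
        ]
```

Covering both strategy branches keeps coverage high with only two parameter sets, which also keeps test runtime low.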
Some common caveats, also described in the extension template text:
__init__ parameters should be written to self and never be changed
special case of this: estimator components, i.e., parameters that are estimators, should generally be cloned (via sklearn.clone), and methods should be called only on the clones
methods should generally avoid side effects on arguments
non-state changing methods should not write to self
overriding set_params is typically not needed, since custom set_params implementations are needed only for complex cases such as heterogeneous composites, e.g., pipelines with parameters that are nested structures containing estimators.
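The caveats can be illustrated with a small, hypothetical composite (Scaler and ScaledMedian are made up). copy.deepcopy stands in for sklearn.clone only to keep the sketch dependency-free; unlike clone, deepcopy would also copy any fitted state of the component.

```python
import copy


class Scaler:
    """Hypothetical component estimator."""

    def __init__(self, factor=2.0):
        self.factor = factor

    def fit(self, y):
        self.fitted_ = True
        return self

    def transform(self, y):
        return [v * self.factor for v in y]


class ScaledMedian:
    """Hypothetical composite following the caveats."""

    def __init__(self, scaler=None):
        # parameter written to self unchanged; the default is resolved in fit
        self.scaler = scaler

    def fit(self, y):
        scaler = self.scaler if self.scaler is not None else Scaler()
        # fit a copy of the component, never self.scaler itself
        self.scaler_ = copy.deepcopy(scaler).fit(y)  # stand-in for sklearn.clone
        y_sorted = sorted(y)  # new list; y.sort() would mutate the caller's data
        self.median_ = y_sorted[len(y_sorted) // 2]
        return self

    def predict(self, n):
        # non-state changing: reads fitted attributes, writes nothing to self
        return self.scaler_.transform([self.median_] * n)
```

After fit, the user’s scaler object and input list are exactly as they were passed in; all fitted state lives in attributes ending in an underscore.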
How to test interface conformance#
For a video tutorial and more examples on the material below, please visit our tutorial at PyData.
Usually, the simplest way to test interface conformance with sktime is via the check_estimator utility in the sktime.utils.estimator_checks module.
When invoked, this will collect tests in sktime relevant for the estimator type and run them on the estimator.
This can be used for manual debugging in a notebook environment.
Example of running the full test suite for NaiveForecaster:
from sktime.utils.estimator_checks import check_estimator
from sktime.forecasting.naive import NaiveForecaster
check_estimator(NaiveForecaster)
The check_estimator utility will return, by default, a dict, indexed by test/fixture combination strings,
that is, a test name and the fixture combination string in square brackets.
For example, in the key "test_repr[NaiveForecaster-2]", test_repr is the test name, and NaiveForecaster-2 the fixture combination string.
Values of the returned dict are either the string "PASSED", if the test succeeds, or the exception that the test would raise at failure.
check_estimator does not raise exceptions by default; by default, they are returned as dictionary values.
To raise the exceptions instead, e.g., for debugging, use the argument raise_exceptions=True.
In that case, at most one exception is raised, namely the first exception encountered in the test execution order.
To run or exclude certain tests, use the corresponding test subsetting arguments of check_estimator.
Values provided should be names of tests (str), or a list of names of tests.
Note that test names exclude the part in square brackets.
For example, passing the test name test_constructor runs that test with all fixtures.
To run or exclude certain test-fixture combinations, use the corresponding fixture subsetting arguments of check_estimator.
Values provided should be test-fixture-combination strings (str), or a list of such.
Valid strings are precisely the dictionary keys returned when using check_estimator with default parameters.
For example, passing "test_repr[NaiveForecaster-2]" runs only that test-fixture combination.
A useful workflow for using check_estimator to debug an estimator is as follows:
run check_estimator(MyEstimator) to find failing tests
subset to the failing tests or fixtures using the test or fixture subsetting arguments
if the failure is not obvious, set raise_exceptions=True to raise the exception and inspect the traceback
if the failure is still not clear, use advanced debuggers on the line of code with the check_estimator call
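The first two steps of this workflow can be scripted. The helpers below (failing_checks and base_test_names are hypothetical names) only rely on the dict format described earlier: keys are test-fixture strings, values are "PASSED" or an exception.

```python
def failing_checks(results):
    """Return {key: exception} for every entry that did not pass."""
    return {key: outcome for key, outcome in results.items() if outcome != "PASSED"}


def base_test_names(keys):
    """Strip the fixture part in square brackets, e.g. 'test_repr[X-2]' -> 'test_repr'."""
    return sorted({key.split("[", 1)[0] for key in keys})


# hypothetical usage; MyEstimator is a placeholder and
# fixtures_to_run is an assumed argument name:
# results = check_estimator(MyEstimator)
# failed = failing_checks(results)
# check_estimator(MyEstimator, fixtures_to_run=list(failed), raise_exceptions=True)
```

Re-running only the failing keys with raise_exceptions=True gives the traceback of the first failure without re-running the passing tests.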
Running the test suite in a repository clone#
If the target location of the estimator is within sktime, then the sktime test suite can be run instead. The sktime test suite (and CI/CD) is pytest based; pytest will automatically
collect all estimators of a certain type and the tests applying to a given estimator.
For an overview of the testing framework, see the “testing framework” documentation.
Generic interface conformance tests are contained in test classes such as TestAllForecasters.
pytest test-fixture-strings for an estimator EstimatorName will always contain EstimatorName as a substring,
and are identical with the test-fixture-strings returned by check_estimator.
To run tests only for a given estimator from the console, the command pytest -k "EstimatorName" can be used.
This will typically have the same effect as using check_estimator(EstimatorName), only via direct pytest calls.
When using Visual Studio Code or PyCharm, tests can also be subset using the GUI filter functionality;
for this, refer to the respective IDE documentation on test integration.
To identify codebase locations of tests applying to a specific estimator,
a quick approach is searching the codebase for test strings produced by
check_estimator, preceded by
def (for function/method definition).
Testing within a third party extension package#
For third party extension packages to sktime (open or closed),
or third party modules that aim for interface compliance with sktime,
the sktime test suite can be imported and extended in two ways:
importing check_estimator; this will carry out the tests defined in sktime
importing test classes, e.g., test_all_forecasters.TestAllForecasters. The imports will be discovered directly by pytest. The test suite can also be extended by inheriting from the test classes.
Adding an sktime compatible estimator to sktime#
When adding an sktime compatible estimator to sktime itself, a number of
additional things need to be done:
ensure that code also meets
add the estimator to the sktime API reference. This is done by adding a reference to the estimator in the correct
authors of the estimator should add themselves to
CODEOWNERS, as owners of the contributed estimator.
if the estimator relies on soft dependencies, or adds new soft dependencies, the steps in the “dependencies” developer guide should be followed
ensure that the estimator passes the entire local test suite of sktime, with the estimator in its target location. To run tests only for the estimator, the command pytest -k "EstimatorName" can be used (or the VS Code GUI filter functionality)
ensure that test parameters in get_test_params are chosen such that the runtime of estimator specific tests remains in the order of seconds on
Don’t panic - when contributing to
sktime, core developers will give helpful pointers on the above in their PR reviews.
It is recommended to open a draft PR to get feedback early.
Estimators dependent on cython#
To add an estimator to
sktime that depends on cython, the following additional steps are needed:
all cython code should be present in a separate package on
conda-forge. No cython dependent code should be added directly to
sktime. Below, we call this separate package
home-package, for simplicity of reference.
In home-package, it is recommended to test the estimator via check_estimator, on the same test matrix as sktime: all supported python versions; MacOS, Linux, Windows.
In sktime, an interface to the algorithm should be added. This can be a simple import from home-package, if the algorithm in
Alternatively, the algorithm can be interfaced via a delegator, with the home-package algorithm as the delegate; tags and method overrides can then be added in the delegator. See, e.g.,
the requires_cython tag should be set to True, and the python_dependencies tag should be set to the string
If all has been set up correctly, the estimator will be tested in sktime by the
Note that this CI element does not cover the full test matrix
of python versions and operating systems; this should be done in the upstream package.
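The delegation option can be sketched in plain Python. Everything here is hypothetical: HomePackageForecaster stands in for the cython-backed class in home-package, and "home-package" is a placeholder value for the python_dependencies tag. In sktime, the delegator would inherit from the relevant base class and would typically import from home-package inside its methods, so the soft dependency is only needed when the estimator is used.

```python
class HomePackageForecaster:
    """Hypothetical stand-in for the cython-backed class living in home-package."""

    def __init__(self, order=1):
        self.order = order

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, n):
        return [self.last_] * n


class HomePackageDelegator:
    """Hypothetical sktime-side interface delegating to the home-package class."""

    _tags = {
        "requires_cython": True,
        "python_dependencies": "home-package",  # placeholder package name
    }

    def __init__(self, order=1):
        self.order = order

    def _fit(self, y):
        # in a real interface: from home_package import ... (hypothetical module)
        self.estimator_ = HomePackageForecaster(order=self.order).fit(y)
        return self

    def _predict(self, n):
        # method overrides can adapt the delegate's output here
        return self.estimator_.predict(n)
```

Keeping the tags on the delegator means sktime can see the cython requirement without importing home-package at all.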