sktime testing framework overview

sktime uses pytest for testing interface compliance of estimators and correctness of code. This page gives an overview of the tests, and introduces how to add tests or extend the testing framework.

Test module architecture

sktime testing happens on three layers, roughly corresponding to the inheritance layers of estimators.

  • “package level”: testing interface compliance with the BaseObject and BaseEstimator specifications, in tests/

  • “module level”: testing interface compliance of concrete estimators with their scitype base class, for instance forecasting/tests/

  • “low level”: testing individual functionality of estimators or other code, in individual files in tests folders.

Module conventions are as follows:

  • Each module contains a tests folder, which contains tests specific to that module.

  • Sub-modules may also contain tests folders.

  • tests folders may contain files to collect test configuration settings for that module

  • generic utilities for tests are located in the module utils._testing.

  • Tests for these utilities should be contained in the utils._testing.tests folder.

  • Each test module corresponding to a learning task and estimator scitype should contain module level tests in a test_all_[name_of_scitype].py file that tests interface compliance of all estimators adhering to the scitype. For instance, see forecasting/tests/ or distances/tests/.

  • Learning task specific tests should not duplicate the package level, generic estimator tests in test_all_estimators.

Test code architecture

sktime test files should follow pytest best practice, such as fixtures or test parameterization, instead of custom logic where possible; see the pytest documentation.

Estimator tests use sktime’s framework plug-in to pytest_generate_tests, which parameterizes estimator fixtures and data input scenarios.

An illustrative example

Starting with an example:

def test_fit_returns_self(estimator_instance, scenario):
   """Check that fit returns self."""
   fit_return =, method_sequence=["fit"])
   assert (
      fit_return is estimator_instance
   ), f"Estimator: {estimator_instance} does not return self when calling fit"

This test constitutes a loop over estimator_instance and scenario fixtures, where the loop is orchestrated by pytest parameterization in pytest_generate_tests, which automatically decorates the test with a suitable loop. Notably, the developer does not need to write loops in the test if it uses a fixture name (such as estimator_instance) for which a loop is already defined. See below for more details, or the pytest documentation on the topic.

The sktime plug-in for pytest generates the tuples of fixture values for this. In the above example, we loop over the following fixture lists:

  • estimator_instance over estimator instances, obtained from all sktime estimators via create_test_instances_and_names

  • scenario over scenario objects, which encode data inputs and method call sequences to estimator_instance (explained in further detail below).

The sktime plug-in ensures that only those scenarios are retrieved that are applicable to the estimator_instance.

In the example, the command scenario.run(estimator_instance, method_sequence=["fit"]) is equivalent to calling**scenario_kwargs), where the scenario_kwargs are generated by the scenario.

It should be noted that the test is not decorated with fixture parametrization; the fixtures are instead generated by pytest_generate_tests.

The reason for this is that the applicable scenarios (fixture values of scenario) depend on the estimator_instance fixture, since inputs to fit of a classifier will differ from inputs to fit of a forecaster.
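The dependency can be illustrated with a minimal sketch of a pytest_generate_tests hook. The classes ToyForecaster, ToyClassifier and ToyScenario, the lists ESTIMATOR_INSTANCES and SCENARIOS, and the helper _applicable_pairs are hypothetical stand-ins, not sktime's API. The hook builds only applicable (estimator_instance, scenario) pairs and passes them to metafunc.parametrize in a single joint call, so that the scenario values depend on the estimator.

```python
class ToyForecaster:
    """Hypothetical stand-in for a forecaster instance."""

    scitype = "forecaster"


class ToyClassifier:
    """Hypothetical stand-in for a classifier instance."""

    scitype = "classifier"


class ToyScenario:
    """Hypothetical stand-in for a scenario, applicable to one scitype."""

    def __init__(self, name, scitype):
        self.name = name
        self.scitype = scitype

    def is_applicable(self, estimator):
        # a scenario only applies to estimators of the matching scitype
        return estimator.scitype == self.scitype


ESTIMATOR_INSTANCES = [ToyForecaster(), ToyClassifier()]
SCENARIOS = [
    ToyScenario("ForecasterFitPredict", "forecaster"),
    ToyScenario("ClassifierFitPredict", "classifier"),
]


def _applicable_pairs():
    """Return applicable (estimator_instance, scenario) pairs and their ids."""
    pairs, ids = [], []
    for est in ESTIMATOR_INSTANCES:
        for scen in SCENARIOS:
            if scen.is_applicable(est):
                pairs.append((est, scen))
                ids.append(f"{type(est).__name__}-{scen.name}")
    return pairs, ids


def pytest_generate_tests(metafunc):
    """pytest hook, called once per collected test function."""
    if {"estimator_instance", "scenario"} <= set(metafunc.fixturenames):
        pairs, ids = _applicable_pairs()
        # one joint parametrization, so each scenario value depends on the estimator
        metafunc.parametrize("estimator_instance,scenario", pairs, ids=ids)
```

Parametrizing both names jointly, rather than in two independent calls, is what avoids the full cross product of all estimators with all scenarios.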

Parameterized fixtures

sktime uses pytest fixture parameterization to execute tests in a loop over fixtures, for instance running all interface compatibility tests for all estimators. See the pytest documentation on fixture parameterization for a general explanation.

Implementation-wise, loops over fixtures are orchestrated by pytest parameterization in pytest_generate_tests, which automatically decorates every test with a mark.parametrize based on the test arguments (estimator_instance and scenario in the above example). This is in line with standard use of pytest_generate_tests; see the section in the pytest documentation on advanced fixture parameterization using pytest_generate_tests.

Currently, the sktime testing framework provides automated fixture parameterization via mark.parametrize for the following fixtures in module level tests:

  • estimator: all estimator classes, inheriting from the base class of the given module.

  • In the package level tests test_all_estimators, that base class is BaseEstimator.

  • estimator_instance: all estimator test instances, obtained from all sktime estimators via create_test_instances_and_names

  • scenario: test scenarios, applicable to estimator or estimator_instance.

  • The scenarios are specified in utils/_testing/scenarios_[estimator_scitype].

Further parameterization may happen for individual tests, the scope is usually explained in the test docstrings.


The scenario fixtures contain arguments for method calls, and a sequence of method calls.

An example scenario specification, from utils/_testing/scenarios_forecasting:

# _make_series, RAND_SEED and ForecasterTestScenario are defined or imported in utils/_testing/scenarios_forecasting
class ForecasterFitPredictUnivariateNoXLateFh(ForecasterTestScenario):
   """Fit/predict only, univariate y, no X, no fh in fit."""

   _tags = {"univariate_y": True, "fh_passed_in_fit": False}

   args = {
      "fit": {"y": _make_series(n_timepoints=20, random_state=RAND_SEED)},
      "predict": {"fh": 1},
   }
   default_method_sequence = ["fit", "predict"]

The scenario ForecasterFitPredictUnivariateNoXLateFh encodes instructions that are applied to an estimator_instance via the scenario's run method. A call result = scenario.run(estimator_instance) will:

  1. first, call, random_state=RAND_SEED))

  2. then, call estimator_instance.predict(fh=1) and return its output to result.

The abstraction of “scenario” makes it possible to specify multiple argument combinations across multiple methods.

The method run also has arguments (method_sequence and arg_sequence) that allow to override the method sequence, e.g., run them in a different order, or only a subset thereof.
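The behaviour of run for a scenario like the one above can be sketched as follows. ToyScenario and ToyForecaster are hypothetical classes, not sktime's ForecasterTestScenario or a real forecaster; a plain list replaces _make_series, and returning only the output of the last call is an assumption based on the description of result.

```python
class ToyForecaster:
    """Hypothetical estimator that records its calls; not an sktime class."""

    def __init__(self):
        self.calls = []

    def fit(self, y, fh=None):
        self.calls.append(("fit", y, fh))
        self._last = y[-1]
        return self  # fit returns self, as test_fit_returns_self requires

    def predict(self, fh):
        self.calls.append(("predict", fh))
        # naive "last value" forecast, repeated fh times
        return [self._last] * fh


class ToyScenario:
    """Hypothetical scenario: argument sets plus a default method sequence."""

    args = {
        "fit": {"y": [1.0, 2.0, 3.0]},
        "predict": {"fh": 1},
    }
    default_method_sequence = ["fit", "predict"]

    def run(self, obj, method_sequence=None, arg_sequence=None):
        """Call methods of obj in sequence, return the output of the last call."""
        if method_sequence is None:
            method_sequence = self.default_method_sequence
        if arg_sequence is None:
            # simplification: each method uses the argument set of the same name
            arg_sequence = method_sequence
        result = None
        for method, arg_key in zip(method_sequence, arg_sequence):
            result = getattr(obj, method)(**self.args[arg_key])
        return result
```

Passing method_sequence=["fit"] reproduces the call made in test_fit_returns_self, where the output of fit is compared against the estimator itself.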

Scenarios also provide a method scenario.is_applicable(estimator), which returns a boolean indicating whether scenario is applicable to estimator. For instance, scenarios with univariate data are not applicable to multivariate forecasters, and will cause exceptions in a fit method call. Non-applicable scenarios can be filtered out in positive tests, and filtered in for negative tests. By default, sktime's pytest_generate_tests passes only applicable scenarios.

Further, scenarios inherit from BaseObject, which allows the sktime tag system to be used with scenarios.
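Filtering by applicability can be sketched with hypothetical classes: ToyEstimator declares the kind of y it supports through an assumed y_type attribute, ToyScenario carries a univariate_y tag as in the example scenario, and split_scenarios separates scenarios for positive and negative tests. None of these names are sktime API.

```python
class ToyEstimator:
    """Hypothetical estimator declaring which kind of y it supports."""

    def __init__(self, y_type):
        self.y_type = y_type  # "univariate", "multivariate" or "both"


class ToyScenario:
    """Hypothetical scenario carrying a univariate_y tag."""

    def __init__(self, name, univariate_y):
        self.name = name
        self._tags = {"univariate_y": univariate_y}

    def is_applicable(self, estimator):
        needed = "univariate" if self._tags["univariate_y"] else "multivariate"
        return estimator.y_type in (needed, "both")


def split_scenarios(estimator, scenarios):
    """Split scenarios into applicable (positive tests) and not (negative tests)."""
    applicable = [s for s in scenarios if s.is_applicable(estimator)]
    not_applicable = [s for s in scenarios if not s.is_applicable(estimator)]
    return applicable, not_applicable
```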

For further details on scenarios, inspect the docstring of BaseScenario.

Extending the testing module

This section explains how to extend the testing module. Depending on the primary change that is tested, the changes to the testing module will be shallow or deep. In decreasing order of commonality:

  • When adding new estimators or utility functionality, write low level tests that check correctness of the estimator.

  • These typically use only the simplest idioms in pytest (e.g., fixture parameterization).

  • New estimators are also automatically discovered and looped over by the existing module and package level tests.

  • Introducing or changing base class level interface points will typically require the addition of module level tests, and the addition of, or modification to, scenarios with functionality specific to these interface points. Rarely, this may require changes to package level tests.

  • Major interface changes or addition of modules may require writing of entire test suites, and changes or additions to package level tests.

Adding low level tests

Low level tests are “free-form” and should follow best pytest practice. pytest tests should be located in the appropriate tests folder of the module where a change is made. Examples should be located in the docstring of the class or function added.

For an added estimator of name estimator_name, the test file should be called

Useful functionality to write tests:

  • example fixture generation, via datatypes.get_examples

  • data format checkers in datatypes: check_is_mtype, check_is_scitype, check_raise

  • miscellaneous utilities in utils, especially in _testing
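A minimal sketch of a low level test, for a hypothetical utility last_value_forecast that is not part of sktime. The Examples section in its docstring follows the advice to place examples in the docstring. The tests use plain asserts so that the sketch runs without pytest; in an actual test file, pytest.mark.parametrize and pytest.raises would be the idiomatic replacements for the loop and the try/except.

```python
def last_value_forecast(y, fh):
    """Repeat the last observed value of y for fh steps ahead.

    Hypothetical utility, used only to illustrate low level tests.

    Examples
    --------
    >>> last_value_forecast([1, 2, 3], fh=2)
    [3, 3]
    """
    if fh < 1:
        raise ValueError("fh must be a positive integer")
    return [y[-1]] * fh


# in the module's tests folder, free-form tests could look like this:
def test_last_value_forecast_length():
    """Check that the forecast has fh entries."""
    for fh in [1, 3, 5]:
        assert len(last_value_forecast([0.5, 1.5], fh)) == fh


def test_last_value_forecast_raises_on_bad_fh():
    """Check that an informative error is raised for non-positive fh."""
    try:
        last_value_forecast([1, 2], fh=0)
    except ValueError as err:
        assert "positive" in str(err)
    else:
        raise AssertionError("expected ValueError for fh=0")
```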

Escaping tests

On occasion, it may make sense to escape individual estimators from individual tests.

This can be done (currently, as of 0.9.0) in two ways:

  • adding the estimator or test/estimator combination to the EXCLUDED_TESTS or EXCLUDE_ESTIMATORS in the appropriate _config file.

  • adding a check condition in the is_excluded method used in pytest_generate_tests, possibly only if the testing module supports this

Escaping tests directly in the tests, e.g., via if isinstance(estimator_instance, MyClass), should be avoided where possible.
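Configuration-based escapes can be sketched as below, assuming EXCLUDE_ESTIMATORS is a list of estimator names and EXCLUDED_TESTS maps estimator names to lists of test names. The estimator names and the functions is_excluded and filter_estimators here are hypothetical; sktime's _config files and its is_excluded method may be structured differently.

```python
# hypothetical contents of a _config file; estimator names are placeholders
EXCLUDE_ESTIMATORS = ["MySlowForecaster"]
EXCLUDED_TESTS = {
    "MyClassifier": ["test_fit_returns_self"],
}


def is_excluded(test_name, estimator_name):
    """Return True if test_name should not run for estimator_name."""
    if estimator_name in EXCLUDE_ESTIMATORS:
        return True  # excluded from all tests
    return test_name in EXCLUDED_TESTS.get(estimator_name, [])


def filter_estimators(test_name, estimator_names):
    """Drop excluded estimators during fixture generation, not inside the test."""
    return [e for e in estimator_names if not is_excluded(test_name, e)]
```

Filtering at fixture generation keeps the test bodies free of isinstance checks.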

Adding package or module level tests

Module level tests use pytest_generate_tests to define fixtures.

The available fixtures vary per module, and are listed in the docstring of pytest_generate_tests.

A new test should use these fixtures if possible, but can also add new fixtures via basic pytest fixture functionality.

If new fixture variables are to be used throughout the module, or depend on existing fixtures, instructions in the next section should be followed.

Where possible, scenarios should be used to simulate generic method calls (see above), instead of creating and passing arguments directly. Scenarios will ensure consistent coverage of input argument cases.

Adding fixture variables

One-off fixture variables (localized to one or a few tests) should be added using basic pytest functionality, such as immutable constants, pytest.fixture or pytest.mark.parametrize. Extending pytest_generate_tests can also be considered in this case, if it makes the tests more (and not less) readable.

In contrast, fixtures used throughout module or package level tests should typically be added to the fixture generation process called by pytest_generate_tests.

This requires:

  • adding a function _generate_[variable_name](test_name, **kwargs), as described below

  • assigning the function to generator_dict["variable_name"]

  • adding the new variable to the fixture_sequence list in pytest_generate_tests

The function _generate_[variable_name](test_name, **kwargs) should return two objects:

  • a list of fixtures to loop over, to substitute for variable_name when it appears in a test signature

  • a list of names of equal length, i-th element used as a name for the i-th fixture in test logs

The function has access to:

  • test_name, the name of the test the variable is called in.

This can be used to customize the list of fixtures for specific tests, although it is mainly meant for generic behaviour. One-off escapes and similar should be avoided here, and instead be dealt with via xfail or similar mechanisms.

  • the value of the fixture variables that appear earlier in fixture_sequence, in kwargs.

For instance, the value of estimator_instance, if this is a variable used in the test. This can be used to make the list of fixtures for variable_name dependent on the values of other fixture variables.
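The generator mechanism described above can be sketched as follows. generator_dict, fixture_sequence and the _generate_[variable_name](test_name, **kwargs) contract follow the text; the data dictionaries, the placeholder names and the helper generate_fixtures are hypothetical, and sktime's actual implementation may be organized differently. Because scenario comes after estimator_instance in fixture_sequence, _generate_scenario can read the chosen estimator from kwargs.

```python
# placeholder estimator names mapped to their scitype
ESTIMATORS = {"NaiveForecaster": "forecaster", "RocketClassifier": "classifier"}
# placeholder scenario names per scitype
SCENARIOS = {
    "forecaster": ["fit_predict_univariate", "fit_predict_multivariate"],
    "classifier": ["fit_predict_panel"],
}


def _generate_estimator_instance(test_name, **kwargs):
    """Return fixtures to loop over, and their names for test logs."""
    names = sorted(ESTIMATORS)
    return names, names  # here the fixtures double as their own names


def _generate_scenario(test_name, **kwargs):
    """Return scenarios for the estimator chosen earlier in the sequence."""
    estimator = kwargs["estimator_instance"]
    scenarios = SCENARIOS[ESTIMATORS[estimator]]
    return scenarios, scenarios


generator_dict = {
    "estimator_instance": _generate_estimator_instance,
    "scenario": _generate_scenario,
}
fixture_sequence = ["estimator_instance", "scenario"]


def generate_fixtures(test_name, fixturenames):
    """Expand the fixture variables used by a test into value tuples and ids."""
    variables = [v for v in fixture_sequence if v in fixturenames]
    # each partial combination: (values so far, names so far, kwargs for generators)
    combos = [((), (), {})]
    for var in variables:
        expanded = []
        for values, names, kwargs in combos:
            fixtures, fixture_names = generator_dict[var](test_name, **kwargs)
            for fixture, name in zip(fixtures, fixture_names):
                expanded.append(
                    (values + (fixture,), names + (name,), {**kwargs, var: fixture})
                )
        combos = expanded
    argvalues = [values for values, _, _ in combos]
    ids = ["-".join(names) for _, names, _ in combos]
    return variables, argvalues, ids


def pytest_generate_tests(metafunc):
    """pytest hook: parametrize all fixture variables in one joint call."""
    variables, argvalues, ids = generate_fixtures(
        metafunc.function.__name__, metafunc.fixturenames
    )
    if variables:
        # a list of argnames takes a list of tuples, one tuple per test case
        metafunc.parametrize(variables, argvalues, ids=ids)
```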

Adding or extending scenarios

Scenarios can be added or modified if a new combination of method/input values should be tested. The two main options are:

  • adding a new scenario, similar to existing scenarios for an estimator scitype. This is the common case when a new input condition should be covered.

  • adding a method or argument key to existing scenarios. This is the common case when a new method or method sequence should be covered. For this, the new arguments should be added under a new key in the args dictionary of an existing scenario.

Scenarios for a specific estimator scitype are found in utils/_testing/scenarios_[estimator_scitype]. All scenarios inherit from a base class for that scitype, e.g., ForecasterTestScenario. This base class defines generics such as is_applicable, or tag handling, for all scenarios of the same type.

Scenarios should usually define:

  • an args parameter: a dictionary, with arbitrary keys (usually names of methods).

  • The args parameter may be set as a class variable, or set by the constructor.

  • optionally, a default_method_sequence and a default_arg_sequence, lists of strings. These define the sequence in which methods are called, and with which argument set, when run is called. Both may be class variables, or object variables set in the constructor.

  • side note: a method_sequence and arg_sequence can also be passed to run. If not passed, defaulting takes place (first to each other, then to the default_method_sequence and default_arg_sequence variables).

  • optionally, a _tags dictionary, which is a BaseObject tags dictionary and behaves exactly like that of estimators.

  • optionally, a get_args method which allows overriding key retrieval from args. For instance, to specify rules such as “if the key starts with predict_, always return …”

  • optionally, an is_applicable method which allows to compare the scenario with estimators. For instance, comparing whether both scenario and estimator are multivariate.
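The defaulting rule for method_sequence and arg_sequence can be sketched as a small function. resolve_sequences is a hypothetical helper that follows the rule as described in the side note; the exact precedence inside sktime's run may differ, and raising ValueError when nothing is available is an assumption of this sketch.

```python
def resolve_sequences(
    method_sequence=None,
    arg_sequence=None,
    default_method_sequence=None,
    default_arg_sequence=None,
):
    """Resolve the method and argument sequences used by run (hypothetical)."""
    if method_sequence is None and arg_sequence is None:
        # nothing passed to run: fall back to the default_ variables
        method_sequence = default_method_sequence
        arg_sequence = default_arg_sequence
    # a missing sequence defaults to the other one
    if method_sequence is None:
        method_sequence = arg_sequence
    if arg_sequence is None:
        arg_sequence = method_sequence
    if method_sequence is None:
        raise ValueError("no method sequence passed, and no default defined")
    return list(method_sequence), list(arg_sequence)
```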

For further details and the expected signature, consult the docstring of TestScenario, and/or inspect any of the scenario base classes, e.g., ForecasterTestScenario.

Creating tests for a new estimator type
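A scitype base class with generic rules, and a concrete scenario defined only through class variables, might look like the sketch below. All classes are hypothetical stand-ins rather than sktime's TestScenario or ForecasterTestScenario; the rule that predict_* keys reuse the predict arguments is an invented example of a get_args override, and the multivariate attribute on estimators is an assumption.

```python
class ToyScenarioBase:
    """Hypothetical stand-in for a scenario base class."""

    args = {}
    _tags = {}
    default_method_sequence = None
    default_arg_sequence = None

    def get_tag(self, name, default=None):
        return self._tags.get(name, default)

    def get_args(self, key, obj=None):
        # default rule: look the key up in args
        return self.args[key]

    def is_applicable(self, obj):
        return True


class ToyForecasterScenario(ToyScenarioBase):
    """Generic rules shared by all (hypothetical) forecaster scenarios."""

    def get_args(self, key, obj=None):
        # hypothetical rule: predict_* methods reuse the arguments of predict
        if key.startswith("predict_"):
            key = "predict"
        return super().get_args(key, obj)

    def is_applicable(self, obj):
        # multivariate scenarios need an estimator that accepts multivariate y
        if not self.get_tag("univariate_y", True):
            return getattr(obj, "multivariate", False)
        return True


class ForecasterFitPredictMultivariate(ToyForecasterScenario):
    """Concrete scenario, defined via class variables only."""

    _tags = {"univariate_y": False}
    args = {"fit": {"y": [[1.0, 2.0], [3.0, 4.0]]}, "predict": {"fh": 2}}
    default_method_sequence = ["fit", "predict"]
```

Keeping get_args and is_applicable on the scitype base class means each concrete scenario only has to declare its data, tags and method sequence.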

If a module for a new estimator type is added, multiple things need to be created for module level tests:

  • scenarios to cover the specified base class interface behaviour, in utils/_testing/scenarios_[estimator_scitype]. This can be modelled on utils/_testing/scenarios_forecasting, or the other scenarios files.

  • a line in the dispatch dictionary in utils/_testing/scenarios_getter which links the scenarios to the scenario retrieval function, e.g., scenarios["forecaster"] = scenarios_forecasting

  • a tests/test_all_[estimator_scitype].py file, relative to the root of the module.

  • in this file, appropriate fixture generation via pytest_generate_tests. This can be modelled on test_all_estimators or test_all_forecasters.

  • and, a collection of tests for interface compliance with the base class of the estimator type. The tests should cover positive cases, as well as testing raising of informative error message in negative cases.
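Registering scenarios for a new scitype can be sketched as follows. The dispatch line mirrors the one in the text; the placeholder scenario classes, the scitype key "my_scitype" and the helper get_scenarios_for are hypothetical, and the actual retrieval function in utils/_testing/scenarios_getter has its own name and signature.

```python
class ForecasterFitPredict:
    """Placeholder scenario class for forecasters."""


class ForecasterFitUpdatePredict:
    """Placeholder scenario class for forecasters."""


class MyScitypeFitPredict:
    """Placeholder scenario class for a hypothetical new scitype."""


# lists that would live in scenarios_forecasting and scenarios_my_scitype
scenarios_forecasting = [ForecasterFitPredict, ForecasterFitUpdatePredict]
scenarios_my_scitype = [MyScitypeFitPredict]

# dispatch dictionary, in the style of utils/_testing/scenarios_getter
scenarios = {}
scenarios["forecaster"] = scenarios_forecasting
scenarios["my_scitype"] = scenarios_my_scitype  # the line a new module adds


def get_scenarios_for(scitype):
    """Return instances of the scenarios registered for scitype (hypothetical)."""
    return [cls() for cls in scenarios.get(scitype, [])]
```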