sktime testing framework overview#
sktime uses pytest for testing interface compliance of estimators, and correctness of code.
This page gives an overview of the tests, and an introduction to adding tests and to extending the testing framework.
Test module architecture#
sktime testing happens on three layers, roughly corresponding to the inheritance layers of estimators:
- “package level”: testing interface compliance with the BaseObject and BaseEstimator specifications, in tests/test_all_estimators.py
- “module level”: testing interface compliance of concrete estimators with their scitype base class, for instance in forecasting/tests/test_all_forecasters.py
- “low level”: testing individual functionality of estimators or other code, in individual files in tests folders.
Module conventions are as follows:
- Each module contains a tests folder, which contains tests specific to that module. Sub-modules may also contain tests folders.
- tests folders may contain _config.py files to collect test configuration settings for that module.
- Generic utilities for tests are located in the module utils._testing. Tests for these utilities should be contained in the utils._testing.tests folder.
- Each test module corresponding to a learning task and estimator scitype should contain module level tests in a test_all_[name_of_scitype].py file that tests interface compliance of all estimators adhering to the scitype. For instance, forecasting/tests/test_all_forecasters.py, or distances/tests/test_all_dist_kernels.py.
- Learning task specific tests should not duplicate the package level, generic estimator tests in test_all_estimators.py.
Test code architecture#
sktime test files should use pytest best practice, such as fixtures or test parameterization, where possible, instead of custom logic; see the pytest documentation.
Estimator tests use sktime’s framework plug-in to pytest_generate_tests, which parameterizes estimator fixtures and data input scenarios.
An illustrative example#
Starting with an example:
def test_fit_returns_self(estimator_instance, scenario):
    """Check that fit returns self."""
    fit_return = scenario.run(estimator_instance, method_sequence=["fit"])
    assert (
        fit_return is estimator_instance
    ), f"Estimator: {estimator_instance} does not return self when calling fit"
This test constitutes a loop over estimator_instance and scenario fixtures, where the loop is orchestrated by pytest parameterization in pytest_generate_tests, which automatically decorates the test with a suitable loop.
Notably, loops in the test do not need to be written by the developer, if the test uses a fixture name (such as estimator_instance) for which a loop is already defined.
See below for more details, or the pytest documentation on the topic.
The sktime plug-in for pytest generates the tuples of fixture values for this.
In the above example, we loop over the following fixture lists:
- estimator_instance: estimator instances, obtained from all sktime estimators via create_test_instances_and_names, which constructs instances from parameter settings in estimator classes’ get_test_params.
- scenario: scenario objects, which encode data inputs and method call sequences for the estimator_instance (explained in further detail below).
The sktime plug-in ensures that only those scenarios are retrieved that are applicable to the estimator_instance.
In the example, the scenario.run command is equivalent to calling estimator_instance.fit(**scenario_kwargs), where the scenario_kwargs are generated by the scenario.
It should be noted that the test is not decorated with fixture parametrization; the fixtures are instead generated by pytest_generate_tests.
The reason for this is that the applicable scenarios (fixture values of scenario) depend on the estimator_instance fixture, since inputs to fit of a classifier will differ from inputs to fit of a forecaster.
Parameterized fixtures#
sktime uses pytest fixture parameterization to execute tests in a loop over fixtures, for instance running all interface compatibility tests for all estimators.
See the pytest documentation on fixture parameterization for a general explanation.
Implementation-wise, loops over fixtures are orchestrated by pytest parameterization in pytest_generate_tests, which automatically decorates every test with a mark.parametrize based on the test arguments (estimator_instance and scenario in the above example).
This is in line with standard use of pytest_generate_tests; see the section in the pytest documentation on advanced fixture parameterization using pytest_generate_tests.
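Conceptually, the mechanism can be illustrated by the following simplified sketch, which uses standard pytest metafunc parameterization; the helper functions for collecting estimator instances and applicable scenarios are placeholders, not sktime’s actual implementation.

def pytest_generate_tests(metafunc):
    """Parameterize estimator_instance and scenario arguments of collected tests (sketch)."""
    if "estimator_instance" in metafunc.fixturenames:
        instances = _all_estimator_instances()  # placeholder: collect estimator test instances
        if "scenario" in metafunc.fixturenames:
            # scenarios depend on the estimator instance, hence the pairs are generated jointly
            pairs = [
                (inst, sc)
                for inst in instances
                for sc in _applicable_scenarios(inst)  # placeholder: scenarios applicable to inst
            ]
            metafunc.parametrize("estimator_instance,scenario", pairs)
        else:
            metafunc.parametrize("estimator_instance", instances)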
Currently, the sktime testing framework provides automated fixture parameterization via mark.parametrize for the following fixtures, in module level tests:
- estimator: all estimator classes, inheriting from the base class of the given module. In the package level tests test_all_estimators, that base class is BaseEstimator.
- estimator_instance: all estimator test instances, obtained from all sktime estimators via create_test_instances_and_names.
- scenario: test scenarios, applicable to estimator or estimator_instance. The scenarios are specified in utils/_testing/scenarios_[estimator_scitype].
Further parameterization may happen for individual tests; the scope is usually explained in the test docstrings.
Scenarios#
The scenario fixtures contain arguments for method calls, and a sequence of method calls.
An example scenario specification, from utils/_testing/scenarios_forecasting:
class ForecasterFitPredictUnivariateNoXLateFh(ForecasterTestScenario):
    """Fit/predict only, univariate y, no X, no fh in fit (fh passed late, in predict)."""

    _tags = {"univariate_y": True, "fh_passed_in_fit": False}

    args = {
        "fit": {"y": _make_series(n_timepoints=20, random_state=RAND_SEED)},
        "predict": {"fh": 1},
    }

    default_method_sequence = ["fit", "predict"]
The scenario ForecasterFitPredictUnivariateNoXLateFh encodes instructions that are applied to an estimator_instance via instances of the scenario.
A call result = scenario.run(estimator_instance) will:
- first, call estimator_instance.fit(y=_make_series(n_timepoints=20, random_state=RAND_SEED))
- then, call estimator_instance.predict(fh=1) and return the output to result.
The abstraction of “scenario” allows specifying multiple argument combinations across multiple methods.
The method run also has arguments (method_sequence and arg_sequence) that allow overriding the method sequence, e.g., to run the methods in a different order, or only a subset thereof.
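For instance, the sequence can be restricted or spelled out explicitly; a usage sketch, following the method_sequence argument described above:

# usage sketch: run only the fit step, then run an explicit fit/predict sequence
fitted = scenario.run(estimator_instance, method_sequence=["fit"])
result = scenario.run(estimator_instance, method_sequence=["fit", "predict"])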
Scenarios also provide a method scenario.is_applicable(estimator), which returns a boolean indicating whether scenario is applicable to estimator. For instance, scenarios with univariate data are not applicable to multivariate forecasters, and will cause exceptions in a fit method call.
Non-applicable scenarios can be filtered out in positive tests, and filtered in in negative tests.
By default, the sktime implementation of pytest_generate_tests passes only applicable scenarios.
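In a hand-written test that loops over scenarios itself, the same filtering can be applied explicitly; a usage sketch, assuming a list scenarios is already available:

# usage sketch: skip scenarios that do not apply to the estimator under test
applicable = [sc for sc in scenarios if sc.is_applicable(estimator_instance)]
for scenario in applicable:
    scenario.run(estimator_instance)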
Further, scenarios inherit from BaseObject, which allows the sktime tag system to be used with scenarios.
For further details on scenarios, inspect the docstring of BaseScenario.
Remote CI set-up#
The remote CI runs all package level tests, module level tests, and low level tests for all combinations of supported operating systems (OS) and python versions.
The package and module level estimator tests are distributed across OS and python version combinations so that:
- only about a third of estimators are run per combination
- a given estimator runs at least once for a given OS
- a given estimator runs at least once for a given python version
This reduces runtime and memory requirements for each CI element.
The precise logic maps estimators, OS, and python versions to integers, and matches estimators with the sum of OS and python version modulo 3.
This logic is located in subsample_by_version_os in tests.test_all_estimators, which is called in pytest_generate_tests of BaseFixtureGenerator, which is inherited by all the TestAll[estimator_type] classes.
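The matching rule can be illustrated by the following sketch; it is an illustration of the modulo logic described above, not the actual subsample_by_version_os implementation, and assumes estimators, OS, and python versions have already been mapped to integer indices.

# illustrative sketch of the modulo-3 matching rule, not the actual implementation
def _runs_on_this_combination(estimator_index, os_index, python_index, n_partitions=3):
    """Return True if the estimator is tested on this OS/python version combination."""
    return estimator_index % n_partitions == (os_index + python_index) % n_partitions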
By default, the subsetting by OS and python version is switched off, but it can be turned on by setting the pytest flag matrixdesign to True (see conftest.py).
Extending the testing module#
This section explains how to extend the testing module. Depending on the primary change that is tested, the changes to the testing module will be shallow or deep. In decreasing order of commonality:
- When adding new estimators or utility functionality, write low level tests that check correctness of the estimator. These typically use only the simplest idioms in pytest (e.g., fixture parameterization). New estimators are also automatically discovered and looped over by the existing module and package level tests.
- Introducing or changing base class level interface points will typically require addition of module level tests, and addition of, or modification to, scenarios with functionality specific to these interface points. Rarely, this may require changes to package level tests.
- Major interface changes or addition of modules may require writing entire test suites, and changes or additions to package level tests.
Adding low level tests#
Low level tests are “free-form” and should follow pytest best practice.
pytest tests should be located in the appropriate tests folder of the module where a change is made.
Examples should be located in the docstring of the class or function added.
For an added estimator of name estimator_name, the test file should be called test_estimator_name.py.
Useful functionality for writing tests:
- example fixture generation, via datatypes.get_examples
- data format checkers in datatypes: check_is_mtype, check_is_scitype, check_raise
- miscellaneous utilities in utils, especially in _testing
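As an illustration, a low level test for a forecaster could combine these utilities roughly as follows; the import paths are assumptions based on the utilities listed above, and NaiveForecaster is used only as a stand-in for a newly added estimator.

# illustrative low level test; import paths are assumptions, NaiveForecaster is a stand-in
from sktime.datatypes import check_is_mtype
from sktime.forecasting.naive import NaiveForecaster
from sktime.utils._testing.series import _make_series  # assumed location of _make_series


def test_naiveforecaster_predict_output_format():
    """Check that predict returns output in the expected series mtype."""
    y = _make_series(n_timepoints=20)
    forecaster = NaiveForecaster()
    forecaster.fit(y, fh=[1, 2, 3])
    y_pred = forecaster.predict()
    assert check_is_mtype(y_pred, mtype="pd.Series", scitype="Series")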
Escaping tests#
On occasion, it may make sense to escape individual estimators from individual tests.
This can be done (currently, as of 0.9.0) in two ways:
- adding the estimator or test/estimator combination to EXCLUDED_TESTS or EXCLUDE_ESTIMATORS in the appropriate _config file
- adding a check condition in the is_excluded method used in pytest_generate_fixtures, possibly only if the testing module supports this
Escaping tests directly in the tests, e.g., via if isinstance(estimator_instance, MyClass), should be avoided where possible.
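As an illustration, entries in the _config file might look roughly as follows; the estimator names are hypothetical, and the exact structure of EXCLUDE_ESTIMATORS and EXCLUDED_TESTS should be checked against the existing _config file.

# illustrative sketch only; check the existing _config file for the exact structure
EXCLUDE_ESTIMATORS = [
    "MyFlakyEstimator",  # hypothetical estimator, excluded from all estimator tests
]

EXCLUDED_TESTS = {
    # hypothetical estimator/test combination, excluded from a single test
    "MySlowEstimator": ["test_fit_returns_self"],
}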
Adding package or module level tests#
Module level tests use pytest_generate_tests to define fixtures.
The available fixtures vary per module, and are listed in the docstring of pytest_generate_tests.
A new test should use these fixtures where possible, but can also add new fixtures via pytest’s basic fixture functionality.
If new fixture variables are to be used throughout the module, or depend on existing fixtures, the instructions in the next section should be followed.
Where possible, scenarios should be used to simulate generic method calls (see above), instead of creating and passing arguments directly. Scenarios ensure consistent coverage of input argument cases.
Adding fixture variables#
One-off fixture variables (localized to one or a few tests) should be added using basic pytest functionality, such as immutable constants, pytest.fixture, or pytest.mark.parametrize. Extending pytest_generate_tests can also be considered in this case, if it makes the tests more (and not less) readable.
In contrast, fixtures used throughout module or package level tests should typically be added to the fixture generation process called by pytest_generate_tests.
This requires:
- adding a function _generate_[variable_name](test_name, **kwargs), as described below
- assigning the function to generator_dict["variable_name"]
- adding the new variable to the fixture_sequence list in pytest_generate_tests
The function _generate_[variable_name](test_name, **kwargs) should return two objects:
- a list of fixtures to loop over, to substitute for variable_name when it appears in a test signature
- a list of names of equal length, the i-th element used as a name for the i-th fixture in test logs
The function has access to:
- test_name, the name of the test the variable is called in. This can be used to customize the list of fixtures for specific tests, although it is mainly meant for generic behaviour. One-off escapes and similar should be avoided here, and instead dealt with via xfail and similar.
- the values of the fixture variables that appear earlier in fixture_sequence, in kwargs. For instance, the value of estimator_instance, if this is a variable used in the test. This can be used to make the list of fixtures for variable_name dependent on the values of other fixture variables.
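As an illustration, a new fixture variable could be wired in roughly as follows; the variable name data_format and its fixture values are hypothetical, and the exact registration in generator_dict and fixture_sequence should be checked against the existing pytest_generate_tests.

# illustrative sketch; data_format and its fixture values are hypothetical
def _generate_data_format(test_name, **kwargs):
    """Return a list of fixtures for the data_format variable, plus their display names."""
    data_formats = ["pd.Series", "pd.DataFrame"]  # hypothetical fixture values
    # the list could be filtered here, depending on test_name,
    # or on the values of earlier fixtures such as kwargs.get("estimator_instance")
    fixture_names = [f"data_format={fmt}" for fmt in data_formats]
    return data_formats, fixture_names

# registration, as described in the list above:
# generator_dict["data_format"] = _generate_data_format
# and "data_format" added to the fixture_sequence list in pytest_generate_tests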
Adding or extending scenarios#
Scenarios can be added or modified if a new combination of method/input values should be tested. The two main options are:
- adding a new scenario, similar to existing scenarios for an estimator scitype. This is the common case when a new input condition should be covered.
- adding a method or argument key to existing scenarios. This is the common case when a new method or method sequence should be covered. For this, arguments should be added to the args dictionary of an existing scenario.
Scenarios for a specific estimator scitype are found in utils/_testing/scenarios_[estimator_scitype].
All scenarios inherit from a base class for that scitype, e.g., ForecasterTestScenario.
This base class defines generics such as is_applicable, or tag handling, for all scenarios of the same type.
Scenarios should usually define:
- an args parameter: a dictionary, with arbitrary keys (usually names of methods). The args parameter may be set as a class variable, or set by the constructor.
- optionally, a default_method_sequence and a default_arg_sequence, lists of strings. These define the sequence in which methods are called, and with which argument set, if run is called. Both may be class variables, or object variables set in the constructor. Side note: a method_sequence and arg_sequence can also be specified in run. If not passed, defaulting will take place (first to each other, then to the default_etc variables).
- optionally, a _tags dictionary, which is a BaseObject tags dictionary and behaves exactly like that of estimators.
- optionally, a get_args method which allows overriding key retrieval from args. For instance, to specify rules such as “if the key starts with predict_, always return …”
- optionally, an is_applicable method which allows comparing the scenario with estimators. For instance, comparing whether both scenario and estimator are multivariate.
For further details and the expected signature, consult the docstring of TestScenario, and/or inspect any of the scenario base classes, e.g., ForecasterTestScenario.
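As an illustration, a new scenario covering an additional input condition might look roughly as follows; the class, tag values, and data arguments are assumptions modelled on the forecasting example shown earlier, not existing sktime scenarios.

# illustrative sketch, modelled on the forecasting scenario example above;
# the class, tag values, and data arguments are hypothetical
class ForecasterFitPredictMultivariateNoX(ForecasterTestScenario):
    """Fit/predict only, multivariate y, no X, fh passed in fit."""

    _tags = {"univariate_y": False, "fh_passed_in_fit": True}

    args = {
        "fit": {"y": _make_series(n_columns=2, n_timepoints=20, random_state=RAND_SEED), "fh": 1},
        "predict": {},
    }

    default_method_sequence = ["fit", "predict"]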
Creating tests for a new estimator type#
If a module for a new estimator type is added, multiple things need to be created for module level tests:
- scenarios to cover the specified base class interface behaviour, in utils/_testing/scenarios_[estimator_scitype]. This can be modelled on utils/_testing/scenarios_forecasting, or the other scenario files.
- a line in the dispatch dictionary in utils/_testing/scenarios_getter which links the scenarios to the scenario retrieval function, e.g., scenarios["forecaster"] = scenarios_forecasting
- a tests/test_all_[estimator_scitype].py, from the root of the module
- in this file, appropriate fixture generation via pytest_generate_fixtures. This can be modelled on test_all_estimators or test_all_forecasters.
- and a collection of tests for interface compliance with the base class of the estimator type. The tests should cover positive cases, as well as test the raising of informative error messages in negative cases. A skeleton is sketched below.
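A minimal sketch of such a file is given below; all class names and the mechanism for restricting fixture generation to the new scitype are hypothetical, modelled on the description above rather than on existing code.

# hypothetical skeleton for tests/test_all_[estimator_scitype].py; class names and the
# restriction to the new scitype are assumptions, modelled on the description above
class MySciTypeFixtureGenerator(BaseFixtureGenerator):
    """Generates estimator_instance and scenario fixtures for the new scitype."""

    # restrict fixture generation to estimators of the new scitype here,
    # following the pattern used in test_all_estimators or test_all_forecasters


class TestAllMySciTypeEstimators(MySciTypeFixtureGenerator):
    """Module level interface compliance tests against the new scitype base class."""

    def test_fit_returns_self(self, estimator_instance, scenario):
        """Check that fit returns self, as required by the base class contract."""
        fit_return = scenario.run(estimator_instance, method_sequence=["fit"])
        assert fit_return is estimator_instance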