Glossary of Common Terms#

The glossary below defines common terms and API elements used throughout sktime.

Note

The glossary is under development. Important terms are still missing. Please create a pull request if you want to add one.

Application#

A single-purpose piece of code that practitioners write to solve a particular applied problem. Compare with toolbox and framework.

Bagging#

A technique in ensemble learning where multiple models are trained on different subsets of the training data (typically bootstrap samples drawn with replacement), and the individual model outputs are aggregated by some rule (e.g., averaging or majority vote) to obtain a consensus prediction.
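A minimal sketch of the idea, using only NumPy rather than any sktime API. It assumes bootstrap resampling as the way to form the subsets and a straight-line fit as the base model; the toy data and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: a noisy linear relationship (illustrative only).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

n_models = 25
x_new = np.array([2.5, 7.5])
predictions = []

for _ in range(n_models):
    # Draw a bootstrap sample: same size as the data, with replacement.
    idx = rng.integers(0, x.size, size=x.size)
    # Fit one base model (a straight line) on the resampled data.
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    predictions.append(slope * x_new + intercept)

# Aggregate the individual model outputs by averaging.
bagged_prediction = np.mean(predictions, axis=0)
```

For a classification task, the averaging step would typically be replaced by a majority vote over the predicted labels.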

Composite estimator#

An estimator that consists of multiple other component estimators, which can vary. An example would be a pipeline consisting of a transformer and a forecaster.
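The pipeline example can be sketched in plain Python. The three classes below (`MeanCenterer`, `LastValueForecaster`, `SimplePipeline`) are hypothetical and are not sktime classes; they only illustrate how a composite delegates to interchangeable components.

```python
import numpy as np


class MeanCenterer:
    """Hypothetical transformer: subtracts the mean seen during fit."""

    def fit(self, y):
        self.mean_ = float(np.mean(y))
        return self

    def transform(self, y):
        return np.asarray(y, dtype=float) - self.mean_

    def inverse_transform(self, y):
        return np.asarray(y, dtype=float) + self.mean_


class LastValueForecaster:
    """Hypothetical forecaster: repeats the last observed value."""

    def fit(self, y):
        self.last_ = float(np.asarray(y, dtype=float)[-1])
        return self

    def predict(self, horizon):
        return np.full(horizon, self.last_)


class SimplePipeline:
    """Hypothetical composite: any transformer plus any forecaster."""

    def __init__(self, transformer, forecaster):
        self.transformer = transformer
        self.forecaster = forecaster

    def fit(self, y):
        # Fit the transformer, then fit the forecaster on the transformed data.
        transformed = self.transformer.fit(y).transform(y)
        self.forecaster.fit(transformed)
        return self

    def predict(self, horizon):
        # Forecast in the transformed space, then map back to the original scale.
        return self.transformer.inverse_transform(self.forecaster.predict(horizon))


pipeline = SimplePipeline(MeanCenterer(), LastValueForecaster())
forecast = pipeline.fit([1.0, 2.0, 3.0, 6.0]).predict(horizon=2)
```

Because the components are passed in at construction, either one can be swapped for another object with the same methods without changing the composite.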

Endogenous#

Within a learning task, endogenous variables are determined by exogenous variables or past timepoints of the variable itself. Also referred to as dependent variables or targets.

Ensemble learning#

A technique in which multiple models are combined to improve the overall performance of a predictive model.

Estimator#

An algorithm of a specific scientific type, implementing the Python class interface defined by it. For example, the ARIMA class.

Exogenous#

Within a learning task, exogenous variables are external factors whose pattern of impact on the task’s endogenous variables must be learned. Also referred to as independent variables or features.

Feature extraction#

A technique used to extract useful information from raw data. In time series analysis, this may involve transforming the data to a frequency domain, decomposing the signal into components, or extracting statistical features.
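As a rough illustration, the sketch below computes two statistical features and one frequency-domain feature from a synthetic series with NumPy. The feature names and the sine-wave input are assumptions chosen for the example, not part of sktime.

```python
import numpy as np

# Synthetic series: a sine wave with a period of 16 samples.
n = 128
t = np.arange(n)
series = np.sin(2 * np.pi * t / 16)

# Statistical features.
features = {
    "mean": float(series.mean()),
    "std": float(series.std()),
}

# Frequency-domain feature: the frequency with the largest spectral magnitude.
spectrum = np.abs(np.fft.rfft(series - series.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)  # in cycles per sample
features["dominant_frequency"] = float(freqs[np.argmax(spectrum)])
```

The resulting dictionary is a fixed-length summary of the series, which is the form most tabular learning algorithms expect.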

Forecasting#

A learning task focused on predicting future values of a time series. For more details, see the Introduction.

Framework#

A collection of related and reusable software design templates that practitioners can copy and fill in. Frameworks emphasize design reuse. They capture common software design decisions within a given application domain and distill them into reusable design templates. This reduces the number of design decisions practitioners must make, allowing them to focus on application specifics. As a result, practitioners can write software faster, and applications will share a similar structure. Like toolboxes, frameworks often also offer reusable functionality. Compare with toolbox and application.

Generalization#

The ability of a predictive model to perform well on unseen data. A model that overfits to the training data may not generalize well, while a model that underfits may not capture the underlying patterns in the data.

Hyperparameter#

A parameter of a machine learning model that is set at construction. Usually, this affects the model’s performance. Examples include the learning rate in a neural network, the number of trees in a random forest, or the regularization parameter in a linear model.
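The distinction between a hyperparameter and a quantity learned from data can be sketched as follows. `MovingAverageForecaster` is a hypothetical class written for this example; the trailing underscore on `level_` follows the common scikit-learn naming convention for fitted attributes.

```python
import numpy as np


class MovingAverageForecaster:
    """Hypothetical forecaster used only for illustration."""

    def __init__(self, window_length=3):
        # Hyperparameter: chosen by the user at construction.
        self.window_length = window_length

    def fit(self, y):
        # Fitted parameter: estimated from the data during fit.
        recent = np.asarray(y, dtype=float)[-self.window_length:]
        self.level_ = float(recent.mean())
        return self

    def predict(self, horizon):
        return np.full(horizon, self.level_)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
short = MovingAverageForecaster(window_length=2).fit(y)
long = MovingAverageForecaster(window_length=4).fit(y)
```

Both objects see the same data, but the different `window_length` values lead to different fitted levels (4.5 and 3.5) and therefore different forecasts.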

Instance#

A member of the set of entities being studied and about which an ML practitioner wishes to generalize. For example, patients, chemical process runs, machines, countries, etc. May also be referred to as samples, examples, observations or records depending on the discipline and context.

Model selection#

The process of selecting the best machine learning model for a given task. This may involve comparing the performance of different models on a validation set, or using techniques like grid search to find the best hyperparameters for a given model.
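A small grid search over one hyperparameter might look like the sketch below. The helper `moving_average_forecast`, the toy series and the choice of mean absolute error as the score are all assumptions made for illustration.

```python
import numpy as np


def moving_average_forecast(train, window_length, horizon):
    """Forecast every future step as the mean of the last `window_length` values."""
    return np.full(horizon, np.mean(train[-window_length:]))


# Toy series with a level shift from 1.0 to 5.0 (illustrative only).
y = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])

# Temporal split: the validation set lies strictly after the training set.
train, validation = y[:5], y[5:]

# Grid of candidate hyperparameter values.
candidate_windows = [1, 2, 4]
scores = {}
for window_length in candidate_windows:
    forecast = moving_average_forecast(train, window_length, horizon=len(validation))
    # Score each candidate by mean absolute error on the validation set.
    scores[window_length] = float(np.mean(np.abs(validation - forecast)))

# Select the candidate with the lowest validation error.
best_window = min(scores, key=scores.get)
```

Note that the split keeps the temporal order intact; shuffling before splitting would let the model see the future of the series it is asked to forecast.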

Multivariate time series#

Multiple time series, typically observed for the same observational unit. The term is typically used for cases where the series evolve together over time. This is related to, but different from, the case where a univariate time series depends on exogenous data.

Panel time series#

A form of time series data where the same time series are observed for multiple observational units. The observed series may consist of univariate time series or multivariate time series. Accordingly, the data varies across time, observational unit and series (i.e. variables).
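One way to picture the three axes of variation is a pandas DataFrame with a two-level row index. This is a sketch of one possible layout, not a statement of the only container sktime accepts; the level names, column names and values are made up for the example.

```python
import pandas as pd

# Row index with two levels: observational unit (instance) and timepoint.
index = pd.MultiIndex.from_product(
    [["patient_a", "patient_b"], range(3)],
    names=["instance", "time"],
)

# Columns are the variables, so each instance carries a multivariate series.
panel = pd.DataFrame(
    {
        "heart_rate": [61, 63, 62, 75, 74, 78],
        "temperature": [36.6, 36.7, 36.5, 37.1, 37.0, 37.3],
    },
    index=index,
)

# All variables over time for a single instance.
one_instance = panel.loc["patient_a"]

# A single variable across instances: rows are timepoints, columns are instances.
one_variable = panel["heart_rate"].unstack("instance")
```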

Reduction#

Reduction refers to decomposing a given learning task into simpler tasks that can be composed to create a solution to the original task. In sktime, reduction is used to allow one learning task to be adapted as a solution for an alternative task.
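A common instance is reducing forecasting to tabular regression with a sliding window. The sketch below uses NumPy only and ordinary least squares as the regressor; the helper `sliding_window` and the toy series are assumptions for illustration and do not reproduce sktime's own reduction utilities.

```python
import numpy as np


def sliding_window(y, window_length):
    """Turn a series into a tabular (X, target) pair of lagged windows."""
    n_rows = len(y) - window_length
    X = np.array([y[i : i + window_length] for i in range(n_rows)])
    target = y[window_length:]
    return X, target


y = np.arange(10, dtype=float)  # toy series: 0, 1, ..., 9
window_length = 3

# Step 1: re-express the forecasting task as a tabular regression task.
X, target = sliding_window(y, window_length)

# Step 2: solve the regression task (least squares with an intercept column).
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)

# Step 3: use the fitted regressor to forecast one step ahead.
last_window = np.append(y[-window_length:], 1.0)
next_value = float(last_window @ coef)
```

Any tabular regressor could take the place of the least-squares step, which is what makes the reduction reusable.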

Scientific type#

A class or object type to denote a category of objects defined by a common Python class interface and data scientific purpose. For example, “forecaster” or “classifier”, and the interface defined in BaseForecaster or BaseClassifier.

Scitype#

See scientific type.

Seasonality#

A time series shows a seasonal pattern when it is affected by seasonal characteristics such as the time of year or the day of the week. The duration of a season is always fixed and known.

Tabular#

A setting where the timepoints of the univariate time series measured for each instance are treated as features and stored as primitive data types in a DataFrame’s cells. For example, N instances of time series with T timepoints each yield a pandas DataFrame with shape (N, T): N rows, T columns.
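The (N, T) layout can be sketched directly in pandas. The column names `t0`, `t1`, ... and the values are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd

n_instances, n_timepoints = 4, 6

# One row per instance, one column per timepoint, one number per cell.
values = np.arange(n_instances * n_timepoints, dtype=float).reshape(
    n_instances, n_timepoints
)
tabular = pd.DataFrame(values, columns=[f"t{j}" for j in range(n_timepoints)])

# The series of a single instance is simply one row of the table.
first_series = tabular.iloc[0].to_numpy()
```

In this representation the temporal order survives only as column order, so algorithms that treat columns as exchangeable features ignore it.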

Time series#

Data where the variable measurements are ordered by time or by an index indicating the position of an observation in the sequence of values.

Time series annotation#

A learning task focused on labeling the timepoints of a time series. This includes the related tasks of outlier detection, anomaly detection, change point detection and segmentation.

Time series classification#

A learning task focused on using the patterns across instances that relate the time series to a categorical target variable, typically to predict that variable for new instances.
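As a minimal sketch, a one-nearest-neighbour classifier with Euclidean distance assigns a new series the label of the most similar training series. The function `predict_1nn`, the class labels and the toy data are assumptions for illustration; this is one simple baseline, not the method sktime prescribes.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20)

# Training set: one row per instance, each row is a univariate series.
X_train = np.array(
    [
        np.sin(2 * np.pi * t),
        np.sin(2 * np.pi * t) + 0.1,
        t,
        t + 0.1,
    ]
)
# Categorical target: one label per instance.
y_train = np.array(["wave", "wave", "ramp", "ramp"])


def predict_1nn(X_train, y_train, x_new):
    """Return the label of the training series closest to `x_new`."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(distances)]


label_ramp = predict_1nn(X_train, y_train, t - 0.05)
label_wave = predict_1nn(X_train, y_train, np.sin(2 * np.pi * t) + 0.04)
```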

Time series clustering#

A learning task focused on discovering groups consisting of instances with similar time series.

Time series decomposition#

A technique used to separate a time series into its underlying components, such as trend, seasonality, and noise. This can be useful for understanding the patterns in the data and for modeling each component separately.
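A simplified additive decomposition can be sketched with NumPy. It assumes a straight-line trend, a known season length, and a series whose length is a whole number of seasons; real decomposition methods are usually more flexible, and the toy series is made up for the example.

```python
import numpy as np

period = 4  # assumed known season length
n = 24      # a whole number of seasons
t = np.arange(n)
series = 0.5 * t + np.tile([1.0, -1.0, 2.0, -2.0], n // period)

# Trend: least-squares straight line through the series.
slope, intercept = np.polyfit(t, series, deg=1)
trend = slope * t + intercept

# Seasonality: average detrended value at each position within the season,
# centred so that the seasonal component sums to zero over one season.
detrended = series - trend
pattern = np.array([detrended[i::period].mean() for i in range(period)])
pattern -= pattern.mean()
seasonal = np.tile(pattern, n // period)

# Noise: whatever the trend and seasonal components do not explain.
residual = series - trend - seasonal
```

By construction the three components add back up to the original series, so each can be inspected or modelled separately.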

Time series regression#

A learning task focused on using the patterns across instances that relate the time series to a continuous target variable, typically to predict that variable for new instances.

Timepoint#

The point in time at which an observation is made. A timepoint may represent an exact point in time (a timestamp), a time period (e.g. minutes, hours or days), or simply an index indicating the position of an observation in the sequence of values.
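The three kinds of timepoint map naturally onto pandas objects, as the sketch below suggests. The specific dates and values are arbitrary.

```python
import pandas as pd

# An exact point in time.
timestamp = pd.Timestamp("2024-03-01 12:00")

# A time period, here one calendar day.
period = pd.Period("2024-03-01", freq="D")

# A plain integer position in the sequence of values.
series = pd.Series([1.0, 2.0, 3.0], index=pd.RangeIndex(3))
position = series.index[-1]

# A timestamp can fall inside a period.
inside = period.start_time <= timestamp <= period.end_time
```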

Toolbox#

A collection of related and reusable functionality that practitioners can import to write applications. Toolboxes emphasize code reuse. Compare with framework and application.

Trend#

When data shows a long-term increase or decrease, this is referred to as a trend. Trends can also be non-linear.

Univariate time series#

A single time series. While univariate analysis often only uses information contained in the series itself, univariate time series regression and forecasting can also include exogenous data.

Variable#

Refers to some measurement of interest. Variables may be cross-sectional (e.g. time-invariant measurements like a patient’s place of birth) or time series.