MLflow#

The sktime custom model flavor enables logging of sktime models in MLflow format via the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() functions. These functions also add the pyfunc flavor to the MLflow Models they produce, so that a model can be loaded as a generic Python function for inference via sktime.utils.mlflow_sktime.pyfunc.load_model(). A model loaded as pyfunc can only be scored with a DataFrame input. To load an MLflow Model with the sktime flavor as a native sktime object, use sktime.utils.mlflow_sktime.load_model().

The pyfunc flavor of the model supports the sktime predict methods predict, predict_interval, predict_proba, predict_quantiles, and predict_var.

To generate forecasts with a sktime model loaded as pyfunc, pass the exogenous regressors as a pandas DataFrame to pyfunc.predict() (pass an empty DataFrame if no exogenous regressors are used). The predict methods to call, and the parameter values passed to them, are defined by a dictionary saved as an attribute of the fitted sktime model instance (pyfunc_predict_conf in the examples below). If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict method. Note that for the pyfunc flavor, the forecasting horizon fh must be passed to the fit method.

Predict methods and parameter values for the pyfunc flavor can be defined in two ways:

- Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}
- Dict[str, list], if the predict methods are called with their default parameters, for example {"predict_method": ["predict", "predict_interval"]}

Note: when including the predict_proba method, the former approach must be followed, because the quantiles parameter has to be provided by the user. If no prediction config is defined, pyfunc.predict() returns the output of the sktime predict() method.
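The two configuration forms described above can be brought to a common shape before use. The following is a minimal sketch, assuming only the dictionary layouts shown in this section; normalize_predict_conf is a hypothetical helper written for illustration and is not part of sktime or MLflow.

```python
def normalize_predict_conf(conf):
    """Return the prediction config in Dict[str, dict] form.

    Hypothetical helper for illustration; not part of sktime or MLflow.
    """
    methods = conf["predict_method"]
    if isinstance(methods, list):
        # List form: every method is called with its default parameters.
        methods = {name: {} for name in methods}
    # Per the note above, predict_proba needs user-supplied quantiles.
    if "predict_proba" in methods and "quantiles" not in methods["predict_proba"]:
        raise ValueError("predict_proba requires a 'quantiles' entry")
    return {"predict_method": methods}


list_conf = {"predict_method": ["predict", "predict_interval"]}
dict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": [0.1, 0.9]},
    }
}

normalized_list_conf = normalize_predict_conf(list_conf)
normalized_dict_conf = normalize_predict_conf(dict_conf)
print(normalized_list_conf)
```

A check like this, run before the config is attached to the fitted model, would catch a predict_proba entry without quantiles before the model is saved.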

Signature logging from a non-pyfunc artifact does not work correctly for predict_interval or predict_quantiles: for these methods the native sktime flavor returns a DataFrame with MultiIndex columns, which is not a recognized signature type. MLflow’s infer_schema works correctly with the pyfunc flavor of the model, though.
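The difference comes down to column labels: multi-level labels versus flat strings. The following is a minimal sketch, assuming column labels represented as tuples (the way pandas represents MultiIndex columns) and the `__` separator visible in the pyfunc output in section 2.4.2; flatten_column_labels is a hypothetical helper, not an sktime or MLflow function.

```python
def flatten_column_labels(labels, sep="__"):
    """Join each multi-level column label into one flat string.

    Hypothetical helper for illustration; not part of sktime or MLflow.
    """
    flat = []
    for label in labels:
        if isinstance(label, tuple):
            # One tuple entry per column level, e.g. ("Coverage", 0.8, "lower").
            flat.append(sep.join(str(level) for level in label))
        else:
            flat.append(str(label))
    return flat


# Column labels as predict_interval returns them for coverage=[0.8, 0.9].
interval_columns = [
    ("Coverage", 0.8, "lower"),
    ("Coverage", 0.8, "upper"),
    ("Coverage", 0.9, "lower"),
    ("Coverage", 0.9, "upper"),
]
flat_columns = flatten_column_labels(interval_columns)
print(flat_columns)
```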

1. Setup#

1.1 Config#

[1]:

model_path = "model"

1.2 Imports#

[2]:

import mlflow

from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime

1.3 Load sample data#

[3]:

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

2. Example usage of native `sktime flavor` and `pyfunc flavor`#

2.1 Create prediction config for pyfunc flavor#

[4]:

coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]

pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}

2.2 Train and save model#

[5]:

with mlflow.start_run():
    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.save_model(forecaster, model_path)

2.3 Load model#

2.3.1 Native sktime flavor#

[6]:

loaded_model = mlflow_sktime.load_model(model_path)

2.3.2 Pyfunc flavor#

[7]:

loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)

2.4 Generate predictions#

2.4.1 Native sktime flavor#

[8]:

loaded_model.predict(X=X_test)

[8]:

1959    66513.0
1960    66513.0
1961    66513.0
Freq: A-DEC, dtype: float64

[9]:

loaded_model.predict_interval(X=X_test, coverage=coverage)

[9]:

	Coverage
	0.8		0.9
	lower	upper	lower	upper
1959	64719.913711	68306.086289	64211.598663	68814.401337
1960	63977.193051	69048.806949	63258.327017	69767.672983
1961	63407.283445	69618.716555	62526.855956	70499.144044

[10]:

y_pred_dist = loaded_model.predict_proba(X=X_test)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles

[10]:

	Quantiles_0.1	Quantiles_0.9
1959	64719.914062	68306.085938
1960	63977.191406	69048.804688
1961	63407.281250	69618.718750

[11]:

loaded_model.predict_quantiles(X=X_test)

[11]:

	Quantiles
	0.05	0.95
1959	64211.598663	68814.401337
1960	63258.327017	69767.672983
1961	62526.855956	70499.144044

[12]:

loaded_model.predict_var(X=X_test)

[12]:

	0
1959	1.957628e+06
1960	3.915256e+06
1961	5.872885e+06

2.4.2 Pyfunc flavor#

[13]:

loaded_pyfunc.predict(X_test)

[13]:

	predict__0	predict_interval__Coverage__0.8__lower	predict_interval__Coverage__0.8__upper	predict_interval__Coverage__0.9__lower	predict_interval__Coverage__0.9__upper	predict_proba__Quantiles_0.1	predict_proba__Quantiles_0.9	predict_quantiles__Quantiles__0.05	predict_quantiles__Quantiles__0.95	predict_var__0
1959	66513.0	64719.913711	68306.086289	64211.598663	68814.401337	64719.914062	68306.085938	64211.598663	68814.401337	1.957628e+06
1960	66513.0	63977.193051	69048.806949	63258.327017	69767.672983	63977.191406	69048.804688	63258.327017	69767.672983	3.915256e+06
1961	66513.0	63407.283445	69618.716555	62526.855956	70499.144044	63407.281250	69618.718750	62526.855956	70499.144044	5.872885e+06
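The combined frame above has one group of columns per configured predict method, each prefixed with the method name. The following is a minimal sketch of that pattern, assuming plain dictionaries in place of DataFrames; StubForecaster and run_configured_methods are hypothetical names with dummy values, and this is not the sktime implementation.

```python
class StubForecaster:
    """Hypothetical stand-in for a fitted forecaster; not an sktime class."""

    def predict(self, X=None):
        return {"0": [66513.0, 66513.0, 66513.0]}

    def predict_interval(self, X=None, coverage=0.9):
        # Dummy bounds only; a real forecaster computes these from the data.
        return {
            f"Coverage__{coverage}__lower": [0.0, 0.0, 0.0],
            f"Coverage__{coverage}__upper": [1.0, 1.0, 1.0],
        }


def run_configured_methods(model, conf, X=None):
    """Call each configured method and prefix its columns with the method name."""
    combined = {}
    for method_name, kwargs in conf["predict_method"].items():
        result = getattr(model, method_name)(X=X, **kwargs)
        for column, values in result.items():
            combined[f"{method_name}__{column}"] = values
    return combined


conf = {"predict_method": {"predict": {}, "predict_interval": {"coverage": 0.8}}}
combined = run_configured_methods(StubForecaster(), conf)
print(list(combined))
```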

3. Model deployment example#

3.1 Create experiment#

[14]:

artifact_path = "model"

mlflow.set_experiment("Test Sktime")

with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)

run_id = run.info.run_id
print(f"MLflow run id: {run_id}")

2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.

MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6

3.2 Deploy pyfunc model to local REST API endpoint#

Open a terminal window and cd into the examples directory.
In the terminal, run: mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>
- where you substitute <RUN_ID> with the run_id and <HOST> with the network address to listen on (e.g. 127.0.0.1)
More details here: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
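The command above can also be assembled from the run id printed in section 3.1. The following is a minimal sketch that uses only the flags shown in this section; build_serve_command is a hypothetical helper, and the snippet prints the command rather than starting a server.

```python
import shlex


def build_serve_command(run_id, host, artifact_path="model"):
    """Build the argument list for serving a logged model locally.

    Hypothetical helper for illustration; uses only the flags shown above.
    """
    return [
        "mlflow", "models", "serve",
        "-m", f"runs:/{run_id}/{artifact_path}",
        "--env-manager", "local",
        "--host", host,
    ]


# Run id taken from the output of section 3.1; substitute your own.
command = build_serve_command("ec94d157fe354c1bbdd4dec0898e1ed6", "127.0.0.1")
command_line = shlex.join(command)
print(command_line)
```

Keeping the arguments as a list means the same value could be passed to subprocess.run(command) to launch the server from Python, which blocks until the server stops.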

3.3 Request predictions from local REST API endpoint#

For more details see: https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools

3.3.1 JSON input using `dataframe_split` field with pandas DataFrame in the `split` orientation#

[15]:

host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)

# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()

{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}

3.3.2 JSON input using `dataframe_records` field with pandas DataFrame in the `records` orientation#

[16]:

json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)

# Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()

{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
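The split and records orientations carry the same cell values in different layouts. The following is a minimal sketch of how they relate, using the first two rows of the payload printed above; split_to_records is a hypothetical helper, not an MLflow or pandas function.

```python
import json


def split_to_records(split_payload):
    """Convert a 'split'-oriented dict into a list of per-row records.

    Hypothetical helper for illustration. The 'index' entry is dropped,
    because the records orientation does not carry an index.
    """
    columns = split_payload["columns"]
    return [dict(zip(columns, row)) for row in split_payload["data"]]


split_payload = {
    "index": [0, 1],
    "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
    "data": [
        [112.6, 482704.0, 3813.0, 2552.0, 123366.0],
        [114.2, 502601.0, 3931.0, 2514.0, 125368.0],
    ],
}
records = split_to_records(split_payload)

# Both request bodies serialize to JSON with the standard library.
split_body = json.dumps({"dataframe_split": split_payload})
records_body = json.dumps({"dataframe_records": records})
print(records_body)
```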

3.3.3 CSV input using valid `pd.DataFrame` csv representation#

[17]:

headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)

# Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()

,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0
