
MLflow#

The sktime custom model flavor enables logging of sktime models in MLflow format via the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() functions. These functions also add the pyfunc flavor to the MLflow Models that they produce, allowing the model to be interpreted as a generic Python function for inference via sktime.utils.mlflow_sktime.pyfunc.load_model(). A model loaded this way can only be scored with a DataFrame input. You can also use sktime.utils.mlflow_sktime.load_model() to load MLflow Models with the sktime model flavor in the native sktime format.

The pyfunc flavor of the model supports the following sktime predict methods: predict, predict_interval, predict_proba, predict_quantiles, and predict_var.

To generate forecasts with a sktime model loaded as a pyfunc, pass the exogenous regressors as a pandas DataFrame to pyfunc.predict(); if the model uses no exogenous regressors, pass an empty DataFrame. Which predict methods are called, and with which parameter values, is defined by a dictionary saved as an attribute of the fitted sktime model instance (pyfunc_predict_conf in the examples below). If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict method. Note that for the pyfunc flavor, the forecasting horizon fh must be passed to the fit method.

Predict methods and parameter values for the pyfunc flavor can be defined in two ways:

  • Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}

  • Dict[str, list], to use the default parameters of each predict method, for example {"predict_method": ["predict", "predict_interval"]}

Note: when including the predict_proba method, the first form must be used, because the quantiles parameter has to be provided by the user.
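To make the two configuration forms concrete, the following minimal sketch maps both onto the same {method name: keyword arguments} shape. The helper normalize_predict_conf is hypothetical and exists only for illustration; it is not part of sktime, which does its own parsing of the configuration.

```python
from typing import Dict, List, Union

PredictConf = Dict[str, Union[Dict[str, dict], List[str]]]


def normalize_predict_conf(conf: PredictConf) -> Dict[str, dict]:
    """Return a {method_name: kwargs} mapping for either config form.

    Hypothetical helper for illustration only; not part of sktime.
    """
    methods = conf["predict_method"]
    if isinstance(methods, dict):
        # Dict[str, dict] form: keyword arguments are given per method.
        return {name: dict(kwargs) for name, kwargs in methods.items()}
    # Dict[str, list] form: every method runs with its default parameters.
    return {name: {} for name in methods}


conf_with_params = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": [0.1, 0.9]},
    }
}
conf_with_defaults = {"predict_method": ["predict", "predict_interval"]}

print(normalize_predict_conf(conf_with_params))
print(normalize_predict_conf(conf_with_defaults))
```

In the list form every method ends up with an empty keyword dictionary, which is why a method that needs a user-supplied parameter cannot be configured that way.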

Signature logging for sktime from a non-pyfunc artifact will not function correctly for predict_interval or predict_quantiles. The output of the native sktime model flavor for these methods is not a recognized signature type due to the MultiIndex column structure of the returned DataFrame. MLflow’s infer_schema will function correctly if using the pyfunc flavor of the model, though.
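The pyfunc output shown in section 2.4.2 uses flat string column names such as predict_interval__Coverage__0.8__lower instead of a MultiIndex, which is presumably why schema inference works for that flavor. The sketch below mimics that naming with plain tuples; flatten_columns is a hypothetical helper written for this illustration, not sktime's implementation.

```python
from typing import Hashable, List, Sequence, Tuple


def flatten_columns(
    method: str, columns: Sequence[Tuple[Hashable, ...]]
) -> List[str]:
    """Join a method name and each column tuple into one flat string label."""
    return ["__".join([method, *map(str, levels)]) for levels in columns]


# Column tuples in the shape predict_interval returns for coverage=[0.8, 0.9]
interval_columns = [
    ("Coverage", 0.8, "lower"),
    ("Coverage", 0.8, "upper"),
    ("Coverage", 0.9, "lower"),
    ("Coverage", 0.9, "upper"),
]

flat = flatten_columns("predict_interval", interval_columns)
print(flat)
```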

1. Setup#

1.1 Config#

[1]:
model_path = "model"

1.2 Imports#

[2]:
import mlflow

from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime

1.3 Load sample data#

[3]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

2. Example usage of native sktime flavor and pyfunc flavor#

2.1 Create prediction config for pyfunc flavor#

[4]:
coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]

pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}

2.2 Train and save model#

[5]:
with mlflow.start_run():
    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.save_model(forecaster, model_path)

2.3 Load model#

2.3.1 Native sktime flavor#

[6]:
loaded_model = mlflow_sktime.load_model(model_path)

2.3.2 Pyfunc flavor#

[7]:
loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)

2.4 Generate predictions#

2.4.1 Native sktime flavor#

[8]:
loaded_model.predict(X=X_test)
[8]:
1959    66513.0
1960    66513.0
1961    66513.0
Freq: A-DEC, dtype: float64
[9]:
loaded_model.predict_interval(X=X_test, coverage=coverage)
[9]:
          Coverage
               0.8                         0.9
             lower         upper         lower         upper
1959  64719.913711  68306.086289  64211.598663  68814.401337
1960  63977.193051  69048.806949  63258.327017  69767.672983
1961  63407.283445  69618.716555  62526.855956  70499.144044
[10]:
y_pred_dist = loaded_model.predict_proba(X=X_test)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles
[10]:
Quantiles_0.1 Quantiles_0.9
1959 64719.914062 68306.085938
1960 63977.191406 69048.804688
1961 63407.281250 69618.718750
[11]:
loaded_model.predict_quantiles(X=X_test)
[11]:
         Quantiles
              0.05          0.95
1959  64211.598663  68814.401337
1960  63258.327017  69767.672983
1961  62526.855956  70499.144044
[12]:
loaded_model.predict_var(X=X_test)
[12]:
0
1959 1.957628e+06
1960 3.915256e+06
1961 5.872885e+06

2.4.2 Pyfunc flavor#

[13]:
loaded_pyfunc.predict(X_test)
[13]:
predict__0 predict_interval__Coverage__0.8__lower predict_interval__Coverage__0.8__upper predict_interval__Coverage__0.9__lower predict_interval__Coverage__0.9__upper predict_proba__Quantiles_0.1 predict_proba__Quantiles_0.9 predict_quantiles__Quantiles__0.05 predict_quantiles__Quantiles__0.95 predict_var__0
1959 66513.0 64719.913711 68306.086289 64211.598663 68814.401337 64719.914062 68306.085938 64211.598663 68814.401337 1.957628e+06
1960 66513.0 63977.193051 69048.806949 63258.327017 69767.672983 63977.191406 69048.804688 63258.327017 69767.672983 3.915256e+06
1961 66513.0 63407.283445 69618.716555 62526.855956 70499.144044 63407.281250 69618.718750 62526.855956 70499.144044 5.872885e+06

3. Model deployment example#

3.1 Create experiment#

[14]:
artifact_path = "model"

mlflow.set_experiment("Test Sktime")

with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)

run_id = run.info.run_id
print(f"MLflow run id: {run_id}")
2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.
MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6

3.2 Deploy pyfunc model to local REST API endpoint#

  • Open a terminal window and cd into the examples directory

  • In the terminal run: mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>

    • where you replace <RUN_ID> with the run_id printed above and <HOST> with the network address to listen on (e.g. 127.0.0.1)

  • More details here: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
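As a convenience, the serve command can be assembled in Python from the run id captured in section 3.1. This is only a sketch: it prints the command to paste into the terminal and does not start the server. The run id below is the one from the example output above and will differ on your machine.

```python
import shlex

# Run id taken from the example output in section 3.1; replace with your own.
run_id = "ec94d157fe354c1bbdd4dec0898e1ed6"
host = "127.0.0.1"

command = [
    "mlflow", "models", "serve",
    "-m", f"runs:/{run_id}/model",
    "--env-manager", "local",
    "--host", host,
]

# shlex.join quotes arguments where needed, so the result is safe to paste.
serve_command = shlex.join(command)
print(serve_command)
```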

3.3 Request predictions from local REST API endpoint#

3.3.1 JSON input using dataframe_split field with pandas DataFrame in the split orientation#

[15]:
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)

# # Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}
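If the requests package is not available, the same POST can be prepared with the standard library. The sketch below builds the request for the split-oriented payload printed above, using the same host and port as the cell; it does not send anything unless the last lines are uncommented. Setting the Content-Type header to application/json mirrors what requests.post(url, json=...) does.

```python
import json
import urllib.request

host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

# Payload copied from the printed output above (index omitted for brevity).
payload = {
    "dataframe_split": {
        "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
        "data": [
            [112.6, 482704.0, 3813.0, 2552.0, 123366.0],
            [114.2, 502601.0, 3931.0, 2514.0, 125368.0],
            [115.7, 518173.0, 4806.0, 2572.0, 127852.0],
            [116.9, 554894.0, 4007.0, 2827.0, 130081.0],
        ],
    }
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request once the server from section 3.2 is running.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```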

3.3.2 JSON input using dataframe_records field with pandas DataFrame in the records orientation#

[16]:
json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)

# # Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}

3.3.3 CSV input using valid pd.DataFrame csv representation#

[17]:
headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)

# # Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()
,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0

