MLflow#
The sktime custom model flavor enables logging of sktime models in MLflow format via the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() methods. These methods also add the pyfunc flavor to the MLflow Models they produce, so the models can be interpreted as generic Python functions for inference via sktime.utils.mlflow_sktime.pyfunc.load_model(). A model loaded as pyfunc can only be scored with a DataFrame input. To load an MLflow Model with the sktime model flavor in its native sktime format, use the sktime.utils.mlflow_sktime.load_model() method.
The pyfunc flavor of the model supports the sktime predict methods predict, predict_interval, predict_proba, predict_quantiles, and predict_var.
To generate forecasts with a sktime model loaded as pyfunc, pass the exogenous regressors as a pandas DataFrame to the pyfunc.predict() method; an empty DataFrame must be passed if no exogenous regressors are used. The predict methods to call, and the parameter values passed to them, are defined by a dictionary saved as an attribute of the fitted sktime model instance (pyfunc_predict_conf in the examples below). If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict method. Note that for the pyfunc flavor the forecasting horizon fh must be passed to the fit method.
Predict methods and parameter values for the pyfunc flavor can be defined in two ways:
- Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}
- Dict[str, list], to use the default parameters of each predict method, for example {"predict_method": ["predict", "predict_interval"]} (Note: when the predict_proba method is included, the Dict[str, dict] form must be used because the quantiles parameter has to be provided by the user)
- If no prediction config is defined, pyfunc.predict() returns the output of the sktime predict() method
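The two config forms can be reduced to one before use. The sketch below is a hypothetical helper (`normalize_predict_conf` is not part of sktime or MLflow) that converts the list form into the dict form and enforces the rule that predict_proba needs an explicit quantiles value. It only illustrates the rules described above and makes no claim about how sktime validates the config internally.

```python
# Predict methods supported by the pyfunc flavor, as listed above.
SUPPORTED_METHODS = (
    "predict",
    "predict_interval",
    "predict_proba",
    "predict_quantiles",
    "predict_var",
)


def normalize_predict_conf(conf):
    """Return the config in Dict[str, dict] form (hypothetical helper)."""
    methods = conf["predict_method"]
    if isinstance(methods, list):
        # List form: every method runs with its default parameters.
        methods = {name: {} for name in methods}
    unknown = [name for name in methods if name not in SUPPORTED_METHODS]
    if unknown:
        raise ValueError(f"Unsupported predict methods: {unknown}")
    if "predict_proba" in methods and "quantiles" not in methods["predict_proba"]:
        raise ValueError("predict_proba requires an explicit 'quantiles' value")
    return {"predict_method": dict(methods)}


dict_conf = {
    "predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}
}
list_conf = {"predict_method": ["predict", "predict_interval"]}

print(normalize_predict_conf(dict_conf))
print(normalize_predict_conf(list_conf))
```

Normalizing up front means downstream code only has to handle one shape, and a missing quantiles value is reported before the model is saved rather than at prediction time.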
Signature logging for sktime from a non-pyfunc artifact will not function correctly for predict_interval or predict_quantiles. The output of the native sktime model flavor for these methods is not a recognized signature type due to the MultiIndex column structure of the returned DataFrame. MLflow’s infer_schema will function correctly if using the pyfunc flavor of the model, though.
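The pyfunc output shown later in this notebook uses flat string column names such as predict_interval__Coverage__0.8__lower in place of MultiIndex columns. The sketch below shows one way such names can be derived from multi-level labels. `flatten_column` is a hypothetical helper written for illustration, not sktime's implementation, and the labels are modeled on the predict_interval output in section 2.4.

```python
def flatten_column(method, label):
    """Join a predict method name and a (possibly multi-level) column label.

    Hypothetical helper: levels are joined with a double underscore.
    """
    levels = label if isinstance(label, tuple) else (label,)
    return "__".join([method, *map(str, levels)])


# Multi-level labels, written as plain tuples for illustration.
interval_columns = [
    ("Coverage", 0.8, "lower"),
    ("Coverage", 0.8, "upper"),
    ("Coverage", 0.9, "lower"),
    ("Coverage", 0.9, "upper"),
]

flat = [flatten_column("predict_interval", col) for col in interval_columns]
print(flat)
print(flatten_column("predict", 0))
```

With flat string names every column is a plain scalar column, which is the kind of structure schema inference can describe.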
1. Setup#
1.1 Config#
[1]:
model_path = "model"
1.2 Imports#
[2]:
import mlflow
from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime
1.3 Load sample data#
[3]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
2. Example usage of native sktime flavor and pyfunc flavor#
2.1 Create prediction config for pyfunc flavor#
[4]:
coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]
pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}
2.2 Train and save model#
[5]:
with mlflow.start_run():
    forecaster = NaiveForecaster()
    # fh must be passed to fit for the pyfunc flavor
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    # attach the prediction config before saving the model
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf
    mlflow_sktime.save_model(forecaster, model_path)
2.3 Load model#
2.3.1 Native sktime flavor#
[6]:
loaded_model = mlflow_sktime.load_model(model_path)
2.3.2 Pyfunc flavor#
[7]:
loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)
2.4 Generate predictions#
2.4.1 Native sktime flavor#
[8]:
loaded_model.predict(X=X_test)
[8]:
1959 66513.0
1960 66513.0
1961 66513.0
Freq: A-DEC, dtype: float64
[9]:
loaded_model.predict_interval(X=X_test, coverage=coverage)
[9]:
| | Coverage | | | |
|---|---|---|---|---|
| | 0.8 | | 0.9 | |
| | lower | upper | lower | upper |
| 1959 | 64719.913711 | 68306.086289 | 64211.598663 | 68814.401337 |
| 1960 | 63977.193051 | 69048.806949 | 63258.327017 | 69767.672983 |
| 1961 | 63407.283445 | 69618.716555 | 62526.855956 | 70499.144044 |
[10]:
y_pred_dist = loaded_model.predict_proba(X=X_test)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles
[10]:
| | Quantiles_0.1 | Quantiles_0.9 |
|---|---|---|
| 1959 | 64719.914062 | 68306.085938 |
| 1960 | 63977.191406 | 69048.804688 |
| 1961 | 63407.281250 | 69618.718750 |
[11]:
loaded_model.predict_quantiles(X=X_test)
[11]:
| | Quantiles | |
|---|---|---|
| | 0.05 | 0.95 |
| 1959 | 64211.598663 | 68814.401337 |
| 1960 | 63258.327017 | 69767.672983 |
| 1961 | 62526.855956 | 70499.144044 |
[12]:
loaded_model.predict_var(X=X_test)
[12]:
| | 0 |
|---|---|
| 1959 | 1.957628e+06 |
| 1960 | 3.915256e+06 |
| 1961 | 5.872885e+06 |
2.4.2 Pyfunc flavor#
[13]:
loaded_pyfunc.predict(X_test)
[13]:
| | predict__0 | predict_interval__Coverage__0.8__lower | predict_interval__Coverage__0.8__upper | predict_interval__Coverage__0.9__lower | predict_interval__Coverage__0.9__upper | predict_proba__Quantiles_0.1 | predict_proba__Quantiles_0.9 | predict_quantiles__Quantiles__0.05 | predict_quantiles__Quantiles__0.95 | predict_var__0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1959 | 66513.0 | 64719.913711 | 68306.086289 | 64211.598663 | 68814.401337 | 64719.914062 | 68306.085938 | 64211.598663 | 68814.401337 | 1.957628e+06 |
| 1960 | 66513.0 | 63977.193051 | 69048.806949 | 63258.327017 | 69767.672983 | 63977.191406 | 69048.804688 | 63258.327017 | 69767.672983 | 3.915256e+06 |
| 1961 | 66513.0 | 63407.283445 | 69618.716555 | 62526.855956 | 70499.144044 | 63407.281250 | 69618.718750 | 62526.855956 | 70499.144044 | 5.872885e+06 |
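Because the pyfunc output packs the results of all configured predict methods into one wide table, it can help to regroup the columns by the method that produced them. The sketch below does this on the column names alone. `group_columns_by_method` is a hypothetical helper, and it assumes the naming pattern seen in the output above: method name, a double underscore, then the original column label.

```python
from collections import defaultdict


def group_columns_by_method(columns):
    """Map each predict method to the column suffixes it produced.

    Hypothetical helper: splits each name at its first double underscore.
    """
    groups = defaultdict(list)
    for name in columns:
        method, _, rest = name.partition("__")
        groups[method].append(rest)
    return dict(groups)


# A subset of the column names from the pyfunc output above.
columns = [
    "predict__0",
    "predict_interval__Coverage__0.8__lower",
    "predict_interval__Coverage__0.8__upper",
    "predict_proba__Quantiles_0.1",
    "predict_var__0",
]

grouped = group_columns_by_method(columns)
print(grouped)
```

The split at the first double underscore works because the method names themselves contain only single underscores.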
3. Model deployment example#
3.1 Create experiment#
[14]:
artifact_path = "model"
mlflow.set_experiment("Test Sktime")
with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf
    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)
    # keep the run id so the model can be referenced as runs:/<RUN_ID>/model
    run_id = run.info.run_id
print(f"MLflow run id: {run_id}")
2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.
MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6
3.2 Deploy pyfunc model to local REST API endpoint#
- Open a terminal window and cd into the examples directory.
- In the terminal run: mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>, where you substitute <RUN_ID> with the run_id and <HOST> with the network address to listen on (e.g. 127.0.0.1).
More details here: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
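The serve command can also be assembled in Python so that the run id printed in section 3.1 does not need to be copied by hand. The sketch below only builds and prints the command and does not start a server. `build_serve_command` is a hypothetical helper, and it uses only the flags shown in the command above.

```python
import shlex


def build_serve_command(run_id, host="127.0.0.1", artifact_path="model"):
    """Return the `mlflow models serve` command as an argument list."""
    return [
        "mlflow", "models", "serve",
        "-m", f"runs:/{run_id}/{artifact_path}",
        "--env-manager", "local",
        "--host", host,
    ]


# Run id taken from the example output in section 3.1.
command = build_serve_command("ec94d157fe354c1bbdd4dec0898e1ed6")
print(shlex.join(command))
```

The argument list can be passed to subprocess.run(command) if the server should be launched from Python. That call blocks until the server is stopped.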
3.3 Request predictions from local REST API endpoint#
For more details see: https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools
3.3.1 JSON input using dataframe_split field with pandas DataFrame in the split orientation#
[15]:
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"
X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)
# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}
3.3.2 JSON input using dataframe_records field with pandas DataFrame in the records orientation#
[16]:
json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)
# Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
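The split and records orientations carry the same values in different layouts: split stores the column names once and the rows as lists, while records repeats the column names in every row. The sketch below converts one payload shape into the other using plain Python. `split_to_records` is a hypothetical helper, and the data are the first two rows of the output above.

```python
import json


def split_to_records(split_payload):
    """Convert a dataframe_split payload into a dataframe_records payload."""
    frame = split_payload["dataframe_split"]
    # Pair each row's values with the shared column names.
    records = [dict(zip(frame["columns"], row)) for row in frame["data"]]
    return {"dataframe_records": records}


split_payload = {
    "dataframe_split": {
        "index": [0, 1],
        "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
        "data": [
            [112.6, 482704.0, 3813.0, 2552.0, 123366.0],
            [114.2, 502601.0, 3931.0, 2514.0, 125368.0],
        ],
    }
}

records_payload = split_to_records(split_payload)
print(json.dumps(records_payload, indent=2))
```

Note that the records orientation drops the index, which is one reason the example resets the index of X_test before building the payloads.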
3.3.3 CSV input using valid pd.DataFrame csv representation#
[17]:
headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)
# Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()
,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0
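If the requests package is not available, the same CSV request can be prepared with the standard library. The sketch below builds the request object without sending it, so it runs without a live server. `build_csv_request` is a hypothetical helper, and the URL assumes the host and port used in section 3.3.1.

```python
import urllib.request


def build_csv_request(url, csv_text):
    """Prepare a POST request carrying CSV data (the request is not sent)."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# Header line and first row of the CSV output above.
csv_text = (
    ",GNPDEFL,GNP,UNEMP,ARMED,POP\n"
    "0,112.6,482704.0,3813.0,2552.0,123366.0\n"
)
request = build_csv_request("http://127.0.0.1:5000/invocations", csv_text)
print(request.get_method(), request.full_url)

# To send the request once the server from section 3.2 is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```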