MLflow#
The sktime custom model flavor enables logging of sktime models in MLflow format via the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() methods. These methods also add the pyfunc flavor to the MLflow Models that they produce, allowing the model to be interpreted as a generic Python function for inference via sktime.utils.mlflow_sktime.pyfunc.load_model(). The loaded pyfunc model can only be scored with a DataFrame input. You can also use the sktime.utils.mlflow_sktime.load_model() method to load MLflow Models with the sktime model flavor in native sktime format.
The pyfunc flavor of the model supports the sktime predict methods predict, predict_interval, predict_proba, predict_quantiles, and predict_var.
To generate forecasts with a sktime model loaded as a pyfunc type, pass the exogenous regressors as a pandas DataFrame to the pyfunc.predict() method (an empty DataFrame must be passed if no exogenous regressor is used). The predict methods to call, and the parameter values passed to them, are defined by a dictionary saved as an attribute of the fitted sktime model instance (the pyfunc_predict_conf attribute set in section 2.2). If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict method. Note that for the pyfunc flavor, the forecasting horizon fh must be passed to the fit method.
Predict methods and parameter values for the pyfunc flavor can be defined in two ways:
- Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}
- Dict[str, list], with default parameters in the predict methods, for example {"predict_method": ["predict", "predict_interval"]}
Note: when including the predict_proba method, the former approach must be followed, as the quantiles parameter has to be provided by the user.
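The two configuration forms carry the same information when every method uses its defaults. The sketch below shows how the list form could be expanded into the dict form; normalize_predict_conf is a hypothetical helper written for illustration only, and sktime may process the configuration differently internally.

```python
def normalize_predict_conf(conf):
    """Expand the list form of a prediction config into the dict form.

    Hypothetical helper for illustration; not part of sktime.
    """
    methods = conf["predict_method"]
    if isinstance(methods, dict):
        # Dict form: copy the per-method parameter dictionaries as they are.
        return {"predict_method": {name: dict(params) for name, params in methods.items()}}
    # List form: every listed method is called with its default parameters.
    return {"predict_method": {name: {} for name in methods}}


dict_form = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": [0.1, 0.9]},
    }
}
list_form = {"predict_method": ["predict", "predict_interval"]}

print(normalize_predict_conf(list_form))
```

Either dictionary is then assigned to the fitted forecaster before saving or logging it.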
Signature logging for sktime from a non-pyfunc artifact will not function correctly for predict_interval or predict_quantiles. The output of the native sktime model flavor for these methods is not a recognized signature type because the returned DataFrame has MultiIndex columns. MLflow’s infer_schema will function correctly when using the pyfunc flavor of the model, though.
1. Setup#
1.1 Config#
[1]:
model_path = "model"
1.2 Imports#
[2]:
import mlflow
from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime
1.3 Load sample data#
[3]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
2. Example usage of native sktime flavor and pyfunc flavor#
2.1 Create prediction config for pyfunc flavor#
[4]:
coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]
pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}
2.2 Train and save model#
[5]:
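with mlflow.start_run():
    forecaster = NaiveForecaster()
    # The forecasting horizon fh must be passed to fit for the pyfunc flavor
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    # Attach the prediction config so it is saved with the model
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf
    mlflow_sktime.save_model(forecaster, model_path)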
2.3 Load model#
2.3.1 Native sktime flavor#
[6]:
loaded_model = mlflow_sktime.load_model(model_path)
2.3.2 Pyfunc flavor#
[7]:
loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)
2.4 Generate predictions#
2.4.1 Native sktime flavor#
[8]:
loaded_model.predict(X=X_test)
[8]:
1959 66513.0
1960 66513.0
1961 66513.0
Freq: A-DEC, dtype: float64
[9]:
loaded_model.predict_interval(X=X_test, coverage=coverage)
[9]:
| | Coverage, 0.8, lower | Coverage, 0.8, upper | Coverage, 0.9, lower | Coverage, 0.9, upper |
|---|---|---|---|---|
| 1959 | 64719.913711 | 68306.086289 | 64211.598663 | 68814.401337 |
| 1960 | 63977.193051 | 69048.806949 | 63258.327017 | 69767.672983 |
| 1961 | 63407.283445 | 69618.716555 | 62526.855956 | 70499.144044 |
[10]:
y_pred_dist = loaded_model.predict_proba(X=X_test)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles
[10]:
| | Quantiles_0.1 | Quantiles_0.9 |
|---|---|---|
| 1959 | 64719.914062 | 68306.085938 |
| 1960 | 63977.191406 | 69048.804688 |
| 1961 | 63407.281250 | 69618.718750 |
[11]:
loaded_model.predict_quantiles(X=X_test)
[11]:
| | Quantiles, 0.05 | Quantiles, 0.95 |
|---|---|---|
| 1959 | 64211.598663 | 68814.401337 |
| 1960 | 63258.327017 | 69767.672983 |
| 1961 | 62526.855956 | 70499.144044 |
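The default predict_quantiles output above matches the 0.9 coverage bounds returned by predict_interval, and the 0.1 and 0.9 quantiles taken from predict_proba match the 0.8 coverage bounds (up to floating-point precision). This is consistent with a symmetric prediction interval of coverage c spanning the quantile levels (1 - c) / 2 and (1 + c) / 2. A minimal sketch of that arithmetic follows; coverage_to_quantiles is a hypothetical helper for illustration, not an sktime function.

```python
def coverage_to_quantiles(coverage):
    """Return the (lower, upper) quantile levels of a symmetric interval."""
    # Rounding hides floating-point noise such as 0.04999999999999999.
    lower = round((1 - coverage) / 2, 10)
    upper = round((1 + coverage) / 2, 10)
    return lower, upper


for c in [0.8, 0.9]:
    print(c, coverage_to_quantiles(c))
```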
[12]:
loaded_model.predict_var(X=X_test)
[12]:
| | 0 |
|---|---|
| 1959 | 1.957628e+06 |
| 1960 | 3.915256e+06 |
| 1961 | 5.872885e+06 |
2.4.2 Pyfunc flavor#
[13]:
loaded_pyfunc.predict(X_test)
[13]:
| | predict__0 | predict_interval__Coverage__0.8__lower | predict_interval__Coverage__0.8__upper | predict_interval__Coverage__0.9__lower | predict_interval__Coverage__0.9__upper | predict_proba__Quantiles_0.1 | predict_proba__Quantiles_0.9 | predict_quantiles__Quantiles__0.05 | predict_quantiles__Quantiles__0.95 | predict_var__0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1959 | 66513.0 | 64719.913711 | 68306.086289 | 64211.598663 | 68814.401337 | 64719.914062 | 68306.085938 | 64211.598663 | 68814.401337 | 1.957628e+06 |
| 1960 | 66513.0 | 63977.193051 | 69048.806949 | 63258.327017 | 69767.672983 | 63977.191406 | 69048.804688 | 63258.327017 | 69767.672983 | 3.915256e+06 |
| 1961 | 66513.0 | 63407.283445 | 69618.716555 | 62526.855956 | 70499.144044 | 63407.281250 | 69618.718750 | 62526.855956 | 70499.144044 | 5.872885e+06 |
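The column names in the pyfunc output above appear to combine the predict method name with the column labels of the corresponding native output, joined by a double underscore. The sketch below reproduces that naming pattern as inferred from the table alone; flatten_column is a hypothetical helper, and the actual sktime implementation may differ.

```python
def flatten_column(method, column):
    """Join a predict method name and its column labels with a double underscore."""
    # A MultiIndex column arrives as a tuple of levels; a flat column is a single label.
    levels = column if isinstance(column, tuple) else (column,)
    return "__".join([method] + [str(level) for level in levels])


print(flatten_column("predict", 0))
print(flatten_column("predict_interval", ("Coverage", 0.8, "lower")))
print(flatten_column("predict_quantiles", ("Quantiles", 0.05)))
```

Flat string column names of this kind are what allow signature inference to work on the pyfunc output, unlike the MultiIndex columns of the native flavor.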
3. Model deployment example#
3.1 Create experiment#
[14]:
artifact_path = "model"
mlflow.set_experiment("Test Sktime")
with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    # The forecasting horizon fh must be passed to fit for the pyfunc flavor
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf
    # Log the model as an artifact of the active run
    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)
    run_id = run.info.run_id
print(f"MLflow run id: {run_id}")
2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.
MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6
3.2 Deploy pyfunc model to local REST API endpoint#
Open a terminal window and cd into the examples directory.
In the terminal, run:
mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>
where you substitute <RUN_ID> with the run_id and <HOST> with the network address to listen on (e.g. 127.0.0.1).
More details here: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
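If the serve command needs to be assembled from Python, for example to fill in the run id printed in section 3.1, it can be built as an argument list. This is only a sketch: build_serve_command is a hypothetical helper, it uses only the flags shown in the command above, and it prints the command rather than starting the server.

```python
import shlex


def build_serve_command(run_id, host, artifact_path="model"):
    """Return the argument list for the mlflow models serve command."""
    return [
        "mlflow", "models", "serve",
        "-m", f"runs:/{run_id}/{artifact_path}",
        "--env-manager", "local",
        "--host", host,
    ]


command = build_serve_command("ec94d157fe354c1bbdd4dec0898e1ed6", "127.0.0.1")
# Print a shell-ready string; pass the list to subprocess.run to launch the server.
print(shlex.join(command))
```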
3.3 Request predictions from local REST API endpoint#
For more details see: https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools
3.3.1 JSON input using dataframe_split field with pandas DataFrame in the split orientation#
[15]:
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"
X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)
# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}
3.3.2 JSON input using dataframe_records field with pandas DataFrame in the records orientation#
[16]:
json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)
# Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
3.3.3 CSV input using valid pd.DataFrame CSV representation#
[17]:
headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)
# Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()
,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0
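The three request variants above differ only in the body and the Content-Type header. The sketch below builds such requests with the standard library urllib instead of the requests package used in the cells above; build_invocation_request is a hypothetical helper, the single-column payloads are toy values rather than the Longley data, and nothing is sent until the last lines are uncommented.

```python
import json
import urllib.request


def build_invocation_request(url, payload, content_type="application/json"):
    """Build (but do not send) a POST request for the /invocations endpoint."""
    if content_type == "application/json":
        body = json.dumps(payload).encode("utf-8")
    else:
        # For "text/csv" the payload is already a CSV string.
        body = payload.encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


url = "http://127.0.0.1:5000/invocations"
json_request = build_invocation_request(
    url, {"dataframe_split": {"columns": ["GNP"], "data": [[482704.0]]}}
)
csv_request = build_invocation_request(url, ",GNP\n0,482704.0\n", "text/csv")

# Uncomment once the model server from section 3.2 is running:
# with urllib.request.urlopen(json_request) as response:
#     print(json.loads(response.read()))
```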