Often the developer who builds a model is not the developer who consumes it in an application. For example, a developer creating an API that takes an input image and categorizes it won't know the ins and outs of your model, and doesn't need to. All that developer needs to know is where to load the model from and how to run inference with it. MLflow provides good APIs for exactly that.

Customized MLflow Model

mlflow.pyfunc.PythonModel: Represents a generic Python model that evaluates inputs and produces API-compatible outputs. By subclassing, users can create customized MLflow models with the “python_function” (“pyfunc”) flavor, leveraging custom inference logic and artifact dependencies.

Source: https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.PythonModel

MLflow lets you create custom models in addition to the flavors it supports natively, such as Keras. A custom model can be saved as an MLflow model and loaded later like any other. So whatever type your production model is, you can create a stand-in for it with mlflow.pyfunc.PythonModel.

The following is the AddN example from the MLflow documentation. It implements only the __init__ and predict methods, and we will do the same.

import mlflow.pyfunc

## Define the model class
class AddN(mlflow.pyfunc.PythonModel):

    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

## Construct and save the model
model_path = "add_n_model"
add5_model = AddN(n=5)
mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)

## Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)

## Evaluate the model
import pandas as pd
model_input = pd.DataFrame([range(10)])
model_output = loaded_model.predict(model_input)
assert model_output.equals(pd.DataFrame([range(5, 15)]))
Source: https://www.mlflow.org/docs/latest/models.html#example-creating-a-custom-add-n-model

Bonus Use Case

Before going into implementation, let us motivate a use case. Let us say Acne Inc. decided to provide a bonus to its employees using a predictive model. Based on certain inputs, the model predicts what should be the bonus for that employee.

What that model does and what its inputs are is beyond the scope of this post. What is more interesting is how the developer uses the model. The developer who integrates it into the HR system is given the location of the model and asked to expose a method, amount, that takes an input and returns the bonus amount. The developer is told that the model is an MLflow model and that the MLflow APIs can be used to predict.

Our developer comes up with a simple implementation: a Bonus class that is given the model location and exposes an amount method, which uses the model to predict. Now the challenge is to unit test this implementation to make sure it works. Ideally, the unit test should be CI/CD compatible, so rather than training a model just for testing, the solution should be lightweight. Enter mlflow.pyfunc.PythonModel!
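The Bonus implementation itself is not reproduced here, so the following is only a minimal sketch of what such a class might look like. It assumes the constructor takes the model location and that amount passes its input straight to the model's predict method. The loader parameter and the _mlflow_loader helper are hypothetical additions, not part of the original design; they make the MLflow dependency replaceable and keep the import lazy.

```python
from typing import Any, Callable, Optional


def _mlflow_loader(model_path: str) -> Any:
    # Hypothetical helper: imported lazily so this module can be imported
    # even where MLflow is not installed.
    import mlflow.pyfunc
    return mlflow.pyfunc.load_model(model_path)


class Bonus:
    """Business logic that depends on a model stored at model_path."""

    def __init__(self, model_path: str,
                 loader: Optional[Callable[[str], Any]] = None):
        # By default the model is loaded through MLflow's pyfunc API.
        # The optional loader argument is an assumption made for this sketch.
        self.model = (loader or _mlflow_loader)(model_path)

    def amount(self, model_input):
        # Delegate to the loaded model; any object with a predict method works.
        return self.model.predict(model_input)
```

With this shape, the class only relies on the loaded object having a predict method, which is what makes it possible to swap in a lightweight test model.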

Test Bonus Model

The developer comes up with a test model that is initialized with a dictionary. The dictionary maps each expected input to a predefined output, and the predict method simply looks up each element of model_input in it. This assumes that every element of model_input is hashable, so that it can serve as a dictionary key.
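The lookup can be tried on its own, without MLflow. The sketch below uses a hypothetical free function, lookup_predict, that mirrors what the test model's predict does, and shows why the elements of the input must be hashable.

```python
from typing import Any, Dict, Hashable, List, Sequence


def lookup_predict(output: Dict[Hashable, Any],
                   model_input: Sequence[Hashable]) -> List[Any]:
    # Return the canned output for each element of the input, in order.
    return [output[model_input[i]] for i in range(len(model_input))]


canned = {4: 100, 3: 200}
print(lookup_predict(canned, [4, 3]))  # [100, 200]

try:
    # A list is not hashable, so it cannot be used as a dictionary key.
    lookup_predict(canned, [[4, 3]])
except TypeError as exc:
    print("unhashable input:", exc)
```

An input that is hashable but missing from the dictionary raises KeyError instead, which is usually what you want in a test: an unexpected input fails loudly.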

Unit Testing…Yay!

Now comes the best part: the developer tests the Bonus class they created against the test bonus model.

import os
import shutil
import tempfile
import unittest

import mlflow.pyfunc

from mlflow_buisness import Bonus


class Model(mlflow.pyfunc.PythonModel):
    """Test model that returns a predefined output for each input."""

    def __init__(self, output: dict):
        self.output = output

    def predict(self, context, model_input):
        # Look up the canned output for every element of the input
        return [self.output[model_input[i]] for i in range(len(model_input))]


class BonusCase(unittest.TestCase):

    def setUp(self):
        # Start from a clean location so that save_model does not find an existing model
        self.model_path = os.path.join(tempfile.gettempdir(), "mlflow_test")
        if os.path.isdir(self.model_path):
            shutil.rmtree(self.model_path)

    def test_model(self):
        output = {4: 100, 3: 200}
        # Save the test model, then let Bonus load it like a real one
        mlflow.pyfunc.save_model(self.model_path, python_model=Model(output))
        bonus = Bonus(self.model_path)
        self.assertEqual(bonus.amount(list(output.keys())), list(output.values()))


if __name__ == '__main__':
    unittest.main()


Unit Testing MLflow Model Dependent Business Logic