As most ML practitioners realize, developing a predictive model in a Jupyter Notebook and running predictions against data in an Excel file will not get you to the predictive models required at enterprise scale. To build models at that scale, you need to consider several requirements and use tools/frameworks that are designed specifically for the purpose.

Most of the tutorials online describe productionizing an ML model as simply exposing it as a REST service with Flask. In reality, the requirements are much steeper. In this article I will explain the key challenges to consider and then provide you with a containerized, enterprise-scale, ‘fully loaded’ ML jumpstart kit that you can readily deploy for your model productionization needs.

Prerequisites:

In this article, I will explain in detail the key challenges to consider while productionizing prediction models and show you how to set up an environment with Docker containers. I therefore assume readers are familiar with Docker and its commands.

You will also need Docker installed on your system.

The entire jumpstart kit with the code is available here and you can refer to it as needed.

Now, let’s get started!

Challenges to Consider and How We are Addressing Them

Challenge 1 — Production Grade App Server

First things first: Flask is not a web server/app server and, according to its official documentation, is not intended for production use on its own. So exposing a model as a REST service with Flask alone may not be the right choice. If you want to use Flask, you need to pair it with Gunicorn and NGINX as the WSGI app server and web server, respectively, to meet production needs.
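For illustration only, a minimal sketch of this setup might look like the following. The module name app.py, the port, and the worker count are assumptions for the example; NGINX would then sit in front of Gunicorn as the web server.

```python
# app.py -- a hypothetical minimal Flask service; Flask's built-in server is
# for development only, so in production this module is run by Gunicorn.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    # ... load your model and compute the prediction here ...
    return jsonify({"received": payload})


# Run behind Gunicorn, for example:
#   gunicorn --workers 4 --bind 0.0.0.0:8000 app:app
# with NGINX configured as a reverse proxy in front of it.
```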

How are we addressing this in our jumpstart kit?

FastAPI, a newer Python micro-framework, was recently launched and claims to be up to 300% faster than Flask. Beyond performance, it has several advantages: it is easy to learn (very similar to Flask), standards-based, robust, and supports asynchronous execution. Refer to this blog for additional details on the benefits of FastAPI over Flask:

Why we switched from Flask to FastAPI for production machine learning: The most popular tool isn’t always the best (towardsdatascience.com)

FastAPI works natively with Uvicorn, an ASGI-based app server, and supports an asynchronous mode of execution, which is ideal for long-running ML tasks.
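To make this concrete, here is a minimal sketch of a FastAPI prediction endpoint served by Uvicorn. The model file model.pkl, the module name main, and the request schema are assumptions for the example, not the exact layout of the jumpstart kit.

```python
# main.py -- a hypothetical FastAPI prediction service run with Uvicorn.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Assumed: a scikit-learn style model serialized to model.pkl at build time.
model = joblib.load("model.pkl")


class PredictionRequest(BaseModel):
    features: List[float]


@app.post("/predict")
def predict(request: PredictionRequest):
    # scikit-learn expects a 2D array: one row per sample.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}


# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
```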

Challenge 2 — Loosely Coupled Execution

Typical machine learning processes are long-running tasks such as model training, data processing, and (in some cases) prediction, so exposing them directly as a web service may not help you scale. Suppose you have exposed your model as a service and, during your business’s peak hours, hundreds of calls are made to it. Your server may not be equipped to handle all of those requests, and it could ultimately crash.

A better way to handle this scenario is to use a ‘message queue’ to queue the incoming requests and process them asynchronously with loosely coupled worker processes.

As shown below, every call to the REST API is queued in a message queue, and a consumer picks up the messages one by one to execute the task. We can add consumers as needed, so the system is never overloaded.
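The sketch below illustrates this pattern with RabbitMQ and the pika client. The queue name prediction_tasks and the hostname rabbitmq (a typical Docker service name) are assumptions for the example; the jumpstart kit may wire these differently.

```python
# queue_worker.py -- hypothetical publisher/consumer pair for a RabbitMQ queue.
import json

import pika

QUEUE_NAME = "prediction_tasks"  # assumed queue name
RABBITMQ_HOST = "rabbitmq"       # assumed Docker service name for RabbitMQ


def publish_task(payload: dict) -> None:
    """Called by the REST API: enqueue the request and return immediately."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=RABBITMQ_HOST))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    channel.basic_publish(
        exchange="",
        routing_key=QUEUE_NAME,
        body=json.dumps(payload),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    connection.close()


def run_worker() -> None:
    """Loosely coupled consumer: process queued tasks one at a time."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=RABBITMQ_HOST))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    channel.basic_qos(prefetch_count=1)  # hand each worker one message at a time

    def handle(ch, method, properties, body):
        task = json.loads(body)
        # ... run the long-running training / prediction step here ...
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE_NAME, on_message_callback=handle)
    channel.start_consuming()


if __name__ == "__main__":
    run_worker()
```

Scaling out is then just a matter of starting more worker containers that run the same consumer against the same queue.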

