TensorFlow Serving is part of TensorFlow Extended (TFX) and makes deploying your machine learning model to a server easier than ever. Before Google released TensorFlow Serving, your model had to be deployed into production by wrapping it in your own server code and Docker setup, which was tedious, time-consuming, and prone to errors. TensorFlow Serving provides an API that can be called with HTTP requests to run inference on the server. In this blog, we will serve an emotion recognition model and, through that, understand the basics of TensorFlow Serving.
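To make that concrete, here is a minimal sketch of what an inference request looks like once a model is being served over TensorFlow Serving's REST API. The model name `emotion_model`, the localhost address, and the 48x48 grayscale input shape are placeholder assumptions for illustration, not the exact setup used later in this post.

```python
import json

import numpy as np
import requests

# TensorFlow Serving's REST API listens on port 8501 by default;
# the model name "emotion_model" is a placeholder for this sketch.
SERVER_URL = "http://localhost:8501/v1/models/emotion_model:predict"

# A dummy 48x48 grayscale image standing in for a real face crop.
image = np.random.rand(1, 48, 48, 1).tolist()

# The predict endpoint expects a JSON body with an "instances" key.
response = requests.post(SERVER_URL, data=json.dumps({"instances": image}))
response.raise_for_status()

# Predictions come back as a JSON list, one entry per input instance.
print(response.json()["predictions"])
```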

Why serve a model?

Once you have trained your model, it has to be deployed into production so that it can be used. There are various ways to do this: deploying locally on phones with TensorFlow Lite, running in the browser with TensorFlow.js, packaging the model into a Docker container for the cloud, and so on. TensorFlow Serving has an advantage over these methods for the following reasons.

  1. It is much easier to deploy your model using TensorFlow Serving than with Docker, and it saves you time and prevents unnecessary errors.
  2. It is easier to manage different versions of the model than with TensorFlow Lite or TensorFlow.js (see the versioned SavedModel sketch after this list).
  3. When the model is updated, every client immediately gets the same version, so results stay consistent across clients.
  4. Since the model will be running on the server, you can use powerful computational resources like GPUs or TPUs to run inference faster.
  5. Since the model is served behind an API, it can be used from programming languages that TensorFlow does not support.
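The version management in point 2 works because TensorFlow Serving watches a base directory whose numbered subfolders are treated as model versions. Below is a minimal sketch of exporting a model into that layout; the tiny placeholder architecture and the `/tmp/models/emotion_model` path are illustrative assumptions.

```python
import tensorflow as tf

# Placeholder emotion classifier; in the blog this would be the trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(7, activation="softmax"),  # e.g. 7 emotion classes
])

# TensorFlow Serving expects <base_path>/<version>/ directories; bumping the
# version number (1 -> 2 -> ...) is all it takes to roll out a new model.
export_path = "/tmp/models/emotion_model/1"  # assumed base path for this sketch
tf.saved_model.save(model, export_path)
```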

