Getting started with TensorFlow Serving

TensorFlow Serving is a part of TensorFlow Extended (TFX) that makes deploying your machine learning model to a server easier. Before Google released TensorFlow Serving, a model typically had to be deployed into production with a hand-built Docker container. Deploying a model that way is tedious, time-consuming, and error-prone. TensorFlow Serving provides an API that can be called over HTTP to run inference on the server. In this post, we will serve an emotion recognition model and, through that, understand the basics of TensorFlow Serving.

Why serve a model?

Once you have trained your model, it has to be deployed into production so that it can be used. There are several ways to do this: deploying locally on phones with TensorFlow Lite (TFLite), deploying on a website with TensorFlow.js (TF.js), building a Docker container to run your model in the cloud, and so on. TensorFlow Serving has an advantage over the other methods for the following reasons.

  1. It is much easier to deploy your model using TensorFlow Serving than with a hand-built Docker container, which saves you time and prevents unnecessary errors.
  2. It is easier to manage different versions of the model than with TFLite or TF.js.
  3. When the model is updated, all clients use the same version of the model, so the results are uniform.
  4. Since the model runs on the server, you can use powerful computational resources like GPUs or TPUs to run inference faster.
  5. Since the model is served through an API, it can be used from programming languages that TensorFlow does not support.
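
To make the last point concrete, here is a minimal sketch of how a client could build a request for TensorFlow Serving's REST predict endpoint using only the Python standard library. The host, the port (8501 is the port commonly used for the REST API, but it is configurable), the model name `emotion_model`, and the input values are all assumptions made for illustration; the shape of `instances` depends on your own model.

```python
import json
from urllib import request


def build_predict_request(host, model_name, instances, version=None, port=8501):
    """Build (but do not send) a POST request for the REST predict endpoint."""
    # An optional /versions/<n> segment pins the request to one model version.
    version_part = f"/versions/{version}" if version is not None else ""
    url = f"http://{host}:{port}/v1/models/{model_name}{version_part}:predict"
    # The "row" request format wraps the inputs in an "instances" list.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and input, for illustration only.
req = build_predict_request("localhost", "emotion_model", [[0.0, 0.5, 1.0]])
print(req.full_url)

# Sending the request needs a running server, so it is left commented out:
# with request.urlopen(req) as resp:
#     predictions = json.loads(resp.read())["predictions"]
```

Because the request is plain JSON over HTTP, the same call can be made from any language with an HTTP client.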

Getting started with TensorFlow

Learn the basics through examples


Ok, let’s discuss the elephant in the room. Should you learn TensorFlow or PyTorch?

Honestly, there is no right answer. Both platforms have a large open source community behind them, are easy to use, and are capable of building complex deep learning solutions. If you really want to shine as a deep learning researcher, you will have to know both.

Let’s now discuss how TensorFlow came about and how to use it for deep learning.

Serving TensorFlow models with TensorFlow Serving

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

📖 Introduction

With the growth of MLOps as the standard way to work with ML models throughout their lifecycle, there are now many different solutions for serving ML models in production. Perhaps the most popular one is TensorFlow Serving, developed by the TensorFlow team to serve their models in production environments.

This post is a guide on how to train, save, serve, and use TensorFlow ML models in production environments. Following the GitHub repository linked to this post, we will prepare and train a custom CNN model for image classification on The Simpsons Characters Data dataset, which will later be deployed using TensorFlow Serving.

To get a better understanding of the whole process presented in this post, I recommend reading it alongside the resources available in the repository and trying to reproduce it with the same or a different TensorFlow model, as “practice makes perfect”.


How to Serve Different Model Versions using TensorFlow Serving

This article explains how to manage multiple models and multiple versions of the same model in TensorFlow Serving using configuration files, along with a brief look at batching.

You have TensorFlow deep learning models with different architectures, or you have trained your models with different hyperparameters, and you would like to test them locally or in production. The easiest way is to serve the models using a Model Server Config file.

A Model Server Configuration file is a protocol buffer (protobuf) file. Protocol buffers are a language-neutral, platform-neutral, extensible, yet simple and fast way to serialize structured data.
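
As a rough illustration, the sketch below renders such a file in protobuf text format from a plain Python dictionary. The `model_config_list`, `config`, `name`, `base_path`, `model_platform`, and `model_version_policy` fields come from TensorFlow Serving's model server configuration; the helper `render_model_config`, the model names, and the paths are hypothetical and only serve the example.

```python
def render_model_config(models):
    """Render a model_config_list in protobuf text format.

    `models` maps a model name to (base_path, list_of_versions).
    An empty version list omits the policy, so the server default
    (serve the latest version) applies.
    """
    lines = ["model_config_list {"]
    for name, (base_path, versions) in models.items():
        lines += [
            "  config {",
            f'    name: "{name}"',
            f'    base_path: "{base_path}"',
            '    model_platform: "tensorflow"',
        ]
        if versions:
            # Pin the exact versions that should be served side by side.
            lines.append("    model_version_policy {")
            lines.append("      specific {")
            lines += [f"        versions: {v}" for v in versions]
            lines.append("      }")
            lines.append("    }")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical models and paths, for illustration only.
config_text = render_model_config({
    "model_a": ("/models/model_a", [1, 2]),
    "model_b": ("/models/model_b", []),
})
print(config_text)
```

The resulting text can be saved to a file and passed to the model server with the `--model_config_file` flag.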

Deployment of a TensorFlow model to Production using TensorFlow Serving

Learn, step by step, how to deploy a TensorFlow model to production using TensorFlow Serving.

You created a deep learning model using TensorFlow, fine-tuned it for better accuracy and precision, and now want to deploy it to production so that users can make predictions with it.

TensorFlow Serving allows you to

  • Easily manage multiple versions of your model, like an experimental or stable version
  • Keep your server architecture and APIs the same
  • Dynamically discover a new version of the TensorFlow model and serve it over gRPC (a remote procedure call protocol) using a consistent API structure
  • Give all clients making inferences a consistent experience by centralizing the location of the model
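
Version discovery relies on a simple directory convention: each version of a model lives in a numbered subdirectory under the model's base path, and by default the highest number is the one served. The sketch below imitates that lookup in plain Python to show the idea. It is an illustration, not TensorFlow Serving's own implementation, and the function names and directory names are made up for the example.

```python
import tempfile
from pathlib import Path


def available_versions(base_path):
    """Return the numeric version subdirectories under base_path, ascending."""
    return sorted(
        int(child.name)
        for child in Path(base_path).iterdir()
        if child.is_dir() and child.name.isdecimal()
    )


def latest_version(base_path):
    """Return the highest version number, or None if no version exists."""
    versions = available_versions(base_path)
    return versions[-1] if versions else None


# Build a throwaway directory tree that mimics a model's base path.
with tempfile.TemporaryDirectory() as base:
    for name in ("1", "2", "10", "notes"):
        (Path(base) / name).mkdir()
    # Versions are compared as numbers, so 10 sorts after 2;
    # the non-numeric "notes" directory is ignored.
    print(available_versions(base))
    print(latest_version(base))
```

Dropping a new numbered directory next to the existing ones is therefore enough to publish a new version; clients keep calling the same endpoint.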

The key components of TF Serving are

  • Servables: A Servable is the underlying object used by clients to perform computation or inference. TensorFlow Serving represents deep learning models as one or more Servables.
  • Loaders: Manage the lifecycle of the Servables, as Servables cannot manage their own lifecycle. Loaders standardize the APIs for loading and unloading Servables, independent of the specific learning algorithm.
  • Sources: Find and provide Servables, supplying one Loader instance for each version of a Servable.
  • Managers: Manage the full lifecycle of a Servable: loading it, serving it, and unloading it.
  • TensorFlow Serving Core: Manages the lifecycle and metrics of Servables, treating Loaders and Servables as opaque objects.
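
To see how these pieces fit together, here is a toy Python model of the lifecycle. It is only an analogy: TensorFlow Serving implements these components in C++, and every class, method, and model name below is hypothetical. A Loader wraps one version of a servable, and the Manager loads the versions a source asks for, serves the newest one, and unloads versions that are no longer wanted.

```python
class Loader:
    """Owns one servable version and knows how to load and unload it."""

    def __init__(self, name, version, build_fn):
        self.name = name
        self.version = version
        self._build_fn = build_fn  # callable that creates the servable
        self._servable = None

    def load(self):
        self._servable = self._build_fn()

    def unload(self):
        self._servable = None

    def servable(self):
        return self._servable


class Manager:
    """Loads the requested versions, serves them, and unloads the rest."""

    def __init__(self):
        self._loaders = {}  # (name, version) -> Loader

    def set_aspired_versions(self, name, loaders):
        """Called by a source with the full set of versions to keep loaded."""
        wanted = {(name, loader.version): loader for loader in loaders}
        # Unload versions of this servable that are no longer wanted.
        stale = [k for k in self._loaders if k[0] == name and k not in wanted]
        for key in stale:
            self._loaders.pop(key).unload()
        # Load versions that are wanted but not yet loaded.
        for key, loader in wanted.items():
            if key not in self._loaders:
                loader.load()
                self._loaders[key] = loader

    def get_servable(self, name, version=None):
        """Return the requested version, or the newest loaded one."""
        versions = [v for (n, v) in self._loaders if n == name]
        if not versions:
            raise KeyError(name)
        chosen = max(versions) if version is None else version
        return self._loaders[(name, chosen)].servable()


# Two fake "models": plain functions stand in for real servables.
manager = Manager()
v1 = Loader("emotion", 1, lambda: (lambda x: x * 1))
v2 = Loader("emotion", 2, lambda: (lambda x: x * 2))

manager.set_aspired_versions("emotion", [v1])      # only version 1 exists
manager.set_aspired_versions("emotion", [v1, v2])  # version 2 appears
model = manager.get_servable("emotion")            # newest loaded version
print(model(21))

manager.set_aspired_versions("emotion", [v2])      # version 1 is retired
```

The Manager never looks inside a servable; it only calls `load` and `unload` on the Loader, which is what treating them as opaque objects means in practice.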

