Serverless GPU-Powered Hosting of Machine Learning Models

Algorithmia is the only truly serverless platform for serving models with GPU support, where you pay only for the actual compute time. You get real GPUs with real compute power! And Algorithmia is remarkably easy to use for serving machine learning models.

A working example with Algorithmia

Motivation

With the rise of MLOps in recent years, running machine learning models for inference tasks has become much easier. Depending on the use case, appropriately optimized deep learning models can even run directly on a mobile device. In client-server/microservice architectures, larger models with high accuracy requirements are usually hosted centrally and queried by downstream services via well-defined interfaces. Tools such as TensorFlow Serving now also make these use cases a manageable problem on an appropriately configured server infrastructure.

However, from a software engineering perspective, we know how complex a self-managed infrastructure can become. Not surprisingly, serverless solutions from cloud providers are gaining in popularity for application development these days. No infrastructure management and pay as you go are the main advantages, which is why I now work almost exclusively with such solutions.

Serverless GPU in 2021

However, when I found myself having to integrate a rather complex deep learning model for online prediction into such a serverless microservice architecture, I was somewhat surprised. In my use case, the requirement was to process individual requests with base64-encoded images at irregular intervals (a few seconds to several hours apart) and return the correct class using a self-trained deep learning model. In my opinion, a standard task without any great complexity. Spoiled by Cloud Run, Cloud Functions, AWS Lambda, etc., I naively thought there should be a “GPU enable” checkbox and off we go…
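
To make the task more concrete, here is a minimal sketch of what such a request could look like against a model hosted on Algorithmia, using the official Algorithmia Python client. The algorithm path, payload schema, and API key are placeholders of my own, not a reference implementation.

```python
# Minimal sketch -- algorithm path, payload schema, and API key are placeholders.
import base64
import Algorithmia

# Authenticate with your Algorithmia API key.
client = Algorithmia.client("YOUR_API_KEY")

# Reference a (hypothetical) published image-classification algorithm.
algo = client.algo("your_user/image_classifier/1.0")

# Encode the image the same way the downstream services do.
with open("example.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

# The call blocks until the GPU-backed container returns a prediction;
# compute time is billed only for this call.
result = algo.pipe(payload).result
print(result)  # e.g. {"class": "cat", "confidence": 0.97}
```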

Not quite. In fact, finding a truly serverless solution turned out to be non-trivial. As already described here, the classic serverless offerings are designed primarily for CPU workloads. Inference on CPUs alone was out of the question in my case, since the latency requirement of the service could not have been met that way.

Google AI Platform Prediction and AWS SageMaker?

Meanwhile, Google with AI Platform Prediction and AWS with SageMaker offer solutions that include inference accelerators for deep learning models. Here is a brief summary of why these services did not meet my requirements (for now).

Starting with AWS SageMaker: the minimum instance count of an endpoint is required to be 1 or higher (https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html). For many use cases with continuous load this should not be a problem. In my case, however, it would be a waste of resources and not fully in line with the pay-as-you-go principle I was looking for.
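
To illustrate the limitation, here is a minimal sketch of how autoscaling is typically registered for a SageMaker real-time endpoint via Application Auto Scaling with boto3; the endpoint and variant names are placeholders. The relevant detail is MinCapacity, which (at least at the time of writing) cannot be set to 0 for real-time endpoints.

```python
# Sketch only -- endpoint and variant names are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

# A real-time SageMaker endpoint variant is registered as a scalable target.
# MinCapacity must be at least 1; scaling to zero is not available, so one
# instance keeps running (and billing) even when there is no traffic at all.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,  # 0 is rejected for real-time endpoints
    MaxCapacity=2,
)
```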

Google AI Platform Prediction currently only allows the use of GPUs with the TensorFlow SavedModel format. PyTorch models, for example, can only be used within custom containers (currently pre-GA), which do not offer GPU support. In addition, Google does allow autoscaling to 0, but if a request triggers your service, you are charged for a minimum of 10 minutes of computing time, even if the request took only a fraction of a second (https://cloud.google.com/ai-platform/prediction/pricing).
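
For reference, a rough sketch of what creating a GPU-backed model version could look like via the AI Platform Training & Prediction API (Python client library); project, model, bucket, machine type, and runtime version are placeholders, and the exact request body may differ. The point is that the framework has to be TENSORFLOW and deploymentUri has to point to a SavedModel directory for the accelerator to be usable.

```python
# Rough sketch -- project, model, bucket, and version names are placeholders.
from googleapiclient import discovery

ml = discovery.build("ml", "v1")

# A version with a GPU accelerator: only the TENSORFLOW framework with a
# SavedModel behind deploymentUri is accepted here; PyTorch models would
# need a custom container, which currently cannot use GPUs.
body = {
    "name": "v1_gpu",
    "deploymentUri": "gs://my-bucket/saved_model/",
    "runtimeVersion": "2.4",
    "framework": "TENSORFLOW",
    "pythonVersion": "3.7",
    "machineType": "n1-standard-4",
    "acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_T4"},
}

request = ml.projects().models().versions().create(
    parent="projects/my-project/models/my_model", body=body
)
response = request.execute()
print(response)
```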
