Introduction

“Serverless” and “BERT” are two topics that have strongly influenced the world of computing. Serverless architecture lets us scale software in and out dynamically without managing or provisioning computing power, which allows us, as developers, to focus on our applications.

BERT is probably the best-known NLP model out there. Arguably, it changed the way we work with textual data and what we can learn from it. According to Google, “BERT will help Search better understand one in 10 searches.” BERT and related models such as RoBERTa, GPT-2, ALBERT, and T5 will drive business and new business ideas over the next few years, and they will change or disrupt business areas as the internet once did.

Imagine the business value you can achieve by combining the two. But BERT is not the easiest machine learning model to deploy in a serverless architecture: it is large and needs considerable computing power. Most tutorials you find online demonstrate how to deploy BERT in “easy” environments, such as a VM with 16 GB of memory and 4 CPUs.

I will show you how to leverage the benefits of serverless architectures and deploy a BERT Question-Answering API in a serverless environment. We are going to use the Transformers library by HuggingFace, the Serverless Framework, and AWS Lambda.

Transformers Library by HuggingFace

The Transformers library provides state-of-the-art machine learning architectures such as BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, and T5 for Natural Language Understanding (NLU) and Natural Language Generation (NLG). It also provides thousands of pre-trained models in 100+ languages and offers deep interoperability between PyTorch and TensorFlow 2.0. It enables developers to fine-tune machine learning models for NLP tasks such as text classification, sentiment analysis, question answering, and text generation.
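To make the question-answering use case concrete, here is a minimal sketch of the library's `pipeline` API, assuming `transformers` and a backend such as PyTorch are installed. The model id `distilbert-base-cased-distilled-squad` (a DistilBERT checkpoint fine-tuned on SQuAD) is an assumption chosen for illustration, not necessarily the model used later in this article, and the helper names `load_qa_pipeline` and `answer` are hypothetical.

```python
from functools import lru_cache
from typing import Any, Dict


@lru_cache(maxsize=1)
def load_qa_pipeline(model_name: str = "distilbert-base-cased-distilled-squad"):
    """Build the question-answering pipeline once and reuse it on later calls.

    The default model id is an illustrative assumption; swap in your own.
    """
    # Imported lazily so that merely importing this module stays cheap.
    from transformers import pipeline

    return pipeline("question-answering", model=model_name)


def answer(question: str, context: str) -> Dict[str, Any]:
    """Return the answer span the model extracts from `context`, plus its score."""
    qa = load_qa_pipeline()
    result = qa(question=question, context=context)
    # The pipeline returns a dict with "score", "start", "end" and "answer".
    return {"answer": result["answer"], "score": float(result["score"])}
```

The first call to `answer(question, context)` downloads and loads the model; later calls in the same process reuse the cached pipeline. That caching matters in a serverless setting, where warm invocations can reuse objects created by earlier ones instead of reloading a large model each time.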

AWS Lambda

AWS Lambda is a serverless computing service that lets you run code without managing servers. It executes your code only when required and scales automatically, from a few requests per day to thousands per second. You only pay for the compute time you consume — there is no charge when your code is not running.

Serverless BERT with HuggingFace and AWS Lambda