Introduction

In this article, we show how to build a serverless, scalable Optical Character Recognition (OCR) system. Scalability is a hard requirement for a system like this: traffic can arrive in bursts, and every request must still be processed and its result returned to the user in a timely manner. To cater to this, we need a system that scales dynamically. One possible solution is to model the required workers and deploy them in a Kubernetes environment to meet our scaling requirements. This approach has been implemented and discussed in a separate article.

Here, we will implement the same solution using Azure Functions in Ballerina (referred to as Ballerinalang in the rest of the article) and show that it can be done with considerably fewer lines of code, which results in lower complexity and better maintainability.

Architecture

Figure 1: Deployment Diagram

Figure 1 shows the deployment diagram of the solution that we will be implementing. User input arrives through an HTTP endpoint, where the user provides the binary data of the image in the request and an email address as a query parameter. This endpoint is implemented using an HTTP trigger in Azure Functions. From there, using the output binding mechanism of Azure Functions, we store the image data in blob storage and the job request information in queue storage.

We use an asynchronous processing approach because it makes it easier to scale each processing unit as needed. For example, the job submission function is not a CPU-bound task; it performs a simple data storage operation. The image processing function, which reads from the blob and queue storage, has the more expensive and time-consuming task of doing the actual OCR operations. Because the two are decoupled, the serverless environment can scale each function according to its own requirements.

In the same manner, the result publishing function is separated from the other tasks. Sending email can be a high-latency operation, and it should not block the rest of the pipeline. The function therefore has its own result queue, from which it retrieves result entries and sends them out at its own capacity.
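To make the data flow between the three stages concrete, here is a minimal sketch in Python rather than Ballerinalang. It is only a model of the architecture: the dictionary and the two deques are in-memory stand-ins for blob storage and the request and result queues, the function names are hypothetical, and the OCR step is a placeholder that simply decodes the bytes. In the real deployment, each function would be triggered and scaled independently by Azure Functions.

```python
from collections import deque

# In-memory stand-ins for the Azure resources in Figure 1 (illustrative only).
blob_store = {}            # blob storage: job id -> image bytes
request_queue = deque()    # queue storage: pending OCR jobs
result_queue = deque()     # queue storage: finished results awaiting email
outbox = []                # records the emails that would be sent

def submit_job(job_id, email, image):
    """Stage 1 (cheap, I/O only): store the image and enqueue the job."""
    blob_store[job_id] = image
    request_queue.append({"job_id": job_id, "email": email})

def fake_ocr(image):
    """Placeholder for a real OCR call; it just decodes the bytes as text."""
    return image.decode("utf-8")

def process_image():
    """Stage 2 (CPU heavy): take one job, run OCR, enqueue the result."""
    job = request_queue.popleft()
    text = fake_ocr(blob_store[job["job_id"]])
    result_queue.append({"email": job["email"], "text": text})

def publish_result():
    """Stage 3 (high latency): take one result and 'send' it by email."""
    result = result_queue.popleft()
    outbox.append((result["email"], result["text"]))

# Each stage only talks to storage, never directly to another stage.
submit_job("job-1", "user@example.com", b"hello")
process_image()
publish_result()
```

Because the stages communicate only through the queues and the blob store, a slow email send in the last stage never delays job submission, and the platform can run many copies of the OCR stage during a burst.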

Implementation

Here, we will take a look at the Ballerinalang code that was used when implementing the Azure Functions solution.

Job Submission

Listing 1: submitJob Function Implementation

The submitJob function is the entry point to the system. It defines an HTTP trigger that collects the user’s email address and takes in the image data, along with blob and queue output bindings that save the collected data. The next function then only needs to connect its input bindings to the output bindings defined here.
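The shape of such an entry-point function can be sketched as follows. This is an illustrative Python model, not the Ballerinalang listing: the output bindings are represented as plain return values, and the function name, the job ID scheme, the blob path layout, and the message fields are all assumptions made for the example.

```python
import json
import uuid

def submit_job_handler(query, body):
    """Hypothetical handler: validate the input, then return the HTTP
    response plus the values the blob and queue output bindings would save."""
    email = query.get("email")
    if not email:
        return {"status": 400, "body": "missing 'email' query parameter"}, None, None
    if not body:
        return {"status": 400, "body": "empty image payload"}, None, None

    job_id = str(uuid.uuid4())                 # assumed job identifier scheme
    blob_output = (f"images/{job_id}", body)   # assumed blob path, image bytes
    queue_output = json.dumps({"jobId": job_id, "email": email})
    response = {"status": 202, "body": f"job {job_id} accepted"}
    return response, blob_output, queue_output

# Example call with a query parameter and a small binary payload.
response, blob_output, queue_output = submit_job_handler(
    {"email": "user@example.com"}, b"\x89PNG"
)
```

The handler does no processing of its own. It only records what needs to be done, which is why it stays cheap and responsive even under a burst of requests.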

