Analyzing Data Using Serverless

A case study on analyzing data with serverless technology

Serverless has become a hot topic in software development. You may have heard the term a few times, at conferences or in conversations with other developers.

What will we learn today?

When to use serverless functions
How to create a data processing pipeline
How to use Google Cloud technologies to process data

We chose Google Cloud as our cloud provider, but everything presented in this article can also be achieved with other providers, such as Amazon Web Services or Microsoft Azure.

What we are going to build

In this article, we will use serverless functions to build a data processing pipeline for analyzing data.

Imagine that we work at an IT company and, every couple of weeks, we receive files containing information about the issues (tasks) in our projects. From time to time, our managers open our application to see statistics across all projects.

Every month, the project managers check the status of the company's projects: for example, the total number of issues completed since each project started and the number of story points completed on that project. Sometimes they also want to see all issues that are not bugs and were already finished when the file was received.

To meet these needs, we are going to build a pipeline that filters and aggregates the data they are interested in.
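The two kinds of questions above map onto two operations: a filter (non-bug issues that are done) and an aggregation (done issues and story points per project). The following is a minimal, in-memory sketch of that logic in Node.js. The record fields (`project`, `type`, `status`, `storyPoints`) and the values `"Done"` and `"Bug"` are assumptions for illustration; the article does not describe the file format.

```javascript
"use strict";

// Hypothetical shape of one row from a received file. Field names and the
// "Done" / "Bug" values are assumptions; adapt them to the real export format.
// { project: "Alpha", type: "Story", status: "Done", storyPoints: 3 }
const DONE = "Done";
const BUG = "Bug";

// Filter: issues that are not bugs and were already done when the file arrived.
function finishedNonBugIssues(issues) {
  return issues.filter((issue) => issue.type !== BUG && issue.status === DONE);
}

// Aggregate: per project, count the done issues and sum their story points.
function projectStatistics(issues) {
  const stats = {};
  for (const issue of issues) {
    if (issue.status !== DONE) continue;
    if (!stats[issue.project]) {
      stats[issue.project] = { issuesDone: 0, storyPointsDone: 0 };
    }
    stats[issue.project].issuesDone += 1;
    stats[issue.project].storyPointsDone += issue.storyPoints || 0;
  }
  return stats;
}

// Usage with a small, made-up batch of issues.
const sampleIssues = [
  { project: "Alpha", type: "Story", status: "Done", storyPoints: 5 },
  { project: "Alpha", type: "Bug", status: "Done", storyPoints: 2 },
  { project: "Alpha", type: "Task", status: "In Progress", storyPoints: 3 },
  { project: "Beta", type: "Story", status: "Done", storyPoints: 8 },
];
console.log(finishedNonBugIssues(sampleIssues).length); // 2
// Alpha: 2 done issues, 7 story points; Beta: 1 done issue, 8 story points
console.log(projectStatistics(sampleIssues));
```

Whether a new file's statistics should be added to the running totals or replace them depends on whether each file is a full snapshot of all issues or only the new ones; the article does not say, so this sketch only computes statistics for a single batch.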

Why are serverless technologies a good fit here?

A single event starts our processing pipeline
No server running 24/7
Small functions, each with a single purpose
We pay only while the functions are running

The pipeline:

Upload the file to the application
Load the data into a data warehouse
Filter the uploaded data and write the result to another table
Aggregate the data and update the statistics
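Because one upload event starts the whole chain and each step is a small function with a single purpose, the pipeline can be modeled as handlers that each react to one event and emit the next. The sketch below simulates this locally with Node's built-in `EventEmitter`; the event names are hypothetical, and the step bodies only record what they would do. In a real deployment each step would be a separately deployed Cloud Function, triggered for example by a Cloud Storage upload or a Pub/Sub message; the article does not specify which triggers it uses.

```javascript
"use strict";
const EventEmitter = require("events");

// Local stand-in for cloud event triggers: each listener plays the role of
// one small, single-purpose function. Event names are hypothetical.
function runPipeline(fileName) {
  const bus = new EventEmitter();
  const steps = [];

  // Upload event -> load the file into the data warehouse.
  bus.on("file.uploaded", (file) => {
    steps.push(`load ${file.name}`);
    bus.emit("data.loaded", file);
  });

  // Loaded data -> filter it and write the result to another table.
  bus.on("data.loaded", (file) => {
    steps.push(`filter ${file.name}`);
    bus.emit("data.filtered", file);
  });

  // Filtered data -> aggregate and update the statistics.
  bus.on("data.filtered", (file) => {
    steps.push(`aggregate ${file.name}`);
  });

  // A single upload event starts the chain. EventEmitter calls listeners
  // synchronously, so every step has run by the time emit() returns.
  bus.emit("file.uploaded", { name: fileName });
  return steps;
}

console.log(runPipeline("issues.csv"));
```

Keeping each step behind its own event means a step can be changed or redeployed without touching the others, which is the property the single-purpose functions above are meant to give.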

We will implement this pipeline with Google Cloud technologies, but the same concepts apply to any cloud provider that offers serverless services.

Technology stack:

Google Cloud Functions - serverless functions used to process the data
BigQuery - the data warehouse
Node.js 8 - our programming language

We will first introduce these technologies and then see how to build the pipeline with them.

dashbird.io