Serverless BigData Pipeline implementation

Recently, I came across a summary of the AWS India Summit 2016, where Purplle showcased their implementation of a serverless architecture. Surprisingly, it was handled by a one-man team, and done so efficiently that I decided to explore the architecture and how they implemented it in their organization.

Image source: www.iamwire.com

In implementing the pipeline, the purplle.com team faced the challenges that Big Data is known for:

Variety
Velocity
Veracity

The components of a general data pipeline architecture are defined as follows:

Collectors/Routers: They handle the massive influx of data arriving through streams such as click-streams and ad impressions.
Data lake: A redundant, durable store that can handle I/O at high volumes and is available at all times.
Data warehouse: A flexible warehouse that allows experimentation with data modelling and continuous ingestion of raw data from the data lake.
Hot data tier (NoSQL/Cache): It offers fast reads and writes for both unit and batch operations, and keeps performing under uneven traffic.
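
To make the roles of these four components concrete, here is a minimal in-memory sketch of how data could flow between them. It is only an illustration: the `Pipeline` class, its method names, and the event fields (`user_id`, `type`, `page`) are hypothetical and are not taken from Purplle's implementation. Plain Python containers stand in for the real storage services.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """In-memory stand-ins for the tiers above (all names are hypothetical)."""
    data_lake: list = field(default_factory=list)   # raw, append-only events
    warehouse: dict = field(default_factory=lambda: defaultdict(int))  # aggregates
    hot_tier: dict = field(default_factory=dict)    # latest event per user

    def collect(self, event: dict) -> None:
        """Collector/router: accept an event and land it raw in the data lake."""
        self.data_lake.append(event)
        # The hot tier keeps only the most recent event per user for fast lookups.
        self.hot_tier[event["user_id"]] = event

    def load_warehouse(self) -> None:
        """Batch step: rebuild per-type counts from the raw events in the lake."""
        self.warehouse.clear()
        for event in self.data_lake:
            self.warehouse[event["type"]] += 1


pipeline = Pipeline()
pipeline.collect({"user_id": "u1", "type": "click", "page": "/home"})
pipeline.collect({"user_id": "u1", "type": "ad_impression", "page": "/cart"})
pipeline.collect({"user_id": "u2", "type": "click", "page": "/home"})
pipeline.load_warehouse()
print(dict(pipeline.warehouse))
print(pipeline.hot_tier["u1"]["page"])
```

The point of the sketch is the separation of concerns: raw events are never modified once they land in the lake, the warehouse can be rebuilt from the lake at any time, and the hot tier answers lookups without touching either.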

The same architecture was implemented using AWS Lambda.
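
As a rough idea of what a collector/router might look like as a Lambda function, here is a minimal sketch. It assumes the function is triggered by a Kinesis stream, which the summit summary does not confirm. The `route` helper, the `raw/<type>/` prefix scheme, and the `write` parameter are hypothetical; in a real deployment `write` could wrap an upload to the data lake.

```python
import base64
import json


def route(record: dict) -> str:
    """Pick a destination prefix from the event type (hypothetical scheme)."""
    return f"raw/{record.get('type', 'unknown')}/"


def handler(event, context, write=print):
    """Entry point for a Kinesis-triggered Lambda function (assumed trigger).

    `write` is a hypothetical sink taking (prefix, record). It defaults to
    print so that the sketch runs locally without any AWS dependencies.
    """
    count = 0
    for item in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded under item["kinesis"]["data"].
        payload = base64.b64decode(item["kinesis"]["data"])
        record = json.loads(payload)
        write(route(record), record)
        count += 1
    return {"processed": count}


# Local usage with a hand-built event shaped like a Kinesis batch.
fake_event = {
    "Records": [
        {
            "kinesis": {
                "data": base64.b64encode(
                    json.dumps({"type": "click", "page": "/home"}).encode()
                ).decode()
            }
        }
    ]
}
result = handler(fake_event, None)
```

Keeping the sink as a parameter makes the routing logic testable on a laptop, while Lambda itself only ever calls `handler(event, context)`.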

#serverless #bigdata

dashbird.io
