Recently, I came across a summary of the AWS India Summit 2016, where Purplle showcased their implementation of a Serverless architecture. Surprisingly, it was handled by a one-man team, and done so efficiently that I decided to explore the architecture and how they implemented it in their organization.

 Image source: www.iamwire.com


In implementing the pipeline, the purplle.com team faced the same challenges that Big Data is known for:

  1. Variety
  2. Velocity
  3. Veracity

The components of a general data pipeline architecture are defined as follows:

  • Collectors/Routers: They handle the massive influx of data arriving through streams such as click-streams and ad impressions.
  • Data lake: A redundant, durable store that can handle high-volume I/O and is available all the time.
  • Data warehouse: A flexible warehouse that allows experimentation with data modelling and continuous ingestion of raw data from the data lake.
  • Hot data tier (NoSQL/Cache): It offers fast reads and writes for both unit and batch operations and keeps performing under uneven traffic.

The same architecture was implemented using AWS Lambda.
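To make the collector stage more concrete, here is a minimal Python sketch of what a Lambda function in that role could look like. It is not Purplle's actual code: it assumes the function is triggered by a Kinesis stream (whose records arrive base64-encoded under `Records[].kinesis.data`), that each payload is JSON, and that events carry a hypothetical `event_type` field. The `sink` argument and the `raw/<type>.jsonl` key layout are also illustrative assumptions standing in for a write to the data lake.

```python
import base64
import json
from collections import defaultdict


def decode_records(event):
    """Decode the base64-encoded JSON payloads of a Kinesis-triggered event."""
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    return decoded


def group_by_type(items, key="event_type"):
    """Bucket events by a type field (the field name is an assumption)."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item.get(key, "unknown")].append(item)
    return dict(buckets)


def handler(event, context, sink=None):
    """Lambda entry point: decode, route by type, hand off to the data lake.

    `sink` is a hypothetical injectable writer taking (key, body); in a real
    deployment it might wrap an object-store upload.
    """
    buckets = group_by_type(decode_records(event))
    if sink is not None:
        for event_type, items in buckets.items():
            # One newline-delimited JSON object per event.
            body = "\n".join(json.dumps(item) for item in items)
            sink(f"raw/{event_type}.jsonl", body)
    return {
        "received": sum(len(items) for items in buckets.values()),
        "types": sorted(buckets),
    }
```

Keeping the writer injectable means the routing logic can be exercised locally with a plain dictionary or a lambda as the sink, without any cloud credentials.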

#serverless #bigdata
