Big Data analytics is becoming increasingly important for informing major business decisions in corporations of all sizes. However, collecting, aggregating, joining, and analyzing (wrangling) huge amounts of data that is stored in different locations and has heterogeneous structure (e.g. databases, CRMs, unstructured text) is often a daunting and very time-consuming task.

Cloud computing often comes to the rescue by providing cheap and scalable storage, computing, and data lake solutions. In particular, AWS leads the pack with its very versatile Glue and S3 services, which allow users to ingest, transform, normalize, and store datasets of all sizes. Furthermore, Glue Catalog and Athena allow users to easily run Presto-based SQL queries on the normalized data in S3 data lakes, and the results can easily be stored and analyzed in business intelligence tools such as QuickSight.
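To make the flow above a little more concrete, here is a minimal sketch of how a Glue job and an Athena query could be chained in a Step Functions state machine, the orchestrator named in the article title. It only builds the Amazon States Language definition as a Python dictionary and prints it; nothing is deployed. The job name, query, table name, and S3 output location are hypothetical placeholders, not values from the article.

```python
import json


def build_etl_state_machine(glue_job_name, athena_query, athena_output_s3):
    """Return an Amazon States Language definition as a plain dict.

    All three arguments are illustrative placeholders (assumptions),
    not names taken from the article.
    """
    return {
        "Comment": "Sketch: run a Glue job, then query the result with Athena",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the job finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "RunAthenaQuery",
            },
            "RunAthenaQuery": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": athena_query,
                    "ResultConfiguration": {"OutputLocation": athena_output_s3},
                },
                "End": True,
            },
        },
    }


# Hypothetical names, used only for illustration.
definition = build_etl_state_machine(
    glue_job_name="normalize-raw-data",
    athena_query="SELECT COUNT(*) FROM normalized_table",
    athena_output_s3="s3://example-bucket/athena-results/",
)
print(json.dumps(definition, indent=2))
```

The printed JSON could then be supplied as the state machine definition, for example in a CloudFormation template, though the exact wiring depends on the rest of the stack.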

#aws-step-functions #aws-cloudformation #etl #aws-glue #aws

Orchestrating ETL pipelines on AWS with Glue, Step Functions, and CloudFormation