Orchestrating ETL pipelines on AWS with Glue, Step Functions and CloudFormation

Big Data analytics is becoming increasingly important for informing major business decisions in corporations of all sizes. However, collecting, aggregating, joining, and analyzing (wrangling) huge amounts of data stored in different locations with heterogeneous structures (e.g. databases, CRMs, unstructured text) is often a daunting and very time-consuming task.

Cloud computing often comes to the rescue by providing cheap and scalable storage, computing, and data lake solutions. AWS in particular is the pack leader, with the very versatile Glue and S3 services, which allow users to ingest, transform, normalize, and store datasets of all sizes. Furthermore, Glue Catalog and Athena allow users to easily run Presto-based SQL queries on the normalized data in S3 data lakes, and the results can easily be stored and analyzed in business intelligence tools such as QuickSight.

#aws-step-functions #aws-cloudformation #etl #aws-glue #aws

ETL Data Pipeline In AWS

ETL (Extract, Transform, and Load) is an emerging topic across the IT industry. Companies often look for easy solutions and open source tools and technologies to run ETL on their valuable data without spending much effort on other things.

AWS Glue fits this need: it is an Amazon Web Services offering for creating a simple ETL pipeline.

AWS Glue Introduction

AWS Glue is a serverless ETL (Extract, Transform, and Load) service in the AWS cloud. It is a fully managed, cost-effective service for categorizing your data, cleaning and enriching it, and finally moving it from source systems to target systems.

#etl #aws #aws-glue

Rory West

1619214480

ETL Orchestration on AWS with AWS Step Functions

In recent years, the engineering, governance, and analysis of data have become very common talking points.

The need for data-driven decision-making has increased the need to collect and analyze data in many ways, and AWS has shown particular interest in this field, developing multiple tools to achieve these business goals.

Before a data analyst can explore and visualize the data, a crucial step is needed. This procedure is commonly known as ETL (extract, transform, and load), and it is usually far from simple.
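To make the orchestration idea concrete, here is a minimal sketch of a Step Functions state machine definition (Amazon States Language) built as a Python dictionary. It assumes a single Glue job run through the `glue:startJobRun.sync` service integration; the job name `my-etl-job`, the state name `RunGlueJob`, and the retry settings are hypothetical placeholders, not values from the article.

```python
import json


def build_etl_state_machine(glue_job_name: str) -> dict:
    """Return a one-step state machine definition that runs a Glue job."""
    return {
        "Comment": "Minimal ETL orchestration sketch (illustrative only)",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix asks Step Functions to wait until the
                # Glue job run finishes before leaving this state.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # Hypothetical retry policy: two more attempts, with backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            }
        },
    }


# "my-etl-job" is a placeholder name for a Glue job assumed to exist.
definition = build_etl_state_machine("my-etl-job")
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document that could be supplied as a state machine definition; a real pipeline would likely chain several such states (for example a crawler step, a job step, and a notification step).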

#aws-step-functions #aws #etl #aws-lambda

Rory West

1623243120

AWS CloudFormation Template Basics

Have you ever tried to move resources from one AWS region to another? It can be quite painful. You have to figure out how all of the resources connect together, then plan out what order you need to recreate them. Fortunately, AWS has a simpler way of doing that. It’s called CloudFormation.

CloudFormation allows you to define all of those resources (and their relationships) in a JSON or YAML file called a template. The template can take in some parameters too, which means you can define multiple environments with a single template.

In this article, I’ll explain the fundamental sections of a CloudFormation template and how to use it to deploy a stack.

CloudFormation Template Structure

CloudFormation templates are JSON or YAML files with a few specific root properties that are referred to as sections. If you want to see the sections not covered in this article, check out the CloudFormation User Guide.

Parameters

The parameters section allows you to create parameters (duh). Using parameters allows you to create a single template that can be reused across multiple environments. Just change the parameter values and you have a new environment–or at least an updated one.
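As a rough illustration of how a parameter lets one template serve several environments, the sketch below builds a small template as a Python dictionary and prints it as JSON. The parameter name `Environment`, the logical ID `DataBucket`, and the bucket name prefix `example-data-` are assumptions made up for this example.

```python
import json

# A minimal template: one parameter and one S3 bucket whose name depends on it.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Hypothetical parameter that selects the target environment.
        "Environment": {
            "Type": "String",
            "AllowedValues": ["dev", "prod"],
            "Default": "dev",
            "Description": "Environment this stack is deployed to",
        }
    },
    "Resources": {
        # Hypothetical bucket; Fn::Sub substitutes the parameter value
        # into the name when the stack is created.
        "DataBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "BucketName": {"Fn::Sub": "example-data-${Environment}"}
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

Deploying the same template once with `Environment=dev` and once with `Environment=prod` would give two stacks that differ only in the parameter value, which is the reuse the section describes.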

#cloudformation #aws #aws-s3 #aws cloudformation

Evil David

1600579483

How To Use AWS CloudFormation To Reduce Redundancy

Mr. X has been working on a very interesting and meticulously written web application involving some heavy computation and a complete test suite. He architected the system well and chose to host his application on AWS.

He used AWS RDS for his database, AWS EC2 to serve his application, and AWS Lambda to do the heavy calculations. He also Dockerized his entire application in order to build and deploy his work seamlessly at will.

Everything was set, the app was working amazingly, so he scheduled a demo with a potential client. He was making some last-minute changes and everything was still working well. So, right before the demo, it was time for deployment.

Mr. X deployed the Lambda application quickly and moved on to his EC2 machine. He pulled the code there but forgot to rebuild his Docker images. Instead, on the spur of the moment, he simply restarted his containers and went ahead with the demo. It takes no guesswork to learn that the demo didn’t go well.

Every other developer has a similar story to share at some point in time. But could this be avoided? The answer is yes! Could deployment be automated across all your AWS services at once so that your whole application gets deployed in one go? Again, yes!

AWS CloudFormation makes it possible.

#aws #cloudformation #aws-cloudformation #docker #aws-ecs