Orchestrating Transient Data Analytics Workflows via AWS Step Functions

In this blog post, I will walk you through (1) how to orchestrate data processing jobs on Amazon EMR and (2) how to run a batch transform with a trained machine learning model on Amazon SageMaker to write predictions.
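As a rough illustration of the orchestration pattern summarized above, the Python sketch below builds an Amazon States Language definition that runs a Spark step on an existing EMR cluster and then starts a SageMaker batch transform job. It is not the post's own code. The script path, model name, S3 URIs, instance type, and the `ClusterId` and `TransformJobName` input fields are hypothetical placeholders.

```python
import json


def build_state_machine(
    script_s3_uri: str,
    model_name: str,
    input_s3_uri: str,
    output_s3_uri: str,
) -> str:
    """Return an Amazon States Language definition as a JSON string.

    The cluster ID and transform job name are read from the execution input
    ($.ClusterId and $.TransformJobName); both field names are hypothetical.
    """
    definition = {
        "Comment": "Run a Spark step on EMR, then a SageMaker batch transform",
        "StartAt": "ProcessData",
        "States": {
            "ProcessData": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait for the EMR step to finish.
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.ClusterId",
                    "Step": {
                        "Name": "spark-processing",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
                        },
                    },
                },
                # Store the step result under a new key so the original input
                # (including TransformJobName) is still available to the next state.
                "ResultPath": "$.EmrStepResult",
                "Next": "BatchTransform",
            },
            "BatchTransform": {
                "Type": "Task",
                # Waits until the SageMaker transform job completes.
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": {
                    "TransformJobName.$": "$.TransformJobName",
                    "ModelName": model_name,
                    "TransformInput": {
                        "DataSource": {
                            "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
                        },
                        "ContentType": "text/csv",  # assumption: CSV input
                    },
                    "TransformOutput": {"S3OutputPath": output_s3_uri},
                    "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
                },
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(build_state_machine(
        "s3://my-bucket/scripts/process.py",   # hypothetical
        "my-trained-model",                    # hypothetical
        "s3://my-bucket/processed/",           # hypothetical
        "s3://my-bucket/predictions/",         # hypothetical
    ))
```

The resulting JSON string can be passed as the `definition` when creating a state machine. Using the `.sync` integrations keeps the sequencing inside Step Functions rather than in custom polling code.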

AWS Glue vs. EMR: Differentiating Two of the Best ETL Platforms

Amazon Web Services provides two services capable of performing ETL: AWS Glue and Amazon Elastic MapReduce (EMR). If both do a similar job, why would you choose one over the other? This article details some fundamental differences between the two.

Running a Spark Application on an EMR Cluster Through an AWS Lambda Function

This post gives a quick walkthrough of AWS Lambda functions and of running Apache Spark on an EMR cluster from a Lambda function. It also explains how to trigger the function from other AWS services, such as Amazon S3.
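A minimal sketch of the pattern this post describes, assuming an already-running EMR cluster and a Lambda function subscribed to S3 object-created events: the handler reads each uploaded object's location from the event and submits a `spark-submit` step with boto3's `add_job_flow_steps`. `CLUSTER_ID`, `SCRIPT_URI`, and `build_spark_step` are hypothetical names, not taken from the post.

```python
import urllib.parse
from typing import Any

# Hypothetical placeholder: the ID of a running EMR cluster.
CLUSTER_ID = "j-XXXXXXXXXXXXX"
# Hypothetical location of the PySpark script to run.
SCRIPT_URI = "s3://my-bucket/scripts/process.py"


def build_spark_step(bucket: str, key: str) -> dict[str, Any]:
    """Build an EMR step that runs spark-submit on the newly uploaded object."""
    return {
        "Name": f"process-{key}",
        "ActionOnFailure": "CONTINUE",  # keep the shared cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                SCRIPT_URI,
                f"s3://{bucket}/{key}",  # passed to the script as its first argument
            ],
        },
    }


def lambda_handler(event, context):
    """Entry point for an S3-triggered Lambda function."""
    import boto3  # bundled with the AWS Lambda Python runtime

    emr = boto3.client("emr")
    step_ids = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = emr.add_job_flow_steps(
            JobFlowId=CLUSTER_ID, Steps=[build_spark_step(bucket, key)]
        )
        step_ids.extend(response["StepIds"])
    return {"StepIds": step_ids}
```

Keeping the step construction in a separate function makes it easy to inspect or unit-test without calling AWS.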

Processing Stack Overflow Data Using Apache Spark on AWS EMR

An overview of how to process data in Spark using Databricks, add the script as a step in AWS EMR, and output the data to Amazon Redshift. This article is part of a series and continues the previous post, which showed how to stream the data through Kinesis Firehose using either stackapi or the Kinesis Data Generator. This post covers how to decide on the key processing steps to perform before sending the data to any analytics tool.

Fully Automated, Low-Cost Data Pipelines

A guide to fully automating data processing pipelines using S3 Event Notifications, AWS Lambda, and Amazon EMR. Each stage triggers the next, so the pipeline progresses continuously without manual intervention.
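One way the low-cost part of such a pipeline can work, sketched under assumptions rather than taken from the guide: a Lambda function triggered by an S3 event launches a transient EMR cluster with boto3's `run_job_flow`, runs a single Spark step, and lets the cluster terminate itself by setting `KeepJobFlowAliveWhenNoSteps` to `False`. The release label, instance types, role names, bucket paths, and the helper `transient_cluster_request` are hypothetical placeholders.

```python
import urllib.parse
from typing import Any

# Hypothetical locations; replace with your own buckets.
SCRIPT_URI = "s3://my-bucket/scripts/process.py"
LOG_URI = "s3://my-bucket/emr-logs/"


def transient_cluster_request(script_uri: str, input_uri: str, log_uri: str) -> dict[str, Any]:
    """Keyword arguments for EMR run_job_flow: launch a cluster, run one
    Spark step, then shut the cluster down when the step finishes."""
    return {
        "Name": "transient-spark-pipeline",  # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",  # assumption: use a release available in your region
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False: the cluster terminates after its last step, so it only costs money while working.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "process-new-data",
            "ActionOnFailure": "TERMINATE_CLUSTER",  # do not leave a failed cluster running
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, input_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR roles; yours may differ
        "ServiceRole": "EMR_DefaultRole",
    }


def lambda_handler(event, context):
    """S3-triggered entry point: one transient cluster per uploaded object."""
    import boto3  # bundled with the AWS Lambda Python runtime

    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    request = transient_cluster_request(SCRIPT_URI, f"s3://{bucket}/{key}", LOG_URI)
    response = boto3.client("emr").run_job_flow(**request)
    return {"JobFlowId": response["JobFlowId"]}
```

Launching a cluster per event trades startup latency for not paying for idle capacity. For frequent small uploads, submitting steps to a long-running cluster may be the better fit.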