In this blog, I will walk you through (1) how to orchestrate data-processing jobs with Amazon EMR and (2) how to apply batch transform to a trained machine learning model to write predictions with Amazon SageMaker.
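As a taste of the SageMaker side, here is a minimal sketch of submitting a batch transform job with boto3. The model name, S3 paths, and instance type are hypothetical placeholders; the job assumes a SageMaker model already exists.

```python
def build_transform_request(model_name, input_s3, output_s3,
                            instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble the request body for sagemaker.create_transform_job()."""
    return {
        "TransformJobName": f"{model_name}-batch-transform",  # hypothetical naming scheme
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3,
            }},
            "ContentType": "text/csv",
            "SplitType": "Line",  # send the file to the model one line at a time
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }

def run_batch_transform(request):
    """Submit the job; needs AWS credentials and an existing SageMaker model."""
    import boto3  # deferred so the builder above stays testable offline
    sm = boto3.client("sagemaker")
    return sm.create_transform_job(**request)
```

SageMaker writes one output object per input object under `S3OutputPath` (with an `.out` suffix), so predictions land in S3 without any serving endpoint being kept alive.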
Amazon Web Services offers two services capable of performing ETL: Glue and Elastic MapReduce (EMR). If both do a similar job, why would you choose one over the other? This article details some fundamental differences between the two.
This post gives you a quick walkthrough of AWS Lambda functions and of running Apache Spark on an EMR cluster through a Lambda function. It also explains how to trigger the function from other AWS services such as S3.
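The shape of such a Lambda function might look like the sketch below: an S3 `ObjectCreated` event fires the handler, which submits a `spark-submit` step to an already-running EMR cluster via boto3. The cluster ID and script location are hypothetical placeholders.

```python
import json

def build_spark_step(script_s3_path, step_name="spark-job"):
    """EMR step definition that runs spark-submit on a PySpark script in S3."""
    return {
        "Name": step_name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # EMR's built-in command runner
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
        },
    }

def lambda_handler(event, context):
    """Triggered by an S3 event; adds a Spark step to a running EMR cluster."""
    import boto3  # available in the Lambda runtime
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    emr = boto3.client("emr")
    resp = emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXX",  # hypothetical cluster id
        Steps=[build_spark_step(f"s3://{bucket}/scripts/process.py")],  # hypothetical layout
    )
    return {"statusCode": 200, "body": json.dumps(resp["StepIds"])}
```

Wiring S3 to this handler is then just an event-notification configuration on the bucket plus an invoke permission on the function.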
An overview of how to process data in Spark using Databricks, add the script as a step in AWS EMR, and output the data to Amazon Redshift. This article continues the series. In the previous post, we saw how to stream the data with Kinesis Firehose, either using stackapi or the Kinesis Data Generator. In this post, let's see how to decide on the key processing steps that need to be performed before we send the data to any analytics tool.
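The final hop, landing a processed Spark DataFrame in Redshift, can be sketched with Spark's generic JDBC writer. Host, database, table, and credentials below are placeholders, and the Redshift JDBC driver jar is assumed to be on the cluster's classpath.

```python
def redshift_jdbc_options(host, database, table, user, password):
    """JDBC options for Spark's generic writer; all values here are placeholders."""
    return {
        "url": f"jdbc:redshift://{host}:5439/{database}",  # 5439 is Redshift's default port
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.amazon.redshift.jdbc42.Driver",
    }

def write_to_redshift(df, options, mode="append"):
    """df is a pyspark.sql.DataFrame; appends its rows to the target table."""
    writer = df.write.format("jdbc")
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.mode(mode).save()
```

For large volumes, the dedicated spark-redshift connector (which stages data through S3 and issues a `COPY`) is usually faster than row-by-row JDBC inserts.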
A guide to fully automating data processing pipelines using S3 Event Notifications, AWS Lambda, and Amazon EMR.
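One common shape for such a pipeline is a Lambda handler that spins up a transient EMR cluster whenever a file lands in S3, runs one Spark step, and lets the cluster terminate itself. Cluster sizing, IAM roles, and S3 paths below are hypothetical.

```python
def build_cluster_config(name, script_s3_path, log_uri):
    """Minimal run_job_flow request: one Spark step, auto-terminating cluster."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # hypothetical; pick a current EMR release
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate once the step finishes
        },
        "Steps": [{
            "Name": "process-new-file",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",   # EMR's default roles, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }

def lambda_handler(event, context):
    """Fires on an S3 ObjectCreated event and launches the transient cluster."""
    import boto3  # provided by the Lambda runtime
    record = event["Records"][0]["s3"]
    script = f"s3://{record['bucket']['name']}/scripts/process.py"  # hypothetical layout
    emr = boto3.client("emr")
    resp = emr.run_job_flow(**build_cluster_config(
        "s3-triggered-etl", script, "s3://my-logs/emr/"))  # hypothetical log bucket
    return {"JobFlowId": resp["JobFlowId"]}
```

Because `KeepJobFlowAliveWhenNoSteps` is false, you pay only for the minutes the step actually runs, which is what makes the pipeline fully hands-off.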