Orchestrating Transient Data Analytics Workflows via AWS Step Functions

In this blog, I will walk you through 1.) how to orchestrate data processing jobs via Amazon EMR and 2.) how to apply batch transform on a trained machine learning model to write predictions via Amazon SageMaker.

AWS Step Functions is a fully managed service for coordinating and chaining a series of steps into a state machine that automates a task. It provides visual workflows, and state machines are defined as JSON via the Amazon States Language (ASL). In addition, state machines can be scheduled with an Amazon CloudWatch Events rule that uses a cron expression.

Step Functions can be integrated with a wide variety of AWS services, including AWS Lambda, AWS Fargate, AWS Batch, AWS Glue, Amazon ECS, Amazon SQS, Amazon SNS, Amazon DynamoDB, and more.
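To make the "state machines are JSON" point concrete, here is a minimal sketch in Python that assembles the top-level ASL envelope (Comment, StartAt, States) around a single Pass state and prints it as JSON. The helper name build_state_machine and the state name Say_Hello are hypothetical and are not part of the examples in this post.

```python
import json


def build_state_machine(states, start_at, comment=""):
    """Wrap a mapping of state name -> state definition in the top-level ASL fields."""
    if start_at not in states:
        raise ValueError(f"StartAt state {start_at!r} is not defined in States")
    return {"Comment": comment, "StartAt": start_at, "States": states}


# Hypothetical single-state workflow: a Pass state that emits a fixed result and ends.
say_hello = {"Type": "Pass", "Result": "Hello", "End": True}

definition = build_state_machine(
    states={"Say_Hello": say_hello},
    start_at="Say_Hello",
    comment="Illustrative skeleton only",
)

# The printed JSON is the kind of document a state machine definition consists of.
print(json.dumps(definition, indent=2))
```

Real workflows replace the Pass state with Task states that point at a service integration, as in the EMR example in the next section.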

Example 1: Orchestrate Data Processing Jobs via Amazon EMR

1a.) Let’s view our input sample dataset (dummy data from my favorite video game) in Amazon S3.

[Image by Author: the input sample dataset in Amazon S3]

1b.) Next, I will create a state machine that spins up an EMR cluster (a group of EC2 instances) via ASL. The Resource ARN for a Step Functions service integration leaves the region and account ID fields empty.

    "Create_Infra": {
          "Type": "Task",
          "Resource": "arn:<partition>:states:::elasticmapreduce:createCluster.sync",
          "Parameters": {
            "Name": "Demo",
            "VisibleToAllUsers": true,
            "ReleaseLabel": "emr-5.29.0",
            "Applications": [
              {
                "Name": "Hadoop"
              },
              {
                "Name": "Spark"
              },
              {
                "Name": "Hive"
              },
              {
                "Name": "Sqoop"
              }
            ],
            "ServiceRole": "EMR_DefaultRole",
            "JobFlowRole": "EMR_EC2_DefaultRole",
            "LogUri": "s3://aws-logs-<account-id>-<region>/elasticmapreduce/",
            "Instances": {
              "KeepJobFlowAliveWhenNoSteps": true,
              "InstanceGroups": [
                {
                  "Name": "Master Instance Group",
                  "InstanceRole": "MASTER",
                  "InstanceCount": 1,
                  "InstanceType": "m5.xlarge",
                  "Market": "ON_DEMAND"
                },
                {
                  "Name": "Core Instance Group",
                  "InstanceRole": "CORE",
                  "InstanceCount": 1,
                  "InstanceType": "m5.xlarge",
                  "Market": "ON_DEMAND"
                },
                {
                  "Name": "Task Instance Group",
                  "InstanceRole": "TASK",
                  "InstanceCount": 2,
                  "InstanceType": "m5.xlarge",
                  "Market": "ON_DEMAND"
                }
              ],
              "Ec2KeyName": "<ec2-key>",
              "Ec2SubnetId": "<subnet>",
              "EmrManagedMasterSecurityGroup": "<security-group>",
              "EmrManagedSlaveSecurityGroup": "<security-group>",
              "ServiceAccessSecurityGroup": "<security-group>"
            }
          },
          "ResultPath": "$.cluster",
          "Next": "Example_Job_Step_1"
        }
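The state above hands off to a state named Example_Job_Step_1 through its Next field, so the finished definition has to contain a state with exactly that name. As a rough sketch, the Python helper below (find_dangling_targets is a hypothetical name, and the cluster parameters are trimmed to a single field for brevity) lists any StartAt or Next target that is missing from States. It only inspects top-level Next fields, not Choice or Catch branches.

```python
def find_dangling_targets(definition):
    """Return the StartAt/Next targets that are not defined under States."""
    states = definition.get("States", {})
    referenced = {definition.get("StartAt")}
    for state in states.values():
        if "Next" in state:
            referenced.add(state["Next"])
    referenced.discard(None)  # StartAt may be absent in an incomplete draft
    return sorted(name for name in referenced if name not in states)


# Draft definition holding only the cluster-creation state (parameters trimmed).
definition = {
    "StartAt": "Create_Infra",
    "States": {
        "Create_Infra": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
            "Parameters": {"Name": "Demo"},
            "ResultPath": "$.cluster",
            "Next": "Example_Job_Step_1",
        }
    },
}

missing = find_dangling_targets(definition)
print(missing)  # ['Example_Job_Step_1'] until that state is added
```

Running a check like this before deploying catches misspelled state names early.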
