AWS Step Functions is a fully managed service that coordinates a series of steps and chains them into a state machine for automating tasks. It supports visual workflows, and state machines are defined as JSON structures in the Amazon States Language (ASL). In addition, a state machine can be run on a schedule by an Amazon CloudWatch Events rule with a cron expression.
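As a rough sketch of the scheduling option, assuming Python with boto3, the helper below creates a cron-based rule and points it at a state machine. The function name, rule name, and ARNs are hypothetical and not taken from this post; the client is passed in so the helper can be exercised without AWS credentials.

```python
def schedule_state_machine(events_client, rule_name, cron_expression,
                           state_machine_arn, role_arn):
    """Create (or update) a scheduled rule and attach a state machine to it."""
    # put_rule registers the schedule; AWS cron expressions use six fields.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=f"cron({cron_expression})",
        State="ENABLED",
    )
    # put_targets attaches the state machine as the rule's target.
    # The role (an assumption here) must be allowed to call states:StartExecution.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": f"{rule_name}-target",
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
        }],
    )
    return rule["RuleArn"]
```

With a real client this might be called as `schedule_state_machine(boto3.client("events"), "demo-nightly", "0 6 * * ? *", "<state-machine-arn>", "<role-arn>")`, which would trigger the state machine daily at 06:00 UTC.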
In this post, I will walk you through 1.) how to orchestrate data processing jobs via Amazon EMR and 2.) how to run batch transform on a trained machine learning model to write predictions via Amazon SageMaker. Step Functions integrates with a wide variety of AWS services, including AWS Lambda, AWS Fargate, AWS Batch, AWS Glue, Amazon ECS, Amazon SQS, Amazon SNS, Amazon DynamoDB, and more.
1a.) Let’s view our input sample dataset (dummy data from my favorite video game) in Amazon S3.
Image by Author
1b.) Next, I will define a state machine in ASL whose first state spins up an EMR cluster (a group of EC2 instances).
"Create_Infra": {
  "Type": "Task",
  "Resource": "arn:<partition>:states:<region>:<account-id>:elasticmapreduce:createCluster.sync",
  "Parameters": {
    "Name": "Demo",
    "VisibleToAllUsers": true,
    "ReleaseLabel": "emr-5.29.0",
    "Applications": [
      {
        "Name": "Hadoop"
      },
      {
        "Name": "Spark"
      },
      {
        "Name": "Hive"
      },
      {
        "Name": "Sqoop"
      }
    ],
    "ServiceRole": "EMR_DefaultRole",
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "LogUri": "s3://aws-logs-<account-id>-<region>/elasticmapreduce/",
    "Instances": {
      "KeepJobFlowAliveWhenNoSteps": true,
      "InstanceGroups": [
        {
          "Name": "Master Instance Group",
          "InstanceRole": "MASTER",
          "InstanceCount": 1,
          "InstanceType": "m5.xlarge",
          "Market": "ON_DEMAND"
        },
        {
          "Name": "Core Instance Group",
          "InstanceRole": "CORE",
          "InstanceCount": 1,
          "InstanceType": "m5.xlarge",
          "Market": "ON_DEMAND"
        },
        {
          "Name": "Task Instance Group",
          "InstanceRole": "TASK",
          "InstanceCount": 2,
          "InstanceType": "m5.xlarge",
          "Market": "ON_DEMAND"
        }
      ],
      "Ec2KeyName": "<ec2-key>",
      "Ec2SubnetId": "<subnet>",
      "EmrManagedMasterSecurityGroup": "<security-group>",
      "EmrManagedSlaveSecurityGroup": "<security-group>",
      "ServiceAccessSecurityGroup": "<security-group>"
    }
  },
  "ResultPath": "$.cluster",
  "Next": "Example_Job_Step_1"
}
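To show where a state like `Create_Infra` sits inside a full definition, here is a minimal Python sketch that wraps a trimmed-down copy of it in a `StartAt`/`States` envelope and prints the resulting JSON. The body of `Example_Job_Step_1`, the script path, and both helper functions are assumptions made for illustration: the post only names the next state. The sketch also writes the resource ARNs with the `aws` partition and empty region and account fields.

```python
import json

def build_definition(create_infra_state):
    """Wrap the cluster-creation state in a full state machine definition."""
    return {
        "Comment": "Demo EMR workflow (illustrative sketch)",
        "StartAt": "Create_Infra",
        "States": {
            "Create_Infra": create_infra_state,
            # Hypothetical step body: submits one Spark job to the new cluster.
            "Example_Job_Step_1": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    # Reads the cluster ID that ResultPath stored under $.cluster.
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "Example_Job_Step_1",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "s3://<bucket>/<script>.py"],
                        },
                    },
                },
                "End": True,
            },
        },
    }

def dangling_transitions(definition):
    """Return (state, target) pairs whose StartAt/Next target is undefined."""
    states = definition["States"]
    missing = []
    if definition["StartAt"] not in states:
        missing.append(("StartAt", definition["StartAt"]))
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            missing.append((name, target))
    return missing

# Trimmed stand-in for the Create_Infra state (most Parameters omitted for brevity).
create_infra = {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
    "Parameters": {"Name": "Demo", "ReleaseLabel": "emr-5.29.0"},
    "ResultPath": "$.cluster",
    "Next": "Example_Job_Step_1",
}

definition = build_definition(create_infra)
assert dangling_transitions(definition) == []
print(json.dumps(definition, indent=2))
```

Because `ResultPath` is `$.cluster`, the output of the cluster-creation task is stored under that key, which is why the follow-up step can read the cluster ID from `$.cluster.ClusterId`. The small `dangling_transitions` check is only a local sanity test for misspelled `Next` targets before the definition is uploaded.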