Serverless Data Pipelines Made Easy with Prefect and AWS ECS Fargate

The easiest way to orchestrate your Python data pipelines

Even though there are many workflow orchestration solutions and cloud services for building data workloads, it’s hard to find one that is actually pleasant to use and lets you get started quickly. One of my favorite tools for building data pipelines in Python is Prefect, a workflow management platform with a hybrid agent-based execution model.

What does a hybrid execution model entail? It means that even if you use the cloud orchestration platform (Prefect Cloud), you still own and manage your agents. In fact, Prefect has no direct access to your code or data. Instead, it only operates on the metadata that you send to its API when registering agents and flows. For one, this means the platform can satisfy stringent security and compliance requirements, because the entire workflow execution happens within compute resources of your choice. It also allows for an incredible amount of flexibility. Your agent could run on a Kubernetes cluster, an ECS Fargate cluster on AWS, any compute server on-prem or in the cloud, or a mix of all of them. Even your laptop can be registered as a Prefect agent.
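
To make the split between metadata and execution more concrete, here is a minimal toy sketch in plain Python. It does not use the Prefect library and is not how Prefect is implemented: the class names (`FlowMetadata`, `OrchestratorAPI`, `Agent`), the label-matching rule, and the bucket path are all assumptions made up for illustration. The point it tries to show is that the orchestration side only ever holds names, labels, and a pointer to where the code lives, while the callable itself stays with the agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FlowMetadata:
    """Everything the orchestration side sees: names and pointers, no code."""
    name: str
    storage_location: str  # hypothetical pointer, e.g. an S3 key
    labels: List[str]


@dataclass
class OrchestratorAPI:
    """Toy stand-in for the cloud side: stores metadata and scheduled runs."""
    flows: Dict[str, FlowMetadata] = field(default_factory=dict)
    scheduled: List[str] = field(default_factory=list)
    states: Dict[str, str] = field(default_factory=dict)

    def register_flow(self, meta: FlowMetadata) -> None:
        self.flows[meta.name] = meta

    def schedule_run(self, name: str) -> None:
        self.scheduled.append(name)

    def poll(self, agent_labels: List[str]) -> Optional[FlowMetadata]:
        # Assumed rule: hand out a run only if the agent has all flow labels.
        for name in self.scheduled:
            meta = self.flows[name]
            if set(meta.labels) <= set(agent_labels):
                self.scheduled.remove(name)
                return meta
        return None

    def report_state(self, name: str, state: str) -> None:
        self.states[name] = state


@dataclass
class Agent:
    """Runs in your own infrastructure and is the only party holding code."""
    labels: List[str]
    code_store: Dict[str, Callable[[], object]]  # storage location -> flow code

    def run_once(self, api: OrchestratorAPI) -> Optional[object]:
        meta = api.poll(self.labels)
        if meta is None:
            return None  # nothing scheduled for this agent
        flow_fn = self.code_store[meta.storage_location]  # code is loaded locally
        result = flow_fn()
        api.report_state(meta.name, "Success")  # only a state string leaves
        return result


# Usage: register metadata, schedule a run, let the agent pick it up.
location = "s3://example-bucket/flows/etl.py"  # hypothetical path
api = OrchestratorAPI()
api.register_flow(FlowMetadata("etl", location, ["fargate"]))
agent = Agent(labels=["fargate"], code_store={location: lambda: sum([1, 2, 3])})
api.schedule_run("etl")
print(agent.run_once(api), api.states)
```

In this sketch the agent polls the API rather than the API calling into the agent, which is one way to keep your compute resources closed to inbound connections. The real setup described in the rest of this article replaces the in-memory dictionary with S3 storage and the local function call with an ECS Fargate task.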

Although the hybrid execution model has lots of benefits, configuring the execution layer properly can be challenging. In this article, we’ll look at how to set up Prefect with an AWS ECS Fargate agent and S3 storage, which allows for a fully serverless NoOps execution environment.

Table of contents

·  Getting started with Prefect Cloud

·  Prefect Cloud setup

∘  Create your Prefect account

∘  Install Prefect

∘  Create a personal access token to authenticate with Prefect Cloud

·  Creating AWS resources

∘  Creating an IAM role for our ECS tasks

∘  Creating an S3 bucket to store our flow code

∘  Creating an EC2 instance for our agent process

∘  SSH to the EC2 instance to configure the ECS Prefect agent

∘  Creating a Fargate cluster

·  Deploying our first serverless flow

∘  ECSRun configuration & Prefect Storage

∘  Registering a flow for ECS agent

·  Conclusion

