AWS recently announced the “Amazon RDS Snapshot Export to S3” feature, which lets you export Amazon Relational Database Service (Amazon RDS) or Amazon Aurora snapshots to Amazon S3 as Apache Parquet, an efficient open columnar storage format for analytics.

I had a use case to refresh Athena tables daily with the full data set in Account B (us-east-1) from an Aurora MySQL database running in a private subnet in Account A (us-west-2). The two solutions I could think of were:

  1. Run an EC2 instance in a public subnet as a bridge to the Aurora instance, and configure an SSH tunnel to pull the data to S3 with a Python script.
  2. Use the newly released RDS snapshot export to S3 feature to build a serverless solution.

I went with the new feature and enabled cross-region replication on the source S3 bucket to replicate the exported data to an S3 bucket in Account B.
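As a rough illustration of the serverless option, a daily export could be started with boto3's `start_export_task` call on the RDS client. This is only a sketch under stated assumptions: the snapshot ARN, bucket name, IAM role, KMS key, task naming scheme, and S3 prefix layout below are hypothetical placeholders, not values from my setup.

```python
from datetime import date


def build_export_task_params(snapshot_arn, bucket, role_arn, kms_key_id, run_date):
    """Build keyword arguments for the RDS StartExportTask API call."""
    return {
        # Task identifiers must be unique, so the run date is appended.
        "ExportTaskIdentifier": f"daily-export-{run_date:%Y-%m-%d}",
        "SourceArn": snapshot_arn,
        "S3BucketName": bucket,
        # Hypothetical prefix layout: one folder per daily run.
        "S3Prefix": f"exports/{run_date:%Y/%m/%d}",
        "IamRoleArn": role_arn,
        "KmsKeyId": kms_key_id,
    }


def start_export(params, region="us-west-2"):
    """Submit the export; needs boto3 and AWS credentials for Account A."""
    import boto3  # imported lazily so the builder above runs without AWS access

    rds = boto3.client("rds", region_name=region)
    return rds.start_export_task(**params)


# Hypothetical placeholder values, for illustration only.
params = build_export_task_params(
    snapshot_arn="arn:aws:rds:us-west-2:111111111111:cluster-snapshot:example-snapshot",
    bucket="example-source-bucket",
    role_arn="arn:aws:iam::111111111111:role/example-export-role",
    kms_key_id="arn:aws:kms:us-west-2:111111111111:key/example-key-id",
    run_date=date(2020, 3, 1),
)
print(params["ExportTaskIdentifier"])  # daily-export-2020-03-01
```

Calling `start_export(params)` from a scheduled job (for example a Lambda function) would submit the task. Keeping the parameter builder separate makes it easy to check the request before anything is sent to AWS.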


In this post, I will go through the steps to get the data into the staging bucket of Account B, along with a few issues I faced during the setup:

1. Set up cross-region replication between the source S3 bucket in Account A (us-west-2) and the destination bucket in Account B (us-east-1).

I created a new S3 bucket and navigated to Replication under the Management tab. Versioning needs to be enabled on both the source and destination S3 buckets. If you want to replicate objects encrypted with AWS KMS, make sure to select the check box under Replication criteria.
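The same console steps can be approximated in code with boto3's `put_bucket_versioning` and `put_bucket_replication` calls. The sketch below is an assumption-laden illustration, not the exact configuration I used: the role ARN, bucket names, account ID, rule ID, and KMS key are hypothetical placeholders. The `SourceSelectionCriteria` entry corresponds to the KMS check box under Replication criteria.

```python
def build_replication_config(role_arn, dest_bucket, dest_account_id,
                             replica_kms_key_arn=None):
    """Build a ReplicationConfiguration dict for a cross-account rule."""
    rule = {
        "ID": "replicate-snapshot-exports",  # hypothetical rule name
        "Priority": 1,
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: replicate the whole bucket
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {
            "Bucket": f"arn:aws:s3:::{dest_bucket}",
            "Account": dest_account_id,
            # Hand ownership of the replicas to the destination account.
            "AccessControlTranslation": {"Owner": "Destination"},
        },
    }
    if replica_kms_key_arn:
        # Equivalent of ticking the KMS check box under Replication criteria.
        rule["SourceSelectionCriteria"] = {
            "SseKmsEncryptedObjects": {"Status": "Enabled"}
        }
        rule["Destination"]["EncryptionConfiguration"] = {
            "ReplicaKmsKeyID": replica_kms_key_arn
        }
    return {"Role": role_arn, "Rules": [rule]}


def apply_replication(source_bucket, config, region="us-west-2"):
    """Enable versioning on the source bucket, then attach the rule."""
    import boto3  # imported lazily so the builder above runs without AWS access

    s3 = boto3.client("s3", region_name=region)
    s3.put_bucket_versioning(
        Bucket=source_bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=config,
    )


# Hypothetical placeholder values, for illustration only.
config = build_replication_config(
    role_arn="arn:aws:iam::111111111111:role/example-replication-role",
    dest_bucket="example-staging-bucket",
    dest_account_id="222222222222",
    replica_kms_key_arn="arn:aws:kms:us-east-1:222222222222:key/example-key-id",
)
```

`apply_replication("example-source-bucket", config)` would be run with Account A credentials. Versioning on the destination bucket still has to be enabled from Account B, which this sketch does not cover.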

