Recently, someone in my organisation asked me about ways of backing up our performance metrics data. Backup tables are one of the standard approaches, but being super paranoid about the safety and resilience of our database, I started thinking about alternatives to run alongside a database backup: something that would not bleed our wallets dry and would offer useful extra features on top of the required safety and resiliency.

Walking down that lane, two solutions came to mind. One was keeping offline backups on the Microsoft OneDrive storage we were already given for backing up local files; the other was Amazon S3. OneDrive has its own advantages for collaborative work, with great integration with other Office applications and online editing for apps like Excel and PowerPoint. S3, on the other hand, shines in its seamless support for serverless architectures, its support for all kinds of file formats, its built-in querying of CSV and TSV files (via S3 Select), and its great integration with Python, which is our main scripting language for data transformation and automation alongside SQL.

And so we began. Since we needed to back up database data, S3 was the obvious choice. The process was simple and can be narrowed down to just three steps:

  • Every week, all the necessary data transformations and updates were run on the database tables.
  • A backup script written in Python downloaded the tabular data and stored it locally (temporarily) as CSV files. The script followed a fixed structure (see the sketch after this list):
    * It created a folder named after the table whose data was being backed up.
    * Inside that folder sat a folder called 'archive', which held all the historically backed-up CSV files, since we maintained versions.
    * Next to 'archive' was a separate CSV file holding the latest copy of the backup.
  • Those local folders were then uploaded to S3.
With that structure in place, we had to get the data into S3. Early in the project this was easy enough, since we simply uploaded the table folders by hand, but we needed an automated way of doing it without any human touch.
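For context, the most direct way to script that upload with boto3 (the AWS SDK for Python) is a plain loop over the backup folder, along the lines of the sketch below; the bucket name and local path are placeholders. It works, but pushing files one after another sequentially is not especially quick once the archives grow.

```python
# A minimal sketch of automating the S3 upload with boto3.
# The bucket name and local backup directory are placeholders.
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "my-metrics-backups"   # assumed bucket name

def upload_backups(base_dir: Path = Path("backups")) -> None:
    """Mirror the local backup folders (table/archive layout) into S3."""
    for path in base_dir.rglob("*.csv"):
        # Reuse the table/archive folder structure as the S3 object key
        key = path.relative_to(base_dir).as_posix()
        s3.upload_file(str(path), BUCKET, key)
        print(f"uploaded {path} -> s3://{BUCKET}/{key}")

if __name__ == "__main__":
    upload_backups()
```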

Finally, I found a solution.

