In Part 1, we looked at extracting a csv file from an FTP server and loading it into Google BigQuery using Cloud Functions. In this article, we will be doing the same thing but this time, we will be extracting data from a MySQL database instead.
There are a lot of ETL tools out there and sometimes they can be overwhelming, especially when you simply want to copy a file from point A to B. So today, I am going to show you how to extract data from a MySQL database (Extract), modify it (Transform) and load it into a Google BigQuery table (Load) using python 3.6 and Google Cloud Functions.
In this article, we will look to do the following:
Extract, Transform, Load (ETL) is the general procedure of copying data from one or more sources into a destination system which represents the data differently from the source or in a different context than the source. — Wikipedia
This is what our ETL pipeline will look like in the end:
ETL Pipeline (Created with Lucidchart)
Google Cloud Functions: Cloud Functions (CF) is Google Cloud’s Serverless platform set to execute scripts responding to specified events, such as a HTTP request or a database update. An alternative to CF is AWS Lambda or Azure Functions.
Setting up your Cloud Function
Make sure your runtime is set to “Python 3.7” and under “Advanced options” change the region to one closest to you. As at the writing of this post, CF isn’t available in every Google data-centre region, so check here to see where Cloud Functions is enabled.
#data-science #sql #python #data-engineering #google-cloud-platform #function