Batching Jobs in GCP using the Cloud Scheduler and Functions

A walkthrough of batching jobs in GCP using Cloud Scheduler and Cloud Functions. Serverless batch jobs can be set up on Google Cloud Platform using Cloud Scheduler, Pub/Sub, and Cloud Functions. A batch job running every 5 minutes uses up one Cloud Scheduler job.

While designing and implementing solutions, I often need to set up recurring batch jobs for data storage and processing. Recently I have been trying to keep my infrastructure as serverless as possible, so in this article I will show you how Google Cloud Platform can be leveraged to run almost any batch job your project might need for free.

Use Cases

For me, this batch pattern is most useful for data processing, reconciliation, and cleanup. Here is an example involving data aggregation…

A bucket can be an effective repository for streaming data, but if your payloads are small and frequent, keeping a separate file for every payload can get expensive when you have to read them often. I solve this problem by running a batch job that merges individual payloads into hourly or daily files, which makes for a much more cost-effective solution.
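To make the merge step concrete, here is a minimal Python sketch of the grouping logic only. It assumes each payload arrives as a timezone-aware timestamp paired with its raw bytes; the function names and the merged/ file naming are hypothetical choices for illustration, and the bucket reads and writes are deliberately left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_key(received_at: datetime) -> str:
    """Name of the hourly file a payload belongs to (hypothetical naming scheme)."""
    return received_at.astimezone(timezone.utc).strftime("merged/%Y-%m-%dT%H.jsonl")


def merge_by_hour(payloads):
    """Group (timestamp, bytes) pairs into one newline-delimited blob per hour.

    Timestamps are assumed to be timezone-aware datetimes.
    """
    grouped = defaultdict(list)
    # Sort by timestamp so each merged file keeps payloads in arrival order.
    for received_at, body in sorted(payloads, key=lambda p: p[0]):
        grouped[hourly_key(received_at)].append(body.strip())
    # One payload per line, with a trailing newline at the end of each file.
    return {name: b"\n".join(bodies) + b"\n" for name, bodies in grouped.items()}
```

In a real batch job, the function would list the small files in the bucket, pass their contents through something like merge_by_hour, write one object per key, and then delete the originals. Switching to daily files is just a matter of dropping the hour from the key format.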

Or how about database cleanup…

If you have a SQL database containing large time-series data sets, regular purging is critical for performance. You could squeeze a recurring job into a web application or into the ETL system that loads data into your tables. I prefer to solve this with the serverless batch approach, which decouples the cleanup from everything else and simplifies maintenance.
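As a rough illustration of what such a purge might look like, the sketch below deletes rows older than a retention window. It uses Python's built-in sqlite3 module as a stand-in for whatever SQL database you run, and the readings table, its recorded_at column (assumed to hold ISO 8601 UTC strings), and the purge_old_rows name are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_rows(conn, retention_days, now=None):
    """Delete rows older than the retention window and return how many were removed.

    Assumes a hypothetical `readings` table whose `recorded_at` column stores
    ISO 8601 UTC timestamps as text, so string comparison matches time order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat(timespec="seconds")
    # The connection context manager commits on success and rolls back on error.
    with conn:
        cursor = conn.execute(
            "DELETE FROM readings WHERE recorded_at < ?", (cutoff,)
        )
    return cursor.rowcount
```

Because the function only needs a connection and a retention period, it can be called from a scheduled function without the web application or ETL system knowing anything about it.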

