For the past couple years, we’ve been working to make Google Cloud an excellent platform for running C++ workloads. To demonstrate some of the progress we’ve made so far, we’ll show how you can use C++ with both Cloud Pub/Sub and Cloud Storage to build a highly scalable job queue running on Google Kubernetes Engine (GKE).

Such applications often need to distribute work to many compute nodes to achieve good performance. Part of the appeal of public cloud providers is the ability to schedule these kinds of parallel computations on demand, growing the size of the cluster that runs the computation as needed, and shrinking it when it’s no longer running. In this post we will explore how to realize this potential for C++ applications, using Pub/Sub and GKE.

A common pattern for running large-scale computations is a job queue, where work is represented by messages in the queue, and a number of worker applications pull items from the queue for processing. The recently released Pub/Sub (CPS) C++ client library makes it easy to implement this pattern. And with GKE autoscaling, the cluster running such a workload can grow and shrink on demand, saving C++ developers from the tedium of managing the cluster, and leaving them with more time to improve their applications.

Sample application

For our example, we will create millions of Cloud Storage objects; this models a parallel application that performs some computation (e.g., analyze a fraction of some large data set) and saves the results in separate Cloud Storage objects. We believe this workload is easier to understand than some exotic simulation, but it’s not purely artificial: from time-to-time our team needs to create large synthetic data sets for load testing.

Overview

The basic idea is to break the work into a small number of work items, such as, “create 1,000 objects with this prefix”. We use a command-line tool to publish these work items to a Pub/Sub topic, which reliably delivers them to any number of worker nodes that execute the work items. We use GKE to run the worker nodes, as GKE automatically scales the cluster based on demand, and restarts the worker nodes if needed after a failure.

Because Pub/Sub offers at-least-once delivery, and because the worker nodes may be restarted by GKE, it’s important to make these work items idempotent, that is, executing the work item multiple times produces the same objects in Cloud Storage as executing the work item a single time.

The code for this example is available in this GitHub repository.

#c++ #pub/sub #gke

Running C++ apps with the help of Pub/Sub and GKE
1.15 GEEK