What we learned after a year of GitLab.com on Kubernetes

For about a year now, the infrastructure department has been working on migrating all services that run on GitLab.com to Kubernetes. The effort has not been without challenges, not only in moving services to Kubernetes but also in managing a hybrid deployment during the transition. We have learned a number of lessons along the way that we will explore in this post.

Since the very beginning of GitLab.com, servers for the website have run in the cloud on virtual machines. These VMs are managed by Chef and installed using our official Linux package. When an application update is required, our deployment strategy is to simply upgrade fleets of servers in a coordinated rolling fashion using a CI pipeline. This method, while slow and a bit boring, ensures that GitLab.com uses the same installation methods and configuration as our self-managed customers who use Linux packages. We use this method because it is especially important that any pain or joy felt by the community when installing or configuring self-managed GitLab is also felt by GitLab.com. This approach worked well for us for a time, but as GitLab.com grew to host over 10 million projects, we realized it would no longer serve our scaling and deployment needs.

Enter Kubernetes and cloud native GitLab

We created the GitLab Charts project in 2017 to prepare GitLab for deployments in the cloud and enable self-managed users to install GitLab into a Kubernetes cluster. We knew then that running GitLab.com on Kubernetes would benefit the SaaS platform for scaling, deployments, and efficient use of compute resources. At the time, though, many application features still depended on NFS mounts, which delayed our migration off VMs.

The push for cloud native and Kubernetes gave engineering an opportunity to plan a gradual transition that removes some of the application's network storage dependencies while continuing to develop new features. Since we started planning the migration in the summer of 2019, most of these limitations have been resolved and the journey to running all of GitLab.com on Kubernetes is now well underway!

Running GitLab.com on Kubernetes

For GitLab.com we use a single regional GKE cluster that services all application traffic. To minimize the complexity of the (already complex) migration, we focus on services that don’t depend on local storage or NFS. While GitLab.com runs from a mostly monolithic Rails codebase, we route traffic depending on workload characteristics to different endpoints, which are isolated into their own node pools.

On the frontend these workloads are divided into web, API, Git SSH/HTTPS requests, and Registry. On the backend we divide our queued jobs by their characteristics, using predefined resource boundaries that allow us to set service-level objective (SLO) targets for a range of different workloads.
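As a rough illustration of routing by workload type, the Python sketch below classifies an incoming request into one of the four frontend categories named above and looks up a node pool for it. The matching rules, the pool names, and the function names are all assumptions made for this example; they are not GitLab.com's actual routing configuration.

```python
# Hypothetical node pool names, one per frontend workload type.
NODE_POOLS = {
    "web": "web-pool",
    "api": "api-pool",
    "git": "git-pool",
    "registry": "registry-pool",
}


def classify_request(protocol: str, path: str) -> str:
    """Return the workload type for a request (illustrative rules only)."""
    if protocol == "ssh":
        # In this sketch, all SSH traffic is assumed to be Git traffic.
        return "git"
    if path.startswith("/v2/"):
        # Assumed prefix for container registry requests.
        return "registry"
    if path.startswith("/api/"):
        return "api"
    if path.endswith(".git") or ".git/" in path:
        # Git over HTTPS, e.g. /group/project.git/info/refs
        return "git"
    # Everything else is treated as regular web traffic.
    return "web"


def node_pool_for(protocol: str, path: str) -> str:
    """Map a request to the node pool that serves its workload type."""
    return NODE_POOLS[classify_request(protocol, path)]
```

Keeping the classification separate from the pool lookup means a workload type can be moved to a different pool without touching the matching rules.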

All of these GitLab.com services are configured with the unmodified GitLab Helm chart, which configures them in sub-charts that can be selectively enabled as we gradually migrate services to the cluster. Although we opted not to include some of our stateful services, such as Redis, Postgres, GitLab Pages, and Gitaly, finishing the migration to Kubernetes will drastically reduce the number of VMs that we currently manage with Chef.
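To make the idea of selectively enabling sub-charts concrete, here is a minimal Python sketch that builds a set of value overrides from the list of services migrated so far. The service names and the `<name>.enabled` key layout are hypothetical placeholders chosen for this example and should not be read as the GitLab chart's real value schema.

```python
# Hypothetical sub-chart names; the real chart's keys may differ.
SERVICES = ["web", "api", "git", "registry", "sidekiq"]


def subchart_overrides(migrated):
    """Return one enabled/disabled override per known service."""
    unknown = set(migrated) - set(SERVICES)
    if unknown:
        raise ValueError(f"unknown services: {sorted(unknown)}")
    return {name: {"enabled": name in migrated} for name in SERVICES}


def to_set_flags(overrides):
    """Render overrides as `--set key=value` style flags, sorted by name."""
    return [
        f"--set {name}.enabled={str(values['enabled']).lower()}"
        for name, values in sorted(overrides.items())
    ]
```

Each time another service moves to the cluster, only the `migrated` set changes, which mirrors the gradual, service-by-service migration described above.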
