Git Clone: A Data-driven Study on Cloning Behaviors


@derrickstolee recently discussed several different git clone options, but how do those options actually affect your Git performance? Which option is fastest for your client experience? Which option is fastest for your build machines? How can these options impact server performance? If you are a GitHub Enterprise Server administrator, it's important to understand how the server responds to these options under the load of multiple simultaneous requests.

Here at GitHub, we use a data-driven approach to answer these questions. We ran an experiment to compare these different clone options and measured the client and server behavior. It is not enough to just compare git clone times, because that is only the start of your interaction with a Git repository. In particular, we wanted to determine how these clone options change the behavior of future Git operations such as git fetch.

In this experiment, we aimed to answer the following questions:

  1. How fast are the various git clone commands?
  2. Once we have cloned a repository, what kind of impact do future git fetch commands have on the server and client?
  3. What impact do full, shallow, and partial clones have on a Git server? This matters most to GitHub Enterprise Server administrators.
  4. Will the repository shape and size make any difference in the overall performance?
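To make the clone types in these questions concrete, here is a minimal sketch of how one might time them on a client. It is not the tooling used in this study. The labels in `CLONE_FLAGS` and the helper names `clone_command` and `timed_run` are our own illustrative assumptions; the `--depth` and `--filter` options are standard `git clone` flags, and we assume "partial clone" covers the blobless and treeless variants.

```python
import subprocess
import time

# Flags per clone type. The keys are hypothetical labels chosen for this
# sketch; the flags themselves are standard `git clone` options.
CLONE_FLAGS = {
    "full": [],
    "shallow": ["--depth=1"],           # only the tip commit
    "blobless": ["--filter=blob:none"],  # partial clone: blobs fetched on demand
    "treeless": ["--filter=tree:0"],     # partial clone: trees and blobs on demand
}


def clone_command(kind, url, dest):
    """Build the argv list for one clone type."""
    return ["git", "clone", "--quiet", *CLONE_FLAGS[kind], url, dest]


def timed_run(argv):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start
```

For example, `timed_run(clone_command("shallow", "https://github.com/jquery/jquery", "jquery-shallow"))` would time a shallow clone, and `timed_run(["git", "-C", "jquery-shallow", "fetch", "--quiet"])` would time a later fetch in that clone. Client-side timing like this says nothing about server load, which needs separate measurement.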

It is worth emphasizing that these results come from simulations that we performed in our controlled environments, and they do not simulate the complex workflows that many Git users rely on. Depending on your workflows and repository characteristics, these results may change. Perhaps this experiment provides a framework that you could follow to measure how your workflows are affected by these options. If you would like help analyzing your workflows, feel free to engage with GitHub's Professional Services team.

For a summary of our findings, feel free to jump to our conclusions and recommendations.

Experiment design

To maximize the repeatability of our experiment, we use open source repositories for our sample data. This way, you can compare your repository shape to the tested repositories to see which is most applicable to your scenario.

We chose to use the jquery/jquery, apple/swift, and torvalds/linux repositories. These three repositories vary in size and number of commits, blobs, and trees.

These repositories were mirrored to a GitHub Enterprise Server running version 2.22 on an 8-core cloud machine. We use an internal load testing tool based on Gatling to generate git requests against the test instance. We ran each test with a specific number of users across 5 different load generators for 30 minutes. All of our load generators use git version 2.28.0, which by default uses protocol version 1. Note that protocol version 2 only improves ref advertisement, so we don't expect it to make a difference in our tests.
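The load-generation idea above (a fixed number of simulated users issuing requests for a fixed duration) can be sketched roughly as follows. This is not GitHub's internal Gatling-based tool, which is not public; `run_load` and its parameters are hypothetical names for illustration, and the sketch uses threads on a single machine rather than 5 separate load generators.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(operation, users, duration_s):
    """Call operation() repeatedly from `users` threads until duration_s elapses.

    Returns the wall-clock duration of every completed call, in seconds.
    """
    deadline = time.monotonic() + duration_s

    def worker():
        timings = []
        # Each simulated user loops until the shared deadline passes.
        while time.monotonic() < deadline:
            start = time.perf_counter()
            operation()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(worker) for _ in range(users)]
        # Flatten the per-user lists into one list of timings.
        return [t for f in futures for t in f.result()]
```

In a real test, `operation` would run a git command such as a clone or fetch against the test instance, and the returned timings would be analyzed alongside server health metrics.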

Once a test is complete, we use a combination of Gatling results, ghe-governor and server health metrics to analyze the test.

