Understanding the Fundamentals of Git

Understanding the various Git objects that operate under the hood

Git is a powerful tool, but it can be overwhelming, especially for newcomers. Even experienced developers commonly get stuck in a merge or rebase conflict. Despite the many blog posts available, identifying the cause can be tricky, and we end up wasting productive time.

There are plenty of tutorials out there already, but most of them cover only high-level user commands, their syntax, and how to use them, abstracting away most of the internal details.

This article tries to uncover how Git works under the hood. But why should we bother? Knowing how Git works from the ground up makes its commands much easier to use, because the real goal is not memorizing commands but learning the conceptual model behind them. So instead of learning a bunch of commands, we will try to understand how they work. Once we do, we'll become more comfortable and confident using them, and eventually we may even start enjoying it. Besides, there is always a geeky pleasure in knowing the internals. With that, let's demystify Git.

Git: The Stupid Content Tracker (as documented in man git)

At its very core, Git is a simple key-value store: a structure that maps keys to values, and one that is persistent, i.e. stored on disk. The values are just sequences of bytes, so they can be any kind of content, whether a text file or a binary file. We give Git a value, Git calculates a unique key for it, and we can later use that key to retrieve the content.
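To make the key-value idea concrete, here is a minimal Python sketch of a content-addressable store. The class and method names (ContentStore, put, get) are hypothetical, the store lives in memory rather than on disk, and it hashes the raw bytes directly, which is a simplification of what Git actually hashes.

```python
import hashlib


class ContentStore:
    """A toy content-addressable key-value store (hypothetical, for illustration).

    Simplification: the key is the SHA-1 of the raw bytes; real Git also
    hashes a small header before the content. Storage is an in-memory dict,
    not files on disk.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, value: bytes) -> str:
        # The key is derived from the value itself; the caller never chooses it.
        key = hashlib.sha1(value).hexdigest()
        self._objects[key] = value
        return key

    def get(self, key: str) -> bytes:
        # Retrieving content only requires the key that put() returned.
        return self._objects[key]


store = ContentStore()
key = store.put(b"Hello Sarah\n")
print(key)
print(store.get(key))
```

Because the key is computed from the content, storing the same bytes twice yields the same key and simply overwrites an identical entry.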

As a demonstration, we'll use the hash-object command, which takes some data and gives back its unique key. By default, hash-object reads content from the file paths given as arguments, so to hash a string we pipe it in and pass the --stdin flag. (hash-object is one of Git's plumbing commands, so it's fine if you haven't used it before.)

$ echo "Hello Sarah" | git hash-object --stdin
0e170dcd3efae4d183c78dced6c5b9c7e13f70c5
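hash-object does not hash the raw bytes alone: for a blob, Git prepends a header made of the word "blob", a space, the content size in bytes as decimal digits, and a NUL byte, then takes the SHA-1 of the result. The following is a minimal Python sketch of that computation; git_blob_hash is a hypothetical helper name, not part of any Git API.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Compute the key Git assigns to `data` when it is stored as a blob.

    The hashed bytes are: b"blob " + <size in decimal> + b"\\x00" + data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# echo appends a newline, so the piped input is these 12 bytes.
print(git_blob_hash(b"Hello Sarah\n"))
```

Run on exactly the bytes that echo produced, the sketch should give the same key that git hash-object --stdin prints for that input.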

Git uses this hash as the key under which the content is stored (hash-object on its own only prints the key; adding the -w flag also writes the content into Git's object database). The hash is computed with the SHA-1 algorithm. The important point is that the same sequence of bytes always produces the same hash, on every run and on every operating system. Conversely, it is practically impossible for two different pieces of content to end up with the same hash by accident.
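The key also determines where the content lives on disk. Git stores each loose object as a zlib-compressed file under .git/objects, in a subdirectory named after the first two hex digits of the key, with the remaining 38 digits as the file name. The Python sketch below shows that layout and the determinism described above. encode_blob and loose_object_path are hypothetical helper names, and because the exact compressed bytes depend on the compression level, the sketch does not claim to reproduce Git's files byte for byte.

```python
import hashlib
import zlib
from pathlib import PurePosixPath


def encode_blob(data: bytes) -> tuple[str, bytes]:
    """Return (key, stored_bytes) for `data` stored as a loose blob (sketch)."""
    # Header: object type, a space, the size in bytes, then a NUL byte.
    raw = b"blob " + str(len(data)).encode("ascii") + b"\x00" + data
    # The key is the SHA-1 of the *uncompressed* header plus content;
    # what gets written to disk is the zlib-compressed form.
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def loose_object_path(key: str) -> PurePosixPath:
    # First two hex digits name the subdirectory, the other 38 name the file.
    return PurePosixPath(".git", "objects", key[:2], key[2:])


key, stored = encode_blob(b"Hello Sarah\n")
print(loose_object_path(key))

# Decompressing the stored bytes gives back the header plus the original content.
print(zlib.decompress(stored))

# Same bytes, same key; a single extra byte produces an unrelated key.
print(encode_blob(b"Hello Sarah\n")[0] == key)   # True
print(encode_blob(b"Hello Sarah!\n")[0] == key)  # False
```

Splitting on the first two hex digits keeps any single directory from accumulating an enormous number of files.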

towardsdatascience.com
