Here at Gitstart, while working with clients on platforms like GitHub or GitLab, we often need to sync codebases across a pair of remote repositories. For each client-assigned task, our developers work on a branch of a private repository (cloned or forked from the client repository), and we need to keep the client repository up to date with it and vice versa. Over time, syncing repositories becomes repetitive, so it makes sense to automate the work.

Enter Gitstart Fork: Fork is an internal tool that uses webhooks to sync code across a pair of repositories in near real time.

So how does it work? To keep the codebase simple, we split the functionality into two parts:

  1. Pull: move changes in the client repository into our repository
  2. Push: move changes in our repository to the client’s repository

This article focuses on Pull; Push will be discussed in a future article.

Tech Stack: We write most of our code in TypeScript on Node.js. Our database of choice is PostgreSQL, with Hasura as the GraphQL engine (its subscriptions feature makes processing much easier).

For simplicity, we’ll be talking about GitHub repos, but the approach extends to any Git-based remote service such as GitLab or Bitbucket.


1. Features

Fork Pull offers the following features for syncing:

  • Branch control: specify which branches to sync.
  • Granular file syncing control: specify which folders/files to sync and which to ignore.
  • .gitignore support: skip files matched by .gitignore while syncing.
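To make the file-level rules concrete, here is a minimal sketch of how a path could be checked against a list of folders to sync and a list of paths to ignore. It is an illustration, not Fork's actual code: the `SyncRules` shape, the `isUnder` and `shouldSync` helpers, and the rule that an empty `folders` list means "sync everything" are all assumptions. `.gitignore` matching is left out because it needs full pattern semantics rather than simple prefixes.

```typescript
// Hypothetical rule set for one sync; the field names are assumptions.
interface SyncRules {
  folders: string[]; // paths to sync; assumed: an empty list means "everything"
  ignored: string[]; // paths that are never synced
}

// True when `path` equals `prefix` or lies inside the folder `prefix`.
function isUnder(path: string, prefix: string): boolean {
  const clean = prefix.replace(/\/+$/, ""); // drop trailing slashes
  return path === clean || path.startsWith(clean + "/");
}

// Ignore rules win over folder rules.
function shouldSync(path: string, rules: SyncRules): boolean {
  if (rules.ignored.some((p) => isUnder(path, p))) return false;
  if (rules.folders.length === 0) return true;
  return rules.folders.some((p) => isUnder(path, p));
}

const rules: SyncRules = { folders: ["src"], ignored: ["src/secrets"] };
console.log(shouldSync("src/index.ts", rules));
console.log(shouldSync("src/secrets/key.pem", rules));
```

Matching on whole path segments (rather than a plain string prefix) keeps a rule for `src` from accidentally matching `srcfoo/`.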

2. The database

We start with a table that keeps track of the pair of repositories we want to sync, which branches to sync in those repositories, and any files (or folders) we don’t want to sync. A high-level schema would look like:

tableName: git_repo_slices 
- id: integer 
- fromRepo: string 
- toRepo: string 
- fromBranch: string 
- toBranch: string 
- ignored: string[] 
- folders: string[]
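As an illustration of how this table might be used, the sketch below mirrors the schema as a TypeScript interface and picks the rows that apply to an incoming GitHub push webhook. The `ref` and `repository.full_name` fields are part of GitHub's push payload; everything else is an assumption made for the example, including the `owner/name` format of `fromRepo`, the idea that `fromRepo` holds the client repository for a pull, and the helper names `branchFromRef` and `slicesForPush`.

```typescript
// Mirrors the git_repo_slices schema above.
interface GitRepoSlice {
  id: number;
  fromRepo: string; // assumed format: "owner/name"
  toRepo: string;
  fromBranch: string;
  toBranch: string;
  ignored: string[];
  folders: string[];
}

// The subset of GitHub's push webhook payload used in this sketch.
interface PushEvent {
  ref: string; // e.g. "refs/heads/main"
  repository: { full_name: string }; // e.g. "client/app"
}

// Extracts the branch name from a ref, or null for non-branch refs such as tags.
function branchFromRef(ref: string): string | null {
  const prefix = "refs/heads/";
  return ref.startsWith(prefix) ? ref.slice(prefix.length) : null;
}

// Returns every configured slice whose source repo and branch match the push.
function slicesForPush(event: PushEvent, slices: GitRepoSlice[]): GitRepoSlice[] {
  const branch = branchFromRef(event.ref);
  if (branch === null) return [];
  return slices.filter(
    (s) => s.fromRepo === event.repository.full_name && s.fromBranch === branch
  );
}
```

A push to a branch with no matching row simply yields an empty list, which is how branch control would fall out of the table in this sketch.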

We also keep a record of all the so-called “pulls” we have made:

tableName: git_slice_pulls 
- id: integer 
- startedAt: timestamp 
- finishedAt: timestamp 
- error: string 
- commitSlice: relation to git_commit_slices 
- repoSlice: relation to git_repo_slices

This table records the state of a pull between its start and its finish. We mainly use it to track the progress of a pull event: from it we can tell whether a pull succeeded or failed, see any errors that occurred, and calculate how long the pull took.
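The kind of bookkeeping described here could look like the following sketch, which derives a status and a duration from one pull record. It assumes that timestamps arrive as ISO 8601 strings and that `finishedAt` and `error` are null until they are set; the `PullStatus` type and both helper functions are hypothetical names introduced for the example.

```typescript
// Mirrors the scalar columns of git_slice_pulls; nullability is an assumption.
interface GitSlicePull {
  id: number;
  startedAt: string; // ISO 8601 timestamp
  finishedAt: string | null; // assumed null while the pull is still running
  error: string | null; // assumed null when no error was recorded
}

type PullStatus = "running" | "failed" | "succeeded";

// A recorded error marks the pull as failed; otherwise finishedAt decides.
function pullStatus(pull: GitSlicePull): PullStatus {
  if (pull.error !== null && pull.error !== "") return "failed";
  return pull.finishedAt === null ? "running" : "succeeded";
}

// Duration in milliseconds, or null when the pull has not finished yet.
function pullDurationMs(pull: GitSlicePull): number | null {
  if (pull.finishedAt === null) return null;
  return Date.parse(pull.finishedAt) - Date.parse(pull.startedAt);
}
```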
