Learn git concepts, not commands

An interactive git tutorial means teaching you how git works, not just a command to execute ... The following article by Nico Riedmann will help you understand that problem ...

So, you want to use git right?

But you don't just want to learn commands, you want to understand what you're using?

Then this is meant for you!

Let's get started!

Based on the general concept from Rachel M. Carmena's blog post on How to teach Git.

While I find many git tutorials on the internet to be too focused on what to do instead of how things work, the most invaluable resource for both (and source for this tutorial!) are the git Book and Reference page.

So if you're still interested when you're done here, go check those out! I do hope the somewhat different concept of this tutorial will aid you in understanding all the other git features detailed there.

  • Overview
  • Getting a Remote Repository
      • Adding new things
  • Making changes
  • Branching
  • Merging
      • Fast-Forward merging
      • Merging divergent branches
      • Resolving conflicts
  • Rebasing
      • Resolving conflicts
  • Updating the Dev Environment with remote changes
  • Cherry-picking
  • Rewriting history
  • Reading history

Overview

In the picture below you see four boxes. One of them stands alone, while the other three are grouped together in what I'll call your Development Environment.

We'll start with the one that's on its own though. The Remote Repository is where you send your changes when you want to share them with other people, and where you get their changes from. If you've used other version control systems there's nothing interesting about that.

The Development Environment is what you have on your local machine.

The three parts of it are your Working Directory, the Staging Area and the Local Repository. We'll learn more about those as we start using git.

Choose a place in which you want to put your Development Environment.

Just go to your home folder, or wherever you like to put your projects. You don't need to create a new folder for your Dev Environment though.

Getting a Remote Repository

Now we want to grab a Remote Repository and put what's in it onto your machine.

I'd suggest we use this one (https://github.com/UnseenWizzard/git_training.git if you're not already reading this on github).

To do that I can use git clone https://github.com/UnseenWizzard/git_training.git

But as following this tutorial requires you to get the changes you make in your Dev Environment back to the Remote Repository, and GitHub doesn't let just anyone do that to anyone's repo, you'd best create a fork of it right now. There's a button to do that on the top right of the repository page.

Now that you have a copy of my Remote Repository of your own, it's time to get that onto your machine.

For that we use git clone https://github.com/{YOUR USERNAME}/git_training.git

As you can see in the diagram below, this copies the Remote Repository into two places, your Working Directory and the Local Repository.

Now you see how git is distributed version control. The Local Repository is a copy of the Remote one, and acts just like it. The only difference is that you don't share it with anyone.

What git clone also does, is create a new folder wherever you called it. There should be a git_training folder now. Open it.
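If you'd like to see what a clone produces without touching GitHub, here is a minimal sketch that uses a local bare repository as a stand-in for the Remote Repository. The folder names seed and remote.git, the scratch directory and the "Example User" identity are all made up for this illustration; only git clone itself is the step from the text.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards

# Stand-in for the Remote Repository: a bare repository seeded with Alice.txt.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this tutorial
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git clone -q --bare "$work/seed" "$work/remote.git"

# The step from the text: clone the remote into a new git_training folder.
cd "$work"
git clone -q "$work/remote.git" git_training
cd git_training
ls                  # Working Directory: the checked-out files
git log --oneline   # Local Repository: the history came along too
git remote -v       # the clone remembers its source under the name origin
```

The last three commands look at the two places the clone filled (Working Directory and Local Repository) and at the link back to the remote.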

Adding new things

Someone already put a file into the Remote Repository. It's Alice.txt, and kind of lonely there. Let's create a new file and call it Bob.txt.

What you've just done is add the file to your Working Directory.

There are two kinds of files in your Working Directory: tracked files that git knows about and untracked files that git doesn't know about (yet).

To see what's going on in your Working Directory run git status, which will tell you what branch you're on, whether your Local Repository is different from the Remote and the state of tracked and untracked files.

You'll see that Bob.txt is untracked, and git status even tells you how to change that.

In the picture below you can see what happens when you follow the advice and execute git add Bob.txt: You've added the file to the Staging Area, in which you collect all the changes you wish to put into the Repository.

When you have added all your changes (which right now is only adding Bob), you're ready to commit what you just did to the Local Repository.

The collected changes that you commit are some meaningful chunk of work, so when you now run git commit a text editor will open and allow you to write a message telling everyone what you just did. When you save and close the message file, your commit is added to the Local Repository.

You can also add your commit message right there in the command line if you call git commit like this: git commit -m "Add Bob". But because you want to write good commit messages you really should take your time and use the editor.

Now your changes are in your Local Repository, which is a good place for them to be as long as no one else needs them or you're not yet ready to share them.

In order to share your commits with the Remote Repository you need to push them.

Once you run git push the changes will be sent to the Remote Repository. In the diagram below you see the state after your push.
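The whole trip of Bob.txt from Working Directory to Remote Repository can be replayed in a scratch directory. This is only a sketch: the remote is a local bare repository standing in for your fork, and the identity and paths are hypothetical.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards

# Stand-in Remote Repository (bare) that already contains Alice.txt.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/git_training"
cd "$work/git_training"

touch Bob.txt                     # Working Directory: a new, untracked file
git status --short                # prints "?? Bob.txt"
git add Bob.txt                   # Staging Area: the file is now staged
git status --short                # prints "A  Bob.txt"
git commit -q -m "Add Bob"        # Local Repository: the staged change is committed
git push -q origin master         # Remote Repository: the commit is shared
git log --oneline origin/master   # the remote-tracking branch now includes "Add Bob"
```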

Making changes

So far we've only added a new file. Obviously the more interesting part of version control is changing files.

Have a look at Alice.txt.

It actually contains some text, but Bob.txt doesn't, so let's change that and put Hi!! I'm Bob. I'm new here. in there.

If you run git status now, you'll see that Bob.txt is modified.

In that state the changes are only in your Working Directory.

If you want to see what has changed in your Working Directory you can run git diff, and right now see this:

diff --git a/Bob.txt b/Bob.txt
index e69de29..3ed0e1b 100644
--- a/Bob.txt
+++ b/Bob.txt
@@ -0,0 +1 @@
+Hi!! I'm Bob. I'm new here.

Go ahead and git add Bob.txt like you've done before. As we know, this moves your changes to the Staging Area.

I want to see the changes we just staged, so let's show the git diff again! You'll notice that this time the output is empty. This happens because git diff operates on the changes in your Working Directory only.

To show what changes are staged already, we can use git diff --staged and we'll see the same diff output as before.

I just noticed that we put two exclamation marks after the 'Hi'. I don't like that, so let's change Bob.txt again, so that it's just 'Hi!'

If we now run git status we'll see that there are two changes, the one we already staged where we added text, and the one we just made, which is still only in the Working Directory.

We can have a look at the git diff between the Working Directory and what we've already moved to the Staging Area, to show what has changed since we last felt ready to stage our changes for a commit.

diff --git a/Bob.txt b/Bob.txt
index 8eb57c4..3ed0e1b 100644
--- a/Bob.txt
+++ b/Bob.txt
@@ -1 +1 @@
-Hi!! I'm Bob. I'm new here.
+Hi! I'm Bob. I'm new here.

As the change is what we wanted, let's git add Bob.txt to stage the current state of the file.
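To see both kinds of diff side by side, here is a small sketch you can save as a script and run in a scratch directory (run it as a script, for example with sh, since an interactive bash would try to expand the "!!"). The demo folder and the identity are hypothetical; the file content is the one from the text.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
touch Bob.txt
git add Bob.txt
git commit -q -m "Add Bob"

echo "Hi!! I'm Bob. I'm new here." > Bob.txt
git diff             # Working Directory vs. Staging Area: shows the new line
git add Bob.txt
git diff             # prints nothing: the change is no longer only in the Working Directory
git diff --staged    # Staging Area vs. last commit: shows the new line again

echo "Hi! I'm Bob. I'm new here." > Bob.txt
git diff             # only the "Hi!!" to "Hi!" edit, relative to what is staged
git diff --staged    # still the staged version with two exclamation marks
```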

Now we're ready to commit what we just did. I went with git commit -m "Add text to Bob" because I felt for such a small change writing one line would be enough.

As we know, the changes are now in the Local Repository.

We might still want to know what change we just committed and what was there before.

We can do that by comparing commits.

Every commit in git has a unique hash by which it is referenced.

If we have a look at the git log we'll not only see a list of all the commits with their hash as well as Author and Date, we also see the state of our Local Repository and the latest local information about remote branches.

Right now the git log looks something like this:

commit 87a4ad48d55e5280aa608cd79e8bce5e13f318dc (HEAD -> master)
Author: {YOU} <{YOUR EMAIL}>
Date:   Sun Jan 27 14:02:48 2019 +0100

    Add text to Bob

commit 8af2ff2a8f7c51e2e52402ecb7332aec39ed540e (origin/master, origin/HEAD)
Author: {YOU} <{YOUR EMAIL}>
Date:   Sun Jan 27 13:35:41 2019 +0100

    Add Bob

commit 71a6a9b299b21e68f9b0c61247379432a0b6007c 
Author: UnseenWizzard <[email protected]>
Date:   Fri Jan 25 20:06:57 2019 +0100

    Add Alice

commit ddb869a0c154f6798f0caae567074aecdfa58c46
Author: Nico Riedmann <[email protected]>
Date:   Fri Jan 25 19:25:23 2019 +0100

    Add Tutorial Text

      Changes to the tutorial are all squashed into this commit on master, to keep the log free of clutter that distracts from the tutorial

      See the tutorial_wip branch for the actual commit history

In there we see a few interesting things:

  • The two oldest commits (at the bottom of the log) were made by me.
  • Your initial commit to add Bob is the current HEAD of the master branch on the Remote Repository. We'll look at this again when we talk about branches and getting remote changes.
  • The latest commit in the Local Repository is the one we just made, and now we know its hash.

Note that the actual commit hashes will be different for you. If you want to know how exactly git arrives at those revision IDs have a look at this interesting article.

To compare that commit and the one before it we can do git diff <commit>^!, where the ^! tells git to compare to the commit one before. So in this case I run git diff 87a4ad48d55e5280aa608cd79e8bce5e13f318dc^!

We can also do git diff 8af2ff2a8f7c51e2e52402ecb7332aec39ed540e 87a4ad48d55e5280aa608cd79e8bce5e13f318dc for the same result and in general to compare any two commits. Note that the format here is git diff <from commit> <to commit>, so our new commit comes second.

In the diagram below you again see the different stages of a change, and the diff commands that apply to where a file currently is.
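Since your hashes will differ from mine, a sketch like the following looks them up with git rev-parse instead of copying them. The demo repository, the variable names old and new, and the identity are assumptions made for this example.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
touch Bob.txt
git add Bob.txt
git commit -q -m "Add Bob"
echo "Hi! I'm Bob. I'm new here." > Bob.txt
git commit -q -am "Add text to Bob"

new=$(git rev-parse HEAD)      # the commit we just made
old=$(git rev-parse HEAD~1)    # its parent
git diff "$new^!"              # what the new commit changed compared to its parent
git diff "$old" "$new"         # same output: from the old commit to the new one
git diff "$new" "$old"         # reversed order: the added line shows up as removed
```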

Now that we're sure we made the change we wanted, go ahead and git push.


Branching

Another thing that makes git great is that working with branches is really easy and an integral part of how you work with git.

In fact we've been working on a branch since we've started.

When you clone the Remote Repository your Dev Environment automatically starts on the repository's main or master branch.

Most workflows with git include making your changes on a branch, before you merge them back into master.

Usually you'll be working on your own branch, until you're done and confident in your changes which can then be merged into the master.

Many git repository managers like GitLab and GitHub also allow for branches to be protected, which means that not everyone is allowed to just push changes there. There the master is usually protected by default.
Don't worry, we'll get back to all of these things in more detail when we need them.

Right now we want to create a branch to make some changes there. Maybe you just want to try something on your own and not mess with the working state on your master branch, or you're not allowed to push to master.

Branches live in the Local and Remote Repository. When you create a new branch, the branch's contents will be a copy of the currently committed state of whatever branch you are currently working on.

Let's make some change to Alice.txt! How about we put some text on the second line?

We want to share that change, but not put it on master right away, so let's create a branch for it using git branch <branch name>.

To create a new branch called change_alice you can run git branch change_alice.

This adds the new branch to the Local Repository.

While your Working Directory and Staging Area don't really care about branches, you always commit to the branch you are currently on.

You can think of branches in git as pointers, pointing to a series of commits. When you commit, you add to whatever you're currently pointing to.

Just adding a branch doesn't directly take you there, it just creates such a pointer.

In fact the state your Local Repository is currently at can be viewed as another pointer, called HEAD, which points to what branch and commit you are currently at.

If that sounds complicated the diagrams below will hopefully help to clear things up a bit:

To switch to our new branch you will have to use git checkout change_alice. What this does is simply to move the HEAD to the branch you specify.

As you'll usually want to switch to a branch right after creating it, there is the convenient -b option available for the checkout command, which allows you to directly check out a new branch, so you don't have to create it beforehand.

So to create and switch to our change_alice branch, we could also just have called git checkout -b change_alice.

You'll notice that your Working Directory hasn't changed. That we've modified Alice.txt is not related to the branch we're on yet.

Now you can add and commit the change to Alice.txt just like we did on the master before, which will stage (at which point it's still unrelated to the branch) and finally commit your change to the change_alice branch.
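The pointer picture can be checked from the command line. In this sketch (scratch repository, hypothetical identity and file content) git symbolic-ref shows which branch HEAD points to, and git status shows that the uncommitted edit simply comes along when we switch.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this tutorial
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"

echo "A second line." >> Alice.txt   # uncommitted change in the Working Directory
git branch change_alice              # new pointer to the current commit
git symbolic-ref --short HEAD        # prints "master": HEAD has not moved
git checkout -q change_alice         # move HEAD to the new branch
git symbolic-ref --short HEAD        # prints "change_alice"
git status --short                   # prints " M Alice.txt": the edit came along
git commit -q -am "Change Alice"     # the commit lands on change_alice only
git log --oneline --decorate --all   # master is now one commit behind change_alice
```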

There's just one thing you can't do yet. Try to git push your changes to the Remote Repository.

You'll see the following error and - as git is always ready to help - a suggestion how to resolve the issue:

fatal: The current branch change_alice has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin change_alice 

But we don't just want to blindly do that. We're here to understand what's actually going on. So what are upstream branches and remotes?

Remember when we cloned the Remote Repository a while ago? At that point it didn't only contain this tutorial and Alice.txt but actually two branches.

The master we just went ahead and started working on, and one I called "tutorial_wip" on which I commit all the changes I make to this tutorial.

When we copied the things in the Remote Repository into your Dev Environment a few extra steps happened under the hood.

Git set up the remote of your Local Repository to be the Remote Repository you cloned and gave it the default name origin.

Your Local Repository can track several remotes and they can have different names, but we'll stick to the origin and nothing else for this tutorial.

Then it copied the two remote branches into your Local Repository and finally it checked out master for you.

When doing that another implicit step happens. When you checkout a branch name that has an exact match in the remote branches, you will get a new local branch that is linked to the remote branch. The remote branch is the upstream branch of your local one.

In the diagrams above you can see just the local branches you have. You can see that list of local branches by running git branch.

If you want to also see the remote branches your Local Repository knows, you can use git branch -a to list all of them.

Now we can call the suggested git push --set-upstream origin change_alice, and push the changes on our branch to a new remote branch. This will create a change_alice branch on the Remote Repository and set our local change_alice to track that new branch.

There is another option if we actually want our branch to track something that already exists on the Remote Repository. Maybe a colleague has already pushed some changes, while we were working on something related on our local branch, and we'd like to integrate the two. Then we could just set the upstream for our change_alice branch to that existing remote branch by using git branch --set-upstream-to=origin/change_alice and from there on track the remote branch.

After that went through, have a look at your Remote Repository on GitHub: your branch will be there, ready for other people to see and work with.
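Here is a sketch of the upstream setup against a local stand-in remote, so you can inspect the result without GitHub. The bare repository, the paths and the identity are hypothetical; the push command is the one git suggested above.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards

# Stand-in Remote Repository (bare) that already contains Alice.txt.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/git_training"
cd "$work/git_training"

git checkout -q -b change_alice
echo "A second line." >> Alice.txt
git commit -q -am "Change Alice"
git push -q --set-upstream origin change_alice          # create the remote branch and link to it
git rev-parse --abbrev-ref "change_alice@{upstream}"    # prints "origin/change_alice"
git branch -a                                           # local branches plus the remote ones we know
```

From here on a plain git push on change_alice knows where to send its commits.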

We'll get to how you can get other people's changes into your Dev Environment soon, but first we'll work a bit more with branches, to introduce all the concepts that also come into play when we get new things from the Remote Repository.


Merging

As you and everyone else will generally be working on branches, we need to talk about how to get changes from one branch into another by merging them.

We've just changed Alice.txt on the change_alice branch, and I'd say we're happy with the changes we made.

If you go and git checkout master, the commit we made on the other branch will not be there. To get the changes into master we need to merge the change_alice branch into master.

Note that you always merge some branch into the one you're currently at.

Fast-Forward merging

As we've already checked out master, we can now git merge change_alice.

As there are no other conflicting changes to Alice.txt, and we've changed nothing on master, this will go through without a hitch in what is called a fast forward merge.

In the diagrams below, you can see that this just means that the master pointer can simply be advanced to where the change_alice one already is.

The first diagram shows the state before our merge, master is still at the commit it was, and on the other branch we've made one more commit.

The second diagram shows what has changed with our merge.
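In case the diagrams are not at hand, the same before and after can be checked with a short sketch in a scratch repository (hypothetical identity and file content). After the merge both branch names resolve to the same commit, and no merge commit exists.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this tutorial
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git checkout -q -b change_alice
echo "A second line." >> Alice.txt
git commit -q -am "Change Alice"

git checkout -q master
git rev-parse master change_alice   # before: two different hashes
git merge change_alice              # reports a fast-forward
git rev-parse master change_alice   # after: the same hash twice
git log --oneline --graph           # a straight line, no merge commit
```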

Merging divergent branches

Let's try something more complex.

Add some text on a new line to Bob.txt on master and commit it.

Then git checkout change_alice, change Alice.txt and commit.

In the diagram below you see how our commit history now looks. Both master and change_alice originated from the same commit, but since then they diverged, each having their own additional commit.

If you now switch back with git checkout master and then git merge change_alice, a fast-forward merge is not possible. Instead your favorite text editor will open and allow you to change the message of the merge commit git is about to make in order to get the two branches back together. You can just go with the default message right now. The diagram below shows the state of our git history after the merge.

The new commit introduces the changes that we've made on the change_alice branch into master.

As you'll remember from before, revisions in git aren't only a snapshot of your files but also contain information on where they came from. Each commit has one or more parent commits. Our new merge commit has both the last commit from master and the commit we made on the other branch as its parents.
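The two parents of a merge commit can be made visible with git rev-list --parents. The following is a sketch in a scratch repository; the identity, the file contents and the variable names ours and theirs are made up, and --no-edit is used only so that no editor opens while the script runs.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this tutorial
echo "Hi! I'm Alice." > Alice.txt
echo "Hi! I'm Bob." > Bob.txt
git add Alice.txt Bob.txt
git commit -q -m "Add Alice and Bob"

git checkout -q -b change_alice
echo "More Alice." >> Alice.txt
git commit -q -am "Change Alice"      # one extra commit on change_alice
git checkout -q master
echo "More Bob." >> Bob.txt
git commit -q -am "Change Bob"        # one extra commit on master: the branches diverged

ours=$(git rev-parse master)          # tip of master before the merge
theirs=$(git rev-parse change_alice)  # tip of the branch we merge in
git merge -q --no-edit change_alice   # creates a merge commit with the default message
git log --oneline --graph             # the two lines of history join again
git rev-list --parents -n 1 HEAD      # merge commit hash followed by both parent hashes
```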

Resolving conflicts

So far our changes haven't interfered with each other.

Let's introduce a conflict and then resolve it.

Create and checkout a new branch. You know how, but maybe try using git checkout -b to make your life easier.

I've called mine bobby_branch.

On the branch we'll make a change to Bob.txt.

The first line should still be Hi! I'm Bob. I'm new here. Change that to Hi! I'm Bobby. I'm new here.

Stage and then commit your change, before you checkout master again. Here we'll change that same line to Hi! I'm Bob. I've been here for a while now. and commit that change.

Now it's time to merge the new branch into master.

When you try that, you'll see the following output

Auto-merging Bob.txt

CONFLICT (content): Merge conflict in Bob.txt

Automatic merge failed; fix conflicts and then commit the result.

The same line has changed on both of the branches, and git can't handle this on its own.

If you run git status you'll get all the usual helpful instructions on how to continue.

First we have to resolve the conflict by hand.

For an easy conflict like this one your favorite text editor will do fine. For merging large files with lots of changes a more powerful tool will make your life much easier, and I'd assume your favorite IDE comes with version control tools and a nice view for merging.
If you open Bob.txt you'll see something similar to this (I've truncated whatever we might have put on the second line before):

<<<<<<< HEAD
Hi! I'm Bob. I've been here for a while now.
=======
Hi! I'm Bobby. I'm new here.
>>>>>>> bobby_branch

[... whatever you've put on line 2]

On top you see what has changed in Bob.txt on the current HEAD, below you see what has changed in the branch we're merging in.

To resolve the conflict by hand, you'll just need to make sure that you end up with some reasonable content and without the special lines git has introduced to the file.

So go ahead and change the file to something like this:

Hi! I'm Bobby. I've been here for a while now.


From here what we're doing is exactly what we'd do for any changes.

We stage them when we add Bob.txt, and then we commit.

We already know the commit for the changes we've made to resolve the conflict. It's the merge commit that is always present when merging.

Should you ever realize in the middle of resolving conflicts that you actually don't want to follow through with the merge, you can just abort it by running git merge --abort.
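The whole conflict round trip fits into a short sketch you can run in a scratch repository. The identity is hypothetical, the resolution is written with echo instead of an editor, and git commit --no-edit simply accepts the default merge message.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this tutorial
echo "Hi! I'm Bob. I'm new here." > Bob.txt
git add Bob.txt
git commit -q -m "Add Bob"

git checkout -q -b bobby_branch
echo "Hi! I'm Bobby. I'm new here." > Bob.txt
git commit -q -am "Rename Bob to Bobby"
git checkout -q master
echo "Hi! I'm Bob. I've been here for a while now." > Bob.txt
git commit -q -am "Bob has been here for a while"

git merge bobby_branch || echo "merge stopped with a conflict"
git status --short    # prints "UU Bob.txt": both sides changed the file
cat Bob.txt           # shows the conflict markers git wrote into the file

# Resolve by hand: write the content we want, without the marker lines.
echo "Hi! I'm Bobby. I've been here for a while now." > Bob.txt
git add Bob.txt           # staging the file marks the conflict as resolved
git commit -q --no-edit   # concludes the merge with the default message
git log --oneline --graph
```

Running git merge --abort instead of the last four commands would put master back to where it was before the merge started.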


Rebasing

Git has another clean way to integrate changes between two branches, which is called rebase.

We still recall that a branch is always based on another. When you create it, you branch away from somewhere.

In our simple merging example we branched from master at a specific commit, then committed some changes on both master and the change_alice branch.

When a branch is diverging from the one it's based on and you want to integrate the latest changes back into your current branch, rebase offers a cleaner way of doing that than a merge would.

As we've seen, a merge introduces a merge commit in which the two histories get integrated again.

Viewed simply, rebasing just changes the point in history (the commit) your branch is based on.

To try that out, let's first checkout the master branch again, then create/checkout a new branch based on it.

I called mine add_patrick and I added a new Patrick.txt file and committed that with the message 'Add Patrick'.

When you've added a commit to the branch, get back to master, make a change and commit it. I added some more text to Alice.txt.

Like in our merging example the history of these two branches diverges at a common ancestor as you can see in the diagram below.

Now let's checkout add_patrick again, and get that change that was made on master into the branch we work on!

When we git rebase master, we re-base our add_patrick branch on the current state of the master branch.

The output of that command gives us a nice hint at what is happening in it:

First, rewinding head to replay your work on top of it...

Applying: Add Patrick

As we remember HEAD is the pointer to the current commit we're at in our Dev Environment.

It's pointing to the same place as add_patrick before the rebase starts. For the rebase, it then first moves back to the common ancestor, before moving to the current head of the branch we want to re-base ours on.

So HEAD moves from the 0cfc1d2 commit, to the 7639f4b commit that is at the head of master.

Then rebase applies every single commit we made on our add_patrick branch to that.

To be more exact, what git does after moving HEAD back to the common ancestor of the branches is to store parts of every single commit you've made on the branch (the diff of changes, and the commit text, author, etc.).

After that it does a checkout of the latest commit of the branch you're rebasing on, and then applies each of the stored changes as a new commit on top of that.

So in our original simplified view, we'd assume that after the rebase the 0cfc1d2 commit doesn't point to the common ancestor anymore in its history, but points to the head of master.

In fact the 0cfc1d2 commit is gone, and the add_patrick branch starts with a new 0ccaba8 commit, that has the latest commit of master as its ancestor.

We made it look like our add_patrick branch was based on the current master, not an older version of it, but in doing so we re-wrote the history of the branch.

At the end of this tutorial we'll learn a bit more about re-writing history and when it's appropriate and inappropriate to do so.

Rebase is an incredibly powerful tool when you're working on your own development branch which is based on a shared branch, e.g. the master.

Using rebase you can make sure that you frequently integrate the changes other people make and push to master, while keeping a clean linear history that allows you to do a fast-forward merge when it's time to get your work into the shared branch.

Keeping a linear history also makes reading or looking at (try out git log --graph or take a look at the branch view of GitHub or GitLab) commit logs much more useful than having a history littered with merge commits, usually just using the default text.
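That a rebase creates new commits rather than moving the old ones can be checked by comparing hashes before and after. This is a sketch in a scratch repository with a hypothetical identity; the variable names base, before and after are only for illustration, and your hashes will not match the ones in the text.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this tutorial
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"

git checkout -q -b add_patrick
echo "Hi! I'm Patrick." > Patrick.txt
git add Patrick.txt
git commit -q -m "Add Patrick"
git checkout -q master
echo "Some more text." >> Alice.txt
git commit -q -am "Add more text to Alice"   # master and add_patrick have diverged
base=$(git rev-parse master)

git checkout -q add_patrick
before=$(git rev-parse HEAD)
git rebase -q master             # replay "Add Patrick" on top of the current master
after=$(git rev-parse HEAD)
echo "before: $before"
echo "after:  $after"            # a different hash: the commit was re-created
git log --oneline --graph --all  # linear history, add_patrick sits on top of master

git checkout -q master
git merge --ff-only add_patrick  # the linear history allows a fast-forward
```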

Resolving conflicts

Just like for a merge you may run into conflicts, if you run into two commits changing the same parts of a file.

However when you encounter a conflict during a rebase you don't fix it in an extra merge commit, but can simply resolve it in the commit that is currently being applied.

Again, basing your changes directly on the current state of the original branch.

Actually resolving conflicts while you rebase is very similar to how you would for a merge so refer back to that section if you're not sure anymore how to do it.

The only distinction is that, as you're not introducing a merge commit, there is no need to commit your resolution. Simply add the changes to the Staging Area and then git rebase --continue. The conflict will be resolved in the commit that was just being applied.

As when merging, you can always stop and drop everything you've done so far when you git rebase --abort.

Updating the Dev Environment with remote changes

So far we've only learned how to make and share changes.

That fits what you'll do if you're just working on your own, but usually there'll be a lot of people that do just the same, and we're gonna want to get their changes from the Remote Repository into our Dev Environment somehow.

Because it has been a while, let's have another look at the components of git:

Just like your Dev Environment everyone else working on the same source code has theirs.

All of these Dev Environments have their own working and staged changes, that are at some point committed to the Local Repository and finally pushed to the Remote.

For our example, we'll use the online tools offered by GitHub, to simulate someone else making changes to the remote while we work.

Go to your fork of this repo on github.com and open the Alice.txt file.

Find the edit button and make and commit a change via the website.

In this repository I have added a remote change to Alice.txt on a branch called fetching_changes_sample, but in your version of the repository you can of course just change the file on master.

Fetching Changes

We still remember that when you git push, you synchronize changes made to the Local Repository into the Remote Repository.

To get changes made to the Remote into your Local Repository you use git fetch.

This gets any changes on the remote - so commits as well as branches - into your Local Repository.

Note that at this point, changes aren't integrated into the local branches and thus the Working Directory and Staging Area yet.
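To watch a fetch without using the GitHub website, the following sketch simulates "someone else" with a second clone of a local stand-in remote. The folder names mine and theirs, the bare repository and the identity are all hypothetical.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards

# Stand-in Remote Repository (bare) plus two Dev Environments cloned from it.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Hi! I'm Alice." > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/mine"
git clone -q "$work/remote.git" "$work/theirs"

# Someone else changes Alice.txt and pushes.
cd "$work/theirs"
echo "A remote change." >> Alice.txt
git commit -q -am "Change Alice remotely"
git push -q origin master

# Back in our own Dev Environment.
cd "$work/mine"
git fetch -q                              # Local Repository now knows the new commit
git status -sb                            # first line: "## master...origin/master [behind 1]"
git log --oneline master..origin/master   # the fetched commit, not yet integrated
cat Alice.txt                             # Working Directory is still unchanged
```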

If you run git status now, you'll see another great example of git commands telling you exactly what is going on:

git status

On branch fetching_changes_sample

Your branch is behind 'origin/fetching_changes_sample' by 1 commit, and can be fast-forwarded.

(use "git pull" to update your local branch)

Pulling Changes

As we have no working or staged changes, we could just execute git pull now to get the changes from the Remote Repository all the way into our working area.

Pulling will implicitly also fetch from the Remote Repository, but sometimes it is a good idea to do a fetch on its own, for example when you want to synchronize any new remote branches, or when you want to make sure your Local Repository is up to date before you do a git rebase on something like origin/master.

Before we pull, let's change a file locally to see what happens.

Let's also change Alice.txt in our Working Directory now!

If you now try to do a git pull you'll see the following error:

git pull

Updating df3ad1d..418e6f0

error: Your local changes to the following files would be overwritten by merge:

        Alice.txt

Please commit your changes or stash them before you merge.


You can not pull in any changes, while there are modifications to files in the Working Directory that are also changed by the commits you're pulling in.

While one way around this is to get your changes to a point where you're confident in them, add them to the Staging Area, and finally commit them, this is a good moment to learn about another great tool: git stash.

Stashing changes

If at any point you have local changes that you do not yet want to put into a commit, or want to store somewhere while you try some different angle to solve a problem, you can stash those changes away.

A git stash is basically a stack of changes on which you store any changes to the Working Directory.

The commands you'll mostly use are git stash, which places any modifications to the Working Directory on the stash, and git stash pop, which takes the latest change that was stashed and applies it to the Working Directory again.

Just like the stack command it's named after, git stash pop removes the latest stashed change from the stash as it applies it.

If you want to keep the stashed changes, you can use git stash apply, which doesn't remove them from the stash before applying them.

To inspect your current stash you can use git stash list to list the individual entries, and git stash show to show the changes in the latest entry on the stash.

Another nice convenience command is git stash branch {BRANCH NAME}, which creates a branch, starting from the HEAD at the moment you've stashed the changes, and applies the stashed changes to that branch.

Now that we know about git stash, let's run it to remove our local changes to Alice.txt from the Working Directory, so that we can go ahead and git pull the changes we've made via the website.

After that, let's git stash pop to get the changes back.

As both the commit we pulled in and the stashed change modified Alice.txt you will have to resolve the conflict, just like you would in a merge or rebase.

When you're done add and commit the change.
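Here is a sketch of the stash, pull, pop sequence against a local stand-in remote. Note one deliberate difference to the walkthrough above: the remote and the local edit touch different lines of Alice.txt, so in this sketch the pop applies cleanly instead of conflicting. The file content, the folder names mine and theirs, and the identity are hypothetical.

```shell
set -e
# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)   # scratch directory, safe to delete afterwards

# Stand-in Remote Repository (bare) plus two Dev Environments cloned from it.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/mine"
git clone -q "$work/remote.git" "$work/theirs"

# Someone else rewrites the first line and pushes.
cd "$work/theirs"
printf 'remote edit\nline 2\nline 3\nline 4\nline 5\n' > Alice.txt
git commit -q -am "Change first line of Alice"
git push -q origin master

# We have an uncommitted edit at the end of the same file.
cd "$work/mine"
echo "local edit" >> Alice.txt
git pull -q || echo "pull refused: local changes would be overwritten"
git stash -q       # park the local edit, Working Directory is clean again
git pull -q        # now the remote commit can be pulled in
git stash pop -q   # bring the local edit back on top of the pulled state
cat Alice.txt      # contains both the remote and the local edit
git stash list     # prints nothing: pop removed the entry from the stash
```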

Pulling with Conflicts

Now that we've understood how to fetch and pull Remote Changes into our Dev Environment, it's time to create some conflicts!

Do not push the commit that changed Alice.txt and head back to your Remote Repository on github.com.

There we're also again going to change Alice.txt and commit the change.

Now there's actually two conflicts between our Local and Remote Repositories.

Don't forget to run git fetch to see the remote change without pulling it in right away.

If you now run git status you will see, that both branches have one commit on them that differs from the other.

git status

On branch fetching_changes_sample

Your branch and 'origin/fetching_changes_sample' have diverged,

and have 1 and 1 different commits each, respectively.

(use "git pull" to merge the remote branch into yours)

In addition we've changed the same file in both of those commits, to introduce a merge conflict we'll have to resolve.

When you git pull while there is a difference between the Local and Remote Repository the exact same thing happens as when you merge two branches.

Additionally, you can think of the relationship between branches on the Remote and the one in the Local Repository as a special case of creating a branch based on another.

A local branch is based on a branch's state on the Remote from the time you last fetched it.

Thinking that way, the two options you have to get remote changes make a lot of sense:

When you git pull the Local and Remote version of a branch will be merged. Just like merging branches, this will introduce a merge commit.

As any local branch is based on its respective remote version, we can also rebase it, so that any changes we may have made locally appear as if they were based on the latest version that is available in the Remote Repository.

To do that, we can use git pull --rebase (or the shorthand git pull -r).

As detailed in the section on Rebasing, there is a benefit in keeping a clean linear history, which is why I would strongly recommend that whenever you git pull you do a git pull -r.

You can also tell git to use rebase instead of merge as its default strategy when you git pull, by setting the pull.rebase flag with a command like this: git config --global pull.rebase true.

If you haven't already run git pull when I first mentioned it a few paragraphs ago, let's now run git pull -r to get the remote changes while making it look like our new commit just happened after them.

Of course like with a normal rebase (or merge) you'll have to resolve the conflict we introduced for the git pull to be done.
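If you'd like to watch the diverge-then-rebase flow without touching your real repository, here is a minimal sketch. It assumes a throwaway directory, a bare repository standing in for the Remote Repository, and a second clone standing in for the edit on github.com; all paths, file contents and identities are made up for the demo. To keep it short, the two sides change different files, so no conflict has to be resolved.

```shell
#!/bin/sh
# Sketch: let a local branch diverge from its remote, then reconcile with `git pull --rebase`.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A bare repository plays the role of the Remote Repository.
git init -q --bare "$work/remote.git"

# Our Dev Environment: create the shared starting point and push it.
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"
echo "Hi, I'm Alice" > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git push -q -u origin HEAD
branch=$(git symbolic-ref --short HEAD)

# A second clone stands in for somebody changing the remote.
git clone -q -b "$branch" "$work/remote.git" "$work/other"
cd "$work/other"
echo "Hi, I'm Bob" > Bob.txt
git add Bob.txt
git commit -q -m "Add Bob (remote change)"
git push -q origin HEAD

# Back home: commit locally without pushing, so both sides have one commit the other lacks.
cd "$work/local"
echo "Nice to meet you" >> Alice.txt
git commit -q -am "Change Alice (local change)"
git fetch -q
diverged=$(git rev-list --left-right --count 'HEAD...@{upstream}' | tr '\t' ' ')
echo "ahead/behind after fetch: $diverged"

# Replay the local commit on top of the fetched remote state instead of merging.
git pull -q --rebase

merges=$(git rev-list --merges --count HEAD)
newest=$(git log -1 --format=%s)
echo "merge commits: $merges, newest commit: $newest"
```

The local commit ends up as the newest one in a linear history, which is exactly the "as if it just happened after them" effect described above.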


Cherry-picking

Congratulations! You've made it to the more advanced features!

By now you understand how to use all the typical git commands and, more importantly, how they work.

This will hopefully make the following concepts much simpler to understand than if I just told you what commands to type in.

So let's head right in and learn how to cherry-pick commits!

From earlier sections you still remember roughly what a commit is made of, right?

And how when you rebase a branch your commits are applied as new commits with the same change set and message?

Whenever you want to just take a few choice changes from one branch and apply them to another branch, you want to cherry-pick these commits and put them on your branch.

That is exactly what git cherry-pick allows you to do with either single commits or a range of commits.

Just like during a rebase this will actually put the changes from these commits into a new commit on your current branch.

Let's have a look at an example each for cherry-picking one or more commits:

The figure below shows three branches before we have done anything. Let's assume we really want to get some changes from the add_patrick branch into the change_alice branch. Sadly they haven't made it into master yet, so we can't just rebase onto master to get those changes (along with any other changes on the other branch, that we might not even want).

So let's just git cherry-pick the commit 63fc421.

The figure below visualizes what happens when we run git cherry-pick 63fc421

As you can see, a new commit with the changes we wanted shows up on our branch.

At this point note that, like with any other kind of getting changes onto a branch that we've seen before, any conflicts that arise during a cherry-pick will have to be resolved by us before the command can go through.

Also like all other commands, you can either --continue a cherry-pick when you've resolved conflicts, or decide to --abort the command entirely.

The figure below visualizes cherry-picking a range of commits instead of a single one. You can simply do that by calling the command in the form git cherry-pick <from>..<to> or in our example below as git cherry-pick 0cfc1d2..41fbfa7. Note that git reads <from>..<to> as "everything reachable from <to> but not from <from>", so the <from> commit itself is not picked.
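To see for yourself that a cherry-pick creates a new commit rather than moving the old one, here is a small sketch in a throwaway repository. The branch names mirror the figures, but the file contents, identities and resulting hashes are assumptions of the demo and will differ from the ones shown above.

```shell
#!/bin/sh
# Sketch: cherry-pick a single commit from add_patrick onto change_alice.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/repo"
cd "$work/repo"
echo "Hi, I'm Alice" > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
base=$(git symbolic-ref --short HEAD)

# The branch that holds the commit we want.
git checkout -q -b add_patrick
echo "Hi, I'm Patrick" > Patrick.txt
git add Patrick.txt
git commit -q -m "Add Patrick"
wanted=$(git rev-parse HEAD)

# The branch that should receive it.
git checkout -q "$base"
git checkout -q -b change_alice
echo "Some more text" >> Alice.txt
git commit -q -am "Change Alice"

# Apply the change set of $wanted as a new commit on the current branch.
git cherry-pick "$wanted" > /dev/null
picked=$(git rev-parse HEAD)

echo "original commit: $wanted"
echo "picked commit:   $picked"
echo "message:         $(git log -1 --format=%s)"
```

The message and change set are the same, but the hash differs, because the picked commit has a different parent than the original.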

Rewriting history

I'm repeating myself now, but you still remember rebase well enough, right? Else quickly jump back to that section before continuing here, as we'll use what we already know when learning how to change history!

As you know, a commit basically contains your changes, a message and a few other things.

The 'history' of a branch is made up of all its commits.

But let's say you've just made a commit and then notice that you've forgotten to add a file, or you made a typo and the change leaves you with broken code.

We'll briefly look at two things we could do to fix that, and make it look like it never happened.

Let's switch to a new branch with git checkout -b rewrite_history.

Now make some changes to both Alice.txt and Bob.txt, and then git add Alice.txt.

Then git commit using a message like "This is history" and you're done.

Wait, did I say we're done? No, you'll clearly see that we've made some mistakes here:

  • We forgot to add Bob.txt to the commit
  • We didn't write a very good commit message

Amending the last Commit

One way to fix both of these in one go would be to amend the commit we've just made.

Amending the latest commit basically works just like making a new one.

Before we do anything take a look at your latest commit, with git show {COMMIT}. Put either the commit hash (which you'll probably still see in your command line from the git commit call, or in the git log), or just HEAD.

Just like in the git log you'll see the message, author, date and of course changes.

Now let's amend what we've done in that commit.

git add Bob.txt to get the changes to the Staging Area, and then git commit --amend.

What happens next is your latest commit being unrolled, the new changes from the Staging Area added to the existing one, and the editor for the commit message opening.

In the editor you'll see the previous commit message.

Feel free to change it to something better.

After you're done, take another look at the latest commit with git show HEAD.

As you've certainly expected by now, the commit hash is different. The original commit is gone, and in its place there is a new one, with the combined changes and new commit message.

Note how the other commit data like author and date are unchanged from the original commit. You can mess with those too, if you really want, by using the extra --author={AUTHOR} and --date={DATE} flags when amending.

Congratulations! You've just successfully re-written history for the first time!
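If you want to replay the amend in isolation, here is a minimal sketch in a throwaway repository. The file contents and the new message are made up, and it passes -m to git commit --amend so that no editor opens, which is a shortcut compared to the steps above.

```shell
#!/bin/sh
# Sketch: amend the latest commit to add a forgotten file and a better message.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/repo"
cd "$work/repo"
echo "Hi, I'm Alice" > Alice.txt
echo "Hi, I'm Bob" > Bob.txt

# The mistake: only Alice.txt is staged, and the message says very little.
git add Alice.txt
git commit -q -m "This is history"
before=$(git rev-parse HEAD)

# The fix: stage the forgotten file and replace the latest commit.
git add Bob.txt
git commit -q --amend -m "Add Alice and Bob"
after=$(git rev-parse HEAD)
count=$(git rev-list --count HEAD)

echo "hash before: $before"
echo "hash after:  $after"
echo "commits on branch: $count"
git ls-tree --name-only HEAD
```

There is still only one commit on the branch, but it is a different one than before: it now contains both files and carries the new message.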

Interactive Rebase

Generally when we git rebase, we rebase onto a branch. When we do something like git rebase origin/master, what actually happens, is a rebase onto the HEAD of that branch.

In fact if we felt like it, we could rebase onto any commit.

Remember that a commit contains information about the history that came before it.

Like many other commands git rebase has an interactive mode.

Unlike most others, the interactive rebase is something you'll probably be using a lot, as it allows you to change history as much as you want.

Especially if you follow a work-flow of making many small commits of your changes, which allow you to easily jump back if you made a mistake, interactive rebase will be your closest ally.

Enough talk! Let's do something!

Switch back to your master branch and git checkout a new branch to work on.

As before, we'll make some changes to both Alice.txt and Bob.txt, and then git add Alice.txt.

Then we git commit using a message like "Add text to Alice".

Now instead of changing that commit, we'll git add Bob.txt and git commit that change as well. As message I used "Add Bob.txt".

And to make things more interesting, we'll make another change to Alice.txt which we'll git add and git commit. As a message I used "Add more text to Alice".

If we now have a look at the branch's history with git log (or for just a quick look preferably with git log --oneline), we'll see our three commits on top of whatever was on your master.

For me it looks like this:

git log --oneline

0b22064 (HEAD -> interactiveRebase) Add more text to Alice

062ef13 Add Bob.txt

9e06fca Add text to Alice

df3ad1d (origin/master, origin/HEAD, master) Add Alice

800a947 Add Tutorial Text

There are two things we'd like to fix about this, which for the sake of learning different things, will be a bit different than in the previous section on amend:

  • Put both changes to Alice.txt in a single commit
  • Consistently name things, and remove the .txt from the message about Bob.txt

To change the three new commits, we'll want to rebase onto the commit just before them. That commit for me is df3ad1d, but we can also reference it as the third commit from the current HEAD as HEAD~3.

To start an interactive rebase we use git rebase -i {COMMIT}, so let's run git rebase -i HEAD~3.

What you'll see is your editor of choice showing something like this:

pick 9e06fca Add text to Alice

pick 062ef13 Add Bob.txt

pick 0b22064 Add more text to Alice

# Rebase df3ad1d..0b22064 onto df3ad1d (3 commands)


# Commands:

# p, pick = use commit

# r, reword = use commit, but edit the commit message

# e, edit = use commit, but stop for amending

# s, squash = use commit, but meld into previous commit

# f, fixup = like "squash", but discard this commit's log message

# x, exec = run command (the rest of the line) using shell

# d, drop = remove commit


# These lines can be re-ordered; they are executed from top to bottom.


# If you remove a line here THAT COMMIT WILL BE LOST.


# However, if you remove everything, the rebase will be aborted.


# Note that empty commits are commented out

Note as always how git explains everything you can do right there when you call the command.

The Commands you'll probably be using most are reword, squash and drop. (And pick but that one's there by default)

Take a moment to think about what you see and what we're going to use to achieve our two goals from above. I'll wait.

Got a plan? Perfect!

Before we start making changes, take note of the fact that the commits are listed from oldest to newest, and thus in the opposite direction of the git log output.

I'll start off with the easy change and make it so we get to change the commit message of the middle commit.

pick 9e06fca Add text to Alice

reword 062ef13 Add Bob.txt

pick 0b22064 Add more text to Alice

# Rebase df3ad1d..0b22064 onto df3ad1d (3 commands)


Now to getting the two changes of Alice.txt into one commit.

Obviously what we want to do is to squash the later of the two into the first one, so let's put that command in place of the pick on the second commit changing Alice.txt. For me in the example that's 0b22064.

pick 9e06fca Add text to Alice

reword 062ef13 Add Bob.txt

squash 0b22064 Add more text to Alice

# Rebase df3ad1d..0b22064 onto df3ad1d (3 commands)


Are we done? Will that do what we want?

It won't, right? As the comments in the file tell us:

# s, squash = use commit, but meld into previous commit

So what we've done so far, will merge the changes of the second Alice commit, with the Bob commit. That's not what we want.

Another powerful thing we can do in an interactive rebase is changing the order of commits.

If you've read what the comments told you carefully, you already know how: Simply move the lines!

Thankfully you're in your favorite text editor, so go ahead and move the second Alice commit after the first.

pick 9e06fca Add text to Alice

squash 0b22064 Add more text to Alice

reword 062ef13 Add Bob.txt

# Rebase df3ad1d..0b22064 onto df3ad1d (3 commands)


That should do the trick, so close the editor to tell git to start executing the commands.

What happens next is just like a normal rebase: starting with the commit you've referenced when starting it, each of the commits you have listed will be applied one after the other.

Right now it won't happen, but when you re-order actual code changes, you may run into conflicts during the rebase. After all you've possibly mixed up changes that were building on each other.

Just resolve them, as you would usually.

After applying the first commit, the editor will open and allow you to put a new message for the commit combining the changes to Alice.txt. I've thrown away the text of both commits and put "Add a lot of very important text to Alice".

After you close the editor to finish that commit, it will open again to allow you to change the message of the Add Bob.txt commit. Remove the ".txt" and continue by closing the editor.

That's it! You've rewritten history again. This time a lot more substantially than when amending!

If you look at the git log again, you'll see that there are two new commits in place of the three that we had previously. But by now you're used to what rebase does to commits and have expected that.

git log --oneline

105177b (HEAD -> interactiveRebase) Add Bob

ed78fa1 Add a lot very important text to Alice

df3ad1d (origin/master, origin/HEAD, master) Add Alice

800a947 Add Tutorial Text
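The re-ordering and melding can also be scripted, which makes for a repeatable way to experiment. The sketch below is only an approximation of what we did by hand: it assumes a throwaway repository, and it replaces your text editor with a tiny hypothetical script (todo-editor.sh) handed to git through the GIT_SEQUENCE_EDITOR environment variable. To avoid any message editor popping up it uses fixup instead of squash and leaves the Bob message alone.

```shell
#!/bin/sh
# Sketch: re-order three commits and meld two of them with a scripted interactive rebase.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/repo"
cd "$work/repo"
echo "Hi, I'm Alice" > Alice.txt
echo "Hi, I'm Bob" > Bob.txt
git add .
git commit -q -m "Add Alice"

# The same three commits as in the walkthrough.
echo "text" >> Alice.txt
git commit -q -am "Add text to Alice"
echo "text" >> Bob.txt
git commit -q -am "Add Bob.txt"
echo "more text" >> Alice.txt
git commit -q -am "Add more text to Alice"

# Hypothetical stand-in for the editor: git passes the todo file as $1.
cat > "$work/todo-editor.sh" <<'EOF'
todo=$1
grep '^pick ' "$todo" > "$todo.picks"
{
  sed -n '1p' "$todo.picks"                 # first Alice commit stays where it is
  sed -n '3s/^pick/fixup/p' "$todo.picks"   # second Alice commit is melded into it
  sed -n '2p' "$todo.picks"                 # the Bob commit now comes last
} > "$todo"
rm -f "$todo.picks"
EOF

GIT_SEQUENCE_EDITOR="sh $work/todo-editor.sh" git rebase -q -i HEAD~3

count=$(git rev-list --count HEAD)
git log --oneline
```

Afterwards the log shows two commits on top of the starting one, with both Alice.txt changes living in a single commit.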

Public History, why you shouldn't rewrite it, and how to still do it safely

As noted before, changing history is an incredibly useful part of any work-flow that involves making a lot of small commits while you work.

While all the small atomic changes make it very easy for you to e.g. verify that with each change your test-suite still passes and if it doesn't, remove or amend just these specific changes, the 100 commits you've made to write HelloWorld.java are probably not something you want to share with people.

Most likely what you want to share with them, are a few well formed changes with nice commit messages telling your colleagues what you did for which reason.

As long as all those small commits only exist in your Dev Environment, you're perfectly safe to do a git rebase -i and change history to your heart's content.

Things get problematic when it comes to changing Public History. That means anything that has already made it to the Remote Repository.

At this point it has become public and other people's branches might be based on that history. That really makes it something you generally don't want to mess with.

The usual advice is to "Never rewrite public history!" and while I repeat that here, I've got to admit, that there is a decent amount of cases in which you might still want to rewrite public history.

In all of these cases that history isn't 'really' public though. You most certainly don't want to go rewriting history on the master branch of an open source project, or something like your company's release branch.

Where you might want to rewrite history are branches that you've pushed just to share with some colleagues.

You might be doing trunk-based development, but want to share something that doesn't even compile yet, so you obviously don't want to put that on the main branch knowingly.

Or you might have a work-flow in which you share feature branches.

Especially with feature branches you hopefully rebase them onto the current master frequently. But as we know, a git rebase adds our branch's commits as new commits on top of the thing we're basing them on. This rewrites history. And in the case of a shared feature branch it rewrites public history.

So what should we do if we follow the "Never rewrite public history" mantra?

Never rebase our branch and hope it still merges into master in the end?

Not use shared feature branches?

Admittedly that second one is actually a reasonable answer, but you might still not be able to do that. So the only thing you can do, is to accept rewriting the public history and push the changed history to the Remote Repository.

If you just do a git push you'll be notified that you're not allowed to do that, as your local branch has diverged from the remote one.

You will need to force-push the changes, and overwrite the remote with your local version.

As I've highlighted that so suggestively, you're probably ready to try git push --force right now. You really shouldn't do that if you want to rewrite public history safely though!

You're much better off using --force's more careful sibling --force-with-lease !

--force-with-lease will check if your local version of the remote branch and the actual remote match, before pushing.

That way you can ensure that you don't accidentally wipe any changes someone else may have pushed while you were rewriting history!

And on that note I'll leave you with a slightly changed mantra:

Don't rewrite public history unless you're really sure about what you're doing. And if you do, be safe and force-with-lease.
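To see the safety net in action, here is a small sketch, assuming a throwaway bare repository as the remote and a second clone playing the colleague; all names and contents are made up. We rewrite a commit we've already pushed while the colleague pushes something new, and then try to push with --force-with-lease without fetching first.

```shell
#!/bin/sh
# Sketch: --force-with-lease refuses to overwrite remote commits we have not seen yet.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q --bare "$work/remote.git"

# We push a first commit, making it public history.
git clone -q "$work/remote.git" "$work/mine"
cd "$work/mine"
echo "Hi, I'm Alice" > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
git push -q -u origin HEAD
branch=$(git symbolic-ref --short HEAD)

# A colleague builds on that history and pushes.
git clone -q -b "$branch" "$work/remote.git" "$work/colleague"
cd "$work/colleague"
echo "logging" > Log.txt
git add Log.txt
git commit -q -m "Add logging"
git push -q origin HEAD
colleague_tip=$(git rev-parse HEAD)

# Meanwhile we rewrite the pushed commit, without fetching the colleague's work.
cd "$work/mine"
git commit -q --amend -m "Add Alice, reworded"

# Our remote-tracking branch no longer matches the real remote, so the lease fails.
if git push -q --force-with-lease origin "$branch" 2> /dev/null; then
  lease_result=pushed
else
  lease_result=rejected
fi
remote_tip=$(git --git-dir="$work/remote.git" rev-parse "refs/heads/$branch")
echo "force-with-lease: $lease_result"
```

The push is rejected and the colleague's commit survives on the remote. Keep in mind that the check compares against your remote-tracking branch, so running git fetch right before pushing makes the lease pass again; look at what you fetched before you overwrite it.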

Reading history

Now that you know about the differences between the areas in your Dev Environment - especially the Local Repository - and how commits and the history work, doing a rebase should not be scary to you.

Still, sometimes things go wrong. You may have done a rebase and accidentally accepted the wrong version of a file when resolving a conflict.

Now instead of the feature you've added, there's just your colleague's added line of logging in a file.

Luckily git has your back, by having a built in safety feature called the Reference Logs AKA reflog.

Whenever any reference like the tip of a branch is updated in your Local Repository a Reference Log entry is added.

So there's a record of any time you make a commit, but also of when you reset or otherwise move the HEAD etc.

Having read this tutorial so far, you see how this might come in handy when we've messed up a rebase, right?

We know that a rebase moves the HEAD of our branch to the point we're basing it on and then applies our changes. An interactive rebase works similarly, but might do things to those commits like squashing or rewording them.

If you're not still on the branch on which we practiced interactive rebase, switch to it again, as we're about to practice some more there.

Let's have a look at the reflog of the things we've done on that branch by - you've guessed it - running git reflog.

You'll probably see a lot of output, but the first few lines on the top should be similar to this:

git reflog

105177b (HEAD -> interactiveRebase) HEAD@{0}: rebase -i (finish): returning to refs/heads/interactiveRebase

105177b (HEAD -> interactiveRebase) HEAD@{1}: rebase -i (reword): Add Bob

ed78fa1 HEAD@{2}: rebase -i (squash): Add a lot very important text to Alice

9e06fca HEAD@{3}: rebase -i (start): checkout HEAD~3

0b22064 HEAD@{4}: commit: Add more text to Alice

062ef13 HEAD@{5}: commit: Add Bob.txt

9e06fca HEAD@{6}: commit: Add text to Alice

df3ad1d (origin/master, origin/HEAD, master) HEAD@{7}: checkout: moving from master to interactiveRebase

There it is. Every single thing we've done, from switching to the branch to doing the rebase.

Quite cool to see the things we've done, but useless on its own if we messed up somewhere, if it wasn't for the references at the start of each line.

If you compare the reflog output to when we looked at the log the last time, you'll see those points relate to commit references, and we can use them just like that.

Let's say we actually didn't want to do the rebase. How do we get rid of the changes it made?

We move HEAD to the point before the rebase started with a git reset 0b22064.

0b22064 is the commit before the rebase in my case. More generally you can also reference it as HEAD four changes ago via HEAD@{4}. Note that should you have switched branches in between or done any other thing that creates a log entry, you might have a higher number there.

If you take a look at the log now, you'll see the original state with three individual commits restored.

But let's say we now realize that's not what we wanted. The rebase is fine, we just don't like how we changed the message of the Bob commit.

We could just do another rebase -i in the current state, just like we did originally.

Or we use the reflog and jump back to after the rebase and amend the commit from there.

But by now you know how to do either of those, so I'll let you try that on your own. And in addition you also know that there's the reflog allowing you to undo most things you might end up doing by mistake.
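If you'd rather practice the rescue on something disposable first, here is a minimal sketch in a throwaway repository with made-up contents. It uses a hard reset as the "mistake" instead of a rebase, simply because that is the shortest way to lose a commit; the recovery through the reflog works the same way.

```shell
#!/bin/sh
# Sketch: throw a commit away by accident and get it back through the reflog.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/repo"
cd "$work/repo"
echo "Hi, I'm Alice" > Alice.txt
git add Alice.txt
git commit -q -m "Add Alice"
echo "Some important text" >> Alice.txt
git commit -q -am "Add text to Alice"
good=$(git rev-parse HEAD)

# The mistake: move the branch back one commit and discard the change.
git reset -q --hard HEAD~1
lost=$(git rev-parse HEAD)

# The reflog still remembers where HEAD was before that move.
git reflog -n 3

# HEAD@{1} is the position of HEAD one reflog entry ago, i.e. before the reset.
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
echo "restored commit: $restored"
```

The branch points at the "lost" commit again, and Alice.txt has its important text back.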

Build Docker Images and Host a Docker Image Repository with GitLab

In this tutorial, you'll learn how to build Docker images and host a Docker image repository with GitLab. We set up a new GitLab runner to build Docker images, created a private Docker registry to store them in, and updated a Node.js app to be built and tested inside of Docker containers.


Containerization is quickly becoming the most accepted method of packaging and deploying applications in cloud environments. The standardization it provides, along with its resource efficiency (when compared to full virtual machines) and flexibility, make it a great enabler of the modern DevOps mindset. Many interesting cloud native deployment, orchestration, and monitoring strategies become possible when your applications and microservices are fully containerized.

Docker containers are by far the most common container type today. Though public Docker image repositories like Docker Hub are full of containerized open source software images that you can docker pull and use today, for private code you'll need to either pay a service to build and store your images, or run your own software to do so.

GitLab Community Edition is a self-hosted software suite that provides Git repository hosting, project tracking, CI/CD services, and a Docker image registry, among other features. In this tutorial we will use GitLab's continuous integration service to build Docker images from an example Node.js app. These images will then be tested and uploaded to our own private Docker registry.


Before we begin, we need to set up a secure GitLab server, and a GitLab CI runner to execute continuous integration tasks. The sections below will provide links and more details.

A GitLab Server Secured with SSL

To store our source code, run CI/CD tasks, and host the Docker registry, we need a GitLab instance installed on an Ubuntu 16.04 server. GitLab currently recommends a server with at least 2 CPU cores and 4GB of RAM. Additionally, we'll secure the server with SSL certificates from Let's Encrypt. To do so, you'll need a domain name pointed at the server.

A GitLab CI Runner

Set Up Continuous Integration Pipelines with GitLab CI on Ubuntu 16.04 will give you an overview of GitLab's CI service, and show you how to set up a CI runner to process jobs. We will build on top of the demo app and runner infrastructure created in this tutorial.

Step 1 — Setting Up a Privileged GitLab CI Runner

In the prerequisite GitLab continuous integration tutorial, we set up a GitLab runner using sudo gitlab-runner register and its interactive configuration process. This runner is capable of running builds and tests of software inside of isolated Docker containers.

However, in order to build Docker images, our runner needs full access to a Docker service itself. The recommended way to configure this is to use Docker's official docker-in-docker image to run the jobs. This requires granting the runner a special privileged execution mode, so we'll create a second runner with this mode enabled.

Note: Granting the runner privileged mode basically disables all of the security advantages of using containers. Unfortunately, the other methods of enabling Docker-capable runners also carry similar security implications. Please look at the official GitLab documentation on Docker Build to learn more about the different runner options and which is best for your situation.

Because there are security implications to using a privileged runner, we are going to create a project-specific runner that will only accept Docker jobs on our hello_hapi project (GitLab admins can always manually add this runner to other projects at a later time). From your hello_hapi project page, click Settings at the bottom of the left-hand menu, then click CI/CD in the submenu:

Now click the Expand button next to the Runners settings section:

There will be some information about setting up a Specific Runner, including a registration token. Take note of this token. When we use it to register a new runner, the runner will be locked to this project only.

While we're on this page, click the Disable shared Runners button. We want to make sure our Docker jobs always run on our privileged runner. If a non-privileged shared runner was available, GitLab might choose to use that one, which would result in build errors.

Log in to the server that has your current CI runner on it. If you don't have a machine set up with runners already, go back and complete the Installing the GitLab CI Runner Service section of the prerequisite tutorial before proceeding.

Now, run the following command to set up the privileged project-specific runner:

    sudo gitlab-runner register -n \
      --url https://gitlab.example.com/ \
      --registration-token your-token \
      --executor docker \
      --description "docker-builder" \
      --docker-image "docker:latest" \
      --docker-privileged


Registering runner... succeeded                     runner=61SR6BwV
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!

Be sure to substitute your own information. We set all of our runner options on the command line instead of using the interactive prompts, because the prompts don't allow us to specify --docker-privileged mode.

Your runner is now set up, registered, and running. To verify, switch back to your browser. Click the wrench icon in the main GitLab menu bar, then click Runners in the left-hand menu. Your runners will be listed:

Now that we have a runner capable of building Docker images, let's set up a private Docker registry for it to push images to.

Step 2 — Setting Up GitLab's Docker Registry

Setting up your own Docker registry lets you push and pull images from your own private server, increasing security and reducing the dependencies your workflow has on outside services.

GitLab will set up a private Docker registry with just a few configuration updates. First we'll set up the URL where the registry will reside. Then we will (optionally) configure the registry to use an S3-compatible object storage service to store its data.

SSH into your GitLab server, then open up the GitLab configuration file:

sudo nano /etc/gitlab/gitlab.rb

Scroll down to the Container Registry settings section. We're going to uncomment the registry_external_url line and set it to our GitLab hostname with a port number of 5555:


registry_external_url 'https://gitlab.example.com:5555'

Next, add the following two lines to tell the registry where to find our Let's Encrypt certificates:


registry_nginx['ssl_certificate'] = "/etc/letsencrypt/live/gitlab.example.com/fullchain.pem"
registry_nginx['ssl_certificate_key'] = "/etc/letsencrypt/live/gitlab.example.com/privkey.pem"

Save and close the file, then reconfigure GitLab:

sudo gitlab-ctl reconfigure


gitlab Reconfigured!

Update the firewall to allow traffic to the registry port:

sudo ufw allow 5555

Now switch to another machine with Docker installed, and log in to the private Docker registry. If you don’t have Docker on your local development computer, you can use whichever server is set up to run your GitLab CI jobs, as it has Docker installed already:

docker login gitlab.example.com:5555

You will be prompted for your username and password. Use your GitLab credentials to log in.

Login Succeeded 

Success! The registry is set up and working. Currently it will store files on the GitLab server's local filesystem. If you'd like to use an object storage service instead, continue with this section. If not, skip down to Step 3.

To set up an object storage backend for the registry, we need to know the following information about our object storage service:

  • Access Key
  • Secret Key
  • Region (us-east-1 for example, if using Amazon S3), or Region Endpoint if using an S3-compatible service (https://nyc.digitaloceanspaces.com)
  • Bucket Name

If you're using DigitalOcean Spaces, you can find out how to set up a new Space and get the above information by reading How To Create a DigitalOcean Space and API Key.

When you have your object storage information, open the GitLab configuration file:

sudo nano /etc/gitlab/gitlab.rb

Once again, scroll down to the container registry section. Look for the registry['storage'] block, uncomment it, and update it to the following, again making sure to substitute your own information where appropriate:


registry['storage'] = {
  's3' => {
    'accesskey' => 'your-key',
    'secretkey' => 'your-secret',
    'bucket' => 'your-bucket-name',
    'region' => 'nyc3',
    'regionendpoint' => 'https://nyc3.digitaloceanspaces.com'
  }
}

If you're using Amazon S3, you only need region and not regionendpoint. If you're using an S3-compatible service like Spaces, you'll need regionendpoint. In this case region doesn't actually configure anything and the value you enter doesn't matter, but it still needs to be present and not blank.

Save and close the file.

Note: There is currently a bug where the registry will shut down after thirty seconds if your object storage bucket is empty. To avoid this, put a file in your bucket before running the next step. You can remove it later, after the registry has added its own objects.

If you are using DigitalOcean Spaces, you can drag and drop to upload a file using the Control Panel interface.

Reconfigure GitLab one more time:

sudo gitlab-ctl reconfigure

On your other Docker machine, log in to the registry again to make sure all is well:

docker login gitlab.example.com:5555

You should get a Login Succeeded message.

Now that we've got our Docker registry set up, let's update our application's CI configuration to build and test our app, and push Docker images to our private registry.

Step 3 — Updating .gitlab-ci.yml and Building a Docker Image

Note: If you didn't complete the prerequisite article on GitLab CI you'll need to copy over the example repository to your GitLab server. Follow the Copying the Example Repository From GitHub section to do so.

To get our app building in Docker, we need to update the .gitlab-ci.yml file. You can edit this file right in GitLab by clicking on it from the main project page, then clicking the Edit button. Alternately, you could clone the repo to your local machine, edit the file, then git push it back to GitLab. That would look like this:

    git clone git@gitlab.example.com:sammy/hello_hapi.git
    cd hello_hapi
    # edit the file w/ your favorite editor
    git commit -am "updating ci configuration"
    git push

First, delete everything in the file, then paste in the following configuration:


image: docker:latest
services:
- docker:dind

stages:
- build
- test
- release

variables:
  TEST_IMAGE: gitlab.example.com:5555/sammy/hello_hapi:$CI_COMMIT_REF_NAME
  RELEASE_IMAGE: gitlab.example.com:5555/sammy/hello_hapi:latest

before_script:
  - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN gitlab.example.com:5555

build:
  stage: build
  script:
    - docker build --pull -t $TEST_IMAGE .
    - docker push $TEST_IMAGE

test:
  stage: test
  script:
    - docker pull $TEST_IMAGE
    - docker run $TEST_IMAGE npm test

release:
  stage: release
  script:
    - docker pull $TEST_IMAGE
    - docker tag $TEST_IMAGE $RELEASE_IMAGE
    - docker push $RELEASE_IMAGE
  only:
    - master

Be sure to update the highlighted URLs and usernames with your own information, then save with the Commit changes button in GitLab. If you're updating the file outside of GitLab, commit the changes and git push back to GitLab.

This new config file tells GitLab to use the latest docker image (image: docker:latest) and link it to the docker-in-docker service (docker:dind). It then defines build, test, and release stages. The build stage builds the Docker image using the Dockerfile provided in the repo, then uploads it to our Docker image registry. If that succeeds, the test stage will download the image we just built and run the npm test command inside it. If the test stage is successful, the release stage will pull the image, tag it as hello_hapi:latest and push it back to the registry.

Depending on your workflow, you could also add additional test stages, or even deploy stages that push the app to a staging or production environment.

Updating the configuration file should have triggered a new build. Return to the hello_hapi project in GitLab and click on the CI status indicator for the commit:

On the resulting page you can then click on any of the stages to see their progress:

Eventually, all stages should indicate they were successful by showing green check mark icons. We can find the Docker images that were just built by clicking the Registry item in the left-hand menu:

If you click the little "document" icon next to the image name, it will copy the appropriate docker pull ... command to your clipboard. You can then pull and run your image:

    docker pull gitlab.example.com:5555/sammy/hello_hapi:latest
    docker run -it --rm -p 3000:3000 gitlab.example.com:5555/sammy/hello_hapi:latest


> [email protected] start /usr/src/app
> node app.js

Server running at: http://56fd5df5ddd3:3000

The image has been pulled down from the registry and started in a container. Switch to your browser and connect to the app on port 3000 to test. In this case we're running the container on our local machine, so we can access it via localhost at the following URL:

    http://localhost:3000

Hello, test!

Success! You can stop the container with CTRL-C. From now on, every time we push new code to the master branch of our repository, we'll automatically build and test a new hello_hapi:latest image.


In this tutorial we set up a new GitLab runner to build Docker images, created a private Docker registry to store them in, and updated a Node.js app to be built and tested inside of Docker containers.

20+ Outstanding Vue.js Open Source Projects

There are more than 20 Vue.js open-source projects in this article. The goal was to make this list as varied as possible.

In this short intro, I won’t go into the history of Vue.js or statistics on the use of this framework. It is now a matter of fact that Vue is gaining popularity, and the projects listed below are the best evidence of its prevalence.

So here we go!


Prettier

Opinionated code formatter

Website: https://prettier.io

Demo: https://prettier.io/playground/

GitHub: https://github.com/prettier/prettier

GitHub Stars: 32 343

Prettier reprints your code in a consistent style according to a small set of rules. With a code formatter, you no longer have to format code by hand or argue about the right coding style. Prettier integrates with most editors (Atom, Emacs, Visual Studio, WebStorm, etc.) and works with your favorite languages, such as JavaScript, CSS, HTML, and GraphQL. Last year Prettier also started to run in the browser and to support .vue files.

Image source: https://prettier.io


Vuetify

Material Component Framework

Website: https://vuetifyjs.com/en/

GitHub: https://github.com/vuetifyjs/vuetify

GitHub Stars: 20 614

This framework allows you to customize visual components. It complies with the Google Material Design guidelines. Vuetify combines all the advantages of Vue.js and Material Design. What is more, Vuetify is constantly evolving because both communities improve it on GitHub. This framework is compatible with RTL and Vue CLI 3. You can build an interactive and attractive frontend using Vuetify.

Image source: https://vuetifyjs.com/en/


iView

A set of UI components

Website: https://iviewui.com/

GitHub: https://github.com/iview/iview

GitHub Stars: 21 643

Developers of all skill levels can use iView, but you have to be familiar with Single File Components (https://vuejs.org/v2/guide/single-file-components.html). A friendly API and constant fixes and upgrades make it easy to use. You can use separate components (navigation, charts, etc.) or start from a Starter Kit. iView’s solid documentation is a big plus and, of course, it is compatible with the latest Vue.js. Please note that it doesn’t support IE8.

Image source: https://iviewui.com/


Epiboard

A tab page

GitHub: https://github.com/Alexays/Epiboard

GitHub Stars: 124

A tab page that gives easy access to RSS feeds, weather, downloads, etc. Epiboard focuses on customizability to provide users with a personalized experience. You can synchronize your settings across your devices, change the look and feel, and add your favorite bookmarks. This project follows the Material Design guidelines. You can find the full list of current cards on the GitHub page.

Image source: https://github.com/Alexays/Epiboard

Light Blue Vue Admin

Vue JS Admin Dashboard Template

Website: https://flatlogic.com/admin-dashboards/light-blue-vue-lite

Demo: https://flatlogic.com/admin-dashboards/light-blue-vue-lite/demo

GitHub: https://github.com/flatlogic/light-blue-vue-admin

GitHub Stars: 28

Light Blue is built with the latest Vue.js and Bootstrap, and has detailed documentation and a transparent, modern design. This template is easy to navigate and has user-friendly functions and a variety of UI elements. All the components of this template fit together impeccably and provide a great user experience. Easy customization is another big plus that dramatically cuts development time.

Image source: https://flatlogic.com/admin-dashboards/light-blue-vue-lite


Beep

Account Security Scanner

Website: https://beep.modus.app

GitHub: https://github.com/ModusCreateOrg/beep

GitHub Stars: 110

This security scanner was built with Vue.js and Ionic. It runs security checks and keeps passwords safe. So how does this check work? Beep simply compares your data with the information in databases of leaked credentials. Your passwords are safe with Beep thanks to the use of the SHA-1 algorithm. Plus, this app never stores your login and password as is.

Image source: https://beep.modus.app

Sing App Vue Dashboard

Vue.JS admin dashboard template

Website: https://flatlogic.com/admin-dashboards/sing-app-vue-dashboard

Demo: https://flatlogic.com/admin-dashboards/sing-app-vue-dashboard/demo

GitHub: https://github.com/flatlogic/sing-app-vue-dashboard

GitHub Stars: 176

What do you need from an admin template? You definitely need a classic look, awesome typography, and the usual set of components. Sing App meets all these criteria, plus it has a very soft color scheme. The free version of this template has all the necessary features to start your project with minimal work. This elegantly designed dashboard can be useful for most web apps, such as a CMS, a CRM, or a simple website admin panel.

Image source: https://flatlogic.com/admin-dashboards/sing-app-vue-dashboard

Vue Storefront

PWA for the eCommerce

Website: https://www.vuestorefront.io

GitHub: https://github.com/DivanteLtd/vue-storefront

GitHub Stars: 5 198

This PWA storefront can connect to almost any eCommerce backend because it uses a headless architecture. This includes the popular BigCommerce platform, Magento, Shopware, etc. Vue Storefront isn’t easy to learn all at once because it is a complex solution. But it gives you lots of possibilities, and it is always improving thanks to a growing community of professionals. Some of the advantages of Vue Storefront include a mobile-first approach, Server-Side Rendering (good for SEO), and an offline mode.

Image source: https://www.vuestorefront.io

Cross-platform GUI client for DynamoDb

GitHub: https://github.com/Arattian/DynamoDb-GUI-Client

GitHub Stars: 178

DynamoDB is a NoSQL database applicable in cases where you have to deal with large amounts of data or serverless apps with AWS Lambda. This GUI client gives remote access plus supports several databases at the same time.

Image source: https://github.com/Arattian/DynamoDb-GUI-Client


vueOrgChart

Interactive organization chart

Demo: https://hoogkamer.github.io/vue-org-chart/#/

GitHub: https://github.com/Hoogkamer/vue-org-chart

GitHub Stars: 44

With this solution, no web server, installation, or database is needed. This simple chart can be edited in Excel or on a web page. You can easily search for a particular manager or department. There are two ways to use it. The first is as a static website, which suits you if you want to use vueOrgChart without modification. The second is to build your own chart on top of this project, in which case you will have to study the “Build Setup” section.

Image source: https://hoogkamer.github.io/vue-org-chart/#/


Faviator

Favicon generator

Website: https://www.faviator.xyz

Demo: https://www.faviator.xyz/playground

GitHub: https://github.com/faviator/faviator

GitHub Stars: 63

This library helps you create a simple icon. The first step is to pass in a configuration, and the second is to choose the format of your icon: JPG, PNG, or SVG. As you can see in the screenshot, you can choose any font from Google Fonts.

Image source: https://www.faviator.xyz

Minimal Notes

Web app for PC or Tablet

Demo: https://vladocar.github.io/Minimal-Notes/

GitHub: https://github.com/vladocar/Minimal-Notes

GitHub Stars: 48

There is not much to say about this app. It is minimalistic, runs locally in the browser, stores its data in localStorage, and the file is only 4 KB. It is also available for Mac OS, where the file grows from 4 KB to 0.45 MB, which is still very lightweight.

Image source: https://vladocar.github.io/Minimal-Notes/


Directus

CMS built with Vue.js

Website: https://directus.io

Demo: https://directus.app/?ref=madewithvuejs.com#/login

GitHub: https://github.com/directus/directus

GitHub Stars: 4 607

Directus is a very lightweight and simple CMS. It has been modularized to give developers the opportunity to customize it in every aspect. The main peculiarity of this CMS is that it stores your data in SQL databases, so it stays synchronized with every change you make and can be easily customized. It also supports multilingual content.

Image source: https://directus.io


VuePress

Static Site Generator

Website: https://vuepress.vuejs.org

GitHub: https://github.com/vuejs/vuepress

GitHub Stars: 12 964

The creator of Vue.js, Evan You, created this simple site generator. Minimalistic and SEO-friendly, it has multi-language support and easy Google Analytics integration. A VuePress site uses Vue, Vue Router, and webpack. If you have worked with Nuxt or Gatsby, you will notice some similarities. The main difference is that Nuxt was created to develop applications, while VuePress is for building static websites.

Image source: https://vuepress.vuejs.org


Docsify

Documentation site generator

Website: https://docsify.js.org/#/

GitHub: https://github.com/docsifyjs/docsify

GitHub Stars: 10 105

This project has an impressive showcase list. The main peculiarity of this generator lies in the way pages are generated: it simply grabs your Markdown file and displays it as a page of your site. Another big plus of this project is full-text search and API plugins. It supports multiple themes and is really lightweight.

Image source: https://docsify.js.org/#/


Vue CLI

Standard Tooling for Vue.js Development

Website: https://cli.vuejs.org

GitHub: https://github.com/vuejs/vue-cli

GitHub Stars: 21 263

This well-known tooling was released by the Vue team. Please note that before you start using it, you should install the latest versions of Vue.js, Node.js, and npm, plus a code editor. Vue CLI has a GUI tool and instant prototyping. Instant prototyping is a relatively new feature that lets you create a standalone component, and this component has all the “vue powers” of a full Vue.js project.

Image source: https://cli.vuejs.org


SheetJS

Spreadsheet Parser and Writer

Website: https://sheetjs.com/

Demo: https://sheetjs.com/demos

GitHub: https://github.com/SheetJS/js-xlsx

GitHub Stars: 16 264

SheetJS is a JS library that helps you work with data stored in Excel files. For example, you can export a workbook in the browser or convert any HTML table. In other words, SheetJS doesn’t require a server-side script or, for example, AJAX. This is the best solution for front-end handling of two-dimensional tables. It can export and parse data and runs in a Node.js terminal or in the browser.

Image source: https://sheetjs.com/


Vue Devtools

Browser devtools extension

GitHub: https://github.com/vuejs/vue-devtools

GitHub Stars: 13 954

Almost every framework provides developers with suitable devtools. This one is literally an additional panel in the browser that differs considerably from the standard ones. You don’t have to install it as a browser extension; there is also an option to install it as a standalone application. You can activate it by right-clicking an element and choosing “Inspect Vue component”, then navigate the tree of components. The left menu of this tool shows you the data and the props of the component.

Image source: https://github.com/vuejs/vue-devtools


Handsontable

Data Grid Component

Website: https://handsontable.com

GitHub: https://github.com/handsontable/handsontable

GitHub Stars: 12 049

This component looks like a spreadsheet, can be easily modified with plugins, and binds to almost any data source. It supports all the standard operations: create, read, update, and delete. Plus, you can sort and filter your records. What is more, you can include data summaries and assign a type to a cell. This project has exemplary documentation and was designed to be as customizable as you need it to be.

Image source: https://handsontable.com

Vue webpack boilerplate

Website: http://vuejs-templates.github.io/webpack/

GitHub: https://github.com/vuejs-templates/webpack

GitHub Stars: 9 052

Vue.js provides great templates to help you start the development process with your favorite stack. This boilerplate is a solid foundation for your project. It includes the best project structure and configuration, optimal tools and best development practices. Make sure this template has more or less the same features that you need for your project. Otherwise, it is better to use Vue CLI due to its flexibility.

Image source: http://vuejs-templates.github.io/webpack/

Vue Material

Material design for Vue.js

Website: http://vuematerial.io

GitHub: https://github.com/vuematerial/vue-material

GitHub Stars: 7 984

What is great about this Material Design Framework is truly thorough documentation. The framework is very lightweight with a full array of components and fully in line with the Google Material Design guidelines. This design fits every screen and supports every modern browser.

Image source: http://vuematerial.io


CSSFX

Click-to-copy CSS effects

Website: https://cssfx.dev

GitHub: https://github.com/jolaleye/cssfx

GitHub Stars: 4 569

This project is very simple and does exactly what is said in the description line. It’s a collection of CSS effects. You can see a preview of each effect and click on it. You will see a pop up with a code snippet that you can copy.

Image source: https://cssfx.dev


uiGradients

Website: http://uigradients.com/

GitHub Stars: 4 323

This is a collection of linear gradients that lets you copy the CSS code. The collection is contributed by the community and lets you filter gradients by preferred color.

Image source: http://uigradients.com/


Vuestic

Demo: https://vuestic.epicmax.co/#/admin/dashboard

GitHub: https://github.com/epicmaxco/vuestic-admin

GitHub Stars: 5 568

Vuestic is a responsive admin template that is already proving popular on GitHub. Made with Bootstrap 4, this template doesn’t require jQuery. With 36 ready-to-use UI elements and 18 pages, Vuestic offers multiple options for customization. The code is constantly evolving, not only due to the efforts of the author but also because of the support of the Vue community on GitHub.

Image source: https://vuestic.epicmax.co/#/admin/dashboard

Kubernetes Tutorial: How to deploy Gitea using the Google Kubernetes Engine

This tutorial goes over how to deploy Gitea, an open-source git hosting service, using the Google Kubernetes Engine.

Originally published by Daniel Sanche at https://medium.com

If you’ve read "An Introduction to Kubernetes", you should have a good foundational understanding of the basic pieces that make up Kubernetes. If you’re anything like me, however, you won’t fully understand a concept until you get hands on with it. 

There’s nothing too special about Gitea specifically, but going through the process of deploying an arbitrary open source application to the cloud will give us some practical hands-on experience with using Kubernetes. Plus, at the end you will be left with a great self-hosted service you can use to host your future projects!

Setting Up a Cluster

kubectl and gcloud

The most important tool you use when setting up a Kubernetes environment is the kubectl command. This command allows you to interact with the Kubernetes API. It is used to create, update, and delete Kubernetes resources like pods, deployments, and load balancers.

There is a catch, however: kubectl can’t be used to directly provision the nodes or clusters your pods are run on. This is because Kubernetes was designed to be platform agnostic. Kubernetes doesn’t know or care where it is running, so there is no built in way for it to communicate with your chosen cloud provider to rent nodes on your behalf. Because we are using Google Kubernetes Engine for this tutorial, we will need to use the gcloud command for these tasks.

In brief, gcloud is used to provision the resources listed under “Hardware”, and kubectl is used to manage the resources listed under “Software”

This tutorial assumes you already have kubectl and gcloud installed on your system. If you’re starting completely fresh, you will first want to check out the first part of the Google Kubernetes Engine Quickstart to sign up for a GCP account, set up a project, enable billing, and install the command line tools.

Once you have your environment ready to go, you can create a cluster by running the following commands:

# create the cluster
# by default, 3 standard nodes are created for our cluster
gcloud container clusters create my-cluster --zone us-west1-a

# get the credentials so we can manage it locally through kubectl
# creating a cluster can take a few minutes to complete
gcloud container clusters get-credentials my-cluster \
     --zone us-west1-a

We now have a provisioned cluster made up of three n1-standard-1 nodes.

Along with the gcloud command, you can manage your resources through the Google Cloud Console page. After running the previous commands, you should see your cluster appear under the GKE section. You should also see a list of the VMs provisioned as your nodes under the GCE section. Note that although the GCE UI allows you to delete the VMs from this page, they are being managed by your cluster, which will re-create them when it notices they are missing. When you are finished with this tutorial and want to permanently remove the VMs, you can remove everything at once by deleting the cluster itself.

Deploying An App

YAML: Declarative Infrastructure

Now that our cluster is live, it’s time to put it to work. There are two ways to add resources to Kubernetes: imperatively through the command line, using kubectl commands such as kubectl run and kubectl create, and declaratively, by defining resources in YAML files.

While imperative deployment from the command line is great for experimenting, YAML is the way to go when you want to build something maintainable. By writing all of your Kubernetes resources into YAML files, you can record the entire state of your cluster in a set of easily maintainable files, which can be version-controlled and managed like any other part of your system. In this way, all the instructions needed to host your service can be saved right alongside the code itself.

Adding a Pod

To show a basic example of what a Kubernetes YAML file looks like, let’s add a pod to our cluster. Create a new file called gitea.yaml and fill it with the following text:

apiVersion: v1
kind: Pod
metadata:
  name: gitea-pod
spec:
  containers:
  - name: gitea-container
    image: gitea/gitea:1.4

This pod is fairly basic. Line 2 declares that the type of resource we are creating is a pod; line 1 says that this resource is defined in v1 of the Kubernetes API. Lines 3–8 describe the properties of our pod. In this case, the pod is unoriginally named “gitea-pod”, and it contains a single container we’re calling “gitea-container”.

Line 8 is the most interesting part. This line defines which container image we want to run; in this case, the image tagged 1.4 in the gitea/gitea repository. Kubernetes will tell the built-in container runtime to find the requested container image, and pull it down into the pod. Because the default container runtime is Docker, it will find the gitea repository hosted on Dockerhub, and pull down the requested image.
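
As a rough illustration of how an image reference such as gitea/gitea:1.4 is read, the following Python sketch splits a reference into registry, repository, and tag. It is a simplified model under stated assumptions: `parse_image_ref` is a hypothetical helper (not part of Docker or Kubernetes), digests and the "library/" prefix for official images are ignored, and the defaults assumed are Docker Hub (docker.io) for the registry and latest for the tag.

```python
# Simplified sketch of how an image reference is split into its parts.
# parse_image_ref is a hypothetical helper, not a Docker or Kubernetes API.

DEFAULT_REGISTRY = "docker.io"  # assumption: Docker Hub is the default registry
DEFAULT_TAG = "latest"          # assumption: a missing tag means "latest"


def parse_image_ref(ref):
    """Return (registry, repository, tag) for a reference like 'repo/name:tag'."""
    first, sep, rest = ref.partition("/")
    # The first path component counts as a registry host only if it looks like
    # one: it contains a dot or a port, or it is 'localhost'.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DEFAULT_REGISTRY, ref
    # The tag is whatever follows the last colon of the final path component.
    repo, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        repo, tag = remainder, DEFAULT_TAG
    return registry, repo, tag


if __name__ == "__main__":
    print(parse_image_ref("gitea/gitea:1.4"))
    print(parse_image_ref("gitea/gitea"))
```

Because no registry host is given in gitea/gitea:1.4, the sketch falls back to the default registry, which matches the behaviour described above.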

Now that we have the YAML written out, we apply it to our cluster:

kubectl apply -f gitea.yaml

This command will cause Kubernetes to read our YAML file, and update any resources in our cluster accordingly. To see the newly created pod in action, you can run kubectl get pods. You should see the pod running.

$ kubectl get pods
NAME        READY     STATUS    RESTARTS   AGE
gitea-pod   1/1       Running   0          9m

Gitea is now running in a pod on the cluster.

If you want even more information, you can view the standard output of the container with the following command:

$ kubectl logs -f gitea-pod
Generating /data/ssh/ssh_host_ed25519_key...
Feb 13 21:22:00 syslogd started: BusyBox v1.27.2
Generating /data/ssh/ssh_host_rsa_key...
Generating /data/ssh/ssh_host_dsa_key...
Generating /data/ssh/ssh_host_ecdsa_key...
/etc/ssh/sshd_config line 32: Deprecated option UsePrivilegeSeparation
Feb 13 21:22:01 sshd[12]: Server listening on :: port 22.
Feb 13 21:22:01 sshd[12]: Server listening on port 22.
2018/02/13 21:22:01 [T] AppPath: /app/gitea/gitea
2018/02/13 21:22:01 [T] AppWorkPath: /app/gitea
2018/02/13 21:22:01 [T] Custom path: /data/gitea
2018/02/13 21:22:01 [T] Log path: /data/gitea/log
2018/02/13 21:22:01 [I] Gitea v1.4.0+rc1-1-gf61ef28 built with: bindata, sqlite
2018/02/13 21:22:01 [I] Log Mode: Console(Info)
2018/02/13 21:22:01 [I] XORM Log Mode: Console(Info)
2018/02/13 21:22:01 [I] Cache Service Enabled
2018/02/13 21:22:01 [I] Session Service Enabled
2018/02/13 21:22:01 [I] SQLite3 Supported
2018/02/13 21:22:01 [I] Run Mode: Development
2018/02/13 21:22:01 Serving [::]:3000 with pid 14
2018/02/13 21:22:01 [I] Listen:

As you can see, there is now a server running inside the container on our cluster! Unfortunately, we won’t be able to access it until we start opening up ingress channels (coming in a future post).


As explained in Kubernetes Tutorial, pods aren’t typically run directly in Kubernetes. Instead, we should define a deployment to manage our pods.

First, let’s delete the pod we already have running:

kubectl delete -f gitea.yaml

This command removes all resources defined in the YAML file from the cluster. We can now modify our YAML file to look like this:

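apiVersion: extensions/v1beta1  # use apps/v1 on Kubernetes 1.16 and later
kind: Deployment
metadata:
  name: gitea-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitea
  template:
    metadata:
      labels:
        app: gitea
    spec:
      containers:
      - name: gitea-container
        image: gitea/gitea:1.4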

This one looks a bit more complicated than the pod we made earlier. That’s because we are really defining two different objects here: the deployment itself (lines 1–9), and the template of the pod it is managing (lines 10–17).

Line 6 is the most important part of our deployment. It defines the number of copies of the pods we want running. In this example, we are only requesting one copy, because Gitea wasn’t designed with multiple pods in mind.²

There is one other new concept introduced here: labels and selectors. Labels are simply user-defined key-value pairs attached to Kubernetes resources. Selectors are used to retrieve the resources that match a given label query. In this example, line 13 assigns the label “app=gitea” to all pods created by this deployment. Now, if the deployment ever needs to retrieve the list of all pods that it created (to make sure they are all healthy, for example), it will use the selector defined on lines 8–9. In this way, the deployment can always keep track of which pods it manages by searching for the ones that have been assigned the “app=gitea” label.

For the most part, labels are user-defined. In the example above, “app” doesn’t mean anything special to Kubernetes, it is just a way that we may find useful to organize our system. Having said that, there are certain labels that are automatically applied by Kubernetes, containing information about the system.
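
The matching rule behind labels and selectors can be sketched in a few lines of Python. This is a toy model only: the pod records and the `matches` helper are hypothetical, and it covers only simple equality-based selectors like the “app=gitea” one used here.

```python
# Toy model of label selection; the pod records and helper are hypothetical.

def matches(labels, selector):
    """A resource matches when every key/value pair in the selector
    also appears in its labels. Extra labels on the resource are fine."""
    return all(labels.get(key) == value for key, value in selector.items())


pods = [
    {"name": "gitea-deployment-8944989b8-5kmn2", "labels": {"app": "gitea"}},
    {"name": "other-pod", "labels": {"app": "blog"}},
    {"name": "unlabeled-pod", "labels": {}},
]

selector = {"app": "gitea"}  # mirrors the selector used by the deployment
managed = [pod["name"] for pod in pods if matches(pod["labels"], selector)]
print(managed)
```

Only the pod carrying the “app=gitea” label is selected; pods with other labels or no labels are left alone, which is how a deployment can share a cluster with unrelated workloads.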

Now that we have created our new YAML file, we can re-apply it to our cluster:

kubectl apply -f gitea.yaml

Now, our pod is managed by a deployment

Now, if we run kubectl get pods, we can see our new pod running, as specified in our deployment:

$ kubectl get pods
NAME                              READY    STATUS    RESTARTS
gitea-deployment-8944989b8-5kmn2  0/1      Running   0

We can see information about the deployment itself:

$ kubectl get deployments
NAME              DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
gitea-deployment  1        1        1           1          4m

To make sure everything’s working, try deleting the pod with kubectl delete pod <pod_name>. You should quickly see a new one pop up in its place. That’s the magic of deployments!

You may have noticed that the new pod has a weird, partially randomly generated name. That’s because pods are now created in bulk by the deployment, and are meant to be ephemeral. When wrapped in a deployment, pods should be thought of as cattle rather than pets.
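
The self-healing behaviour can be pictured as a loop that compares the desired number of replicas with what actually exists. The Python sketch below is a loose model under stated assumptions, not Kubernetes code: `reconcile` and `random_suffix` are hypothetical helpers, and real pod names are generated differently (they also include a hash of the pod template).

```python
# Loose model of a deployment keeping the pod count at the desired number.
# reconcile and random_suffix are hypothetical helpers, not Kubernetes APIs.
import random
import string


def random_suffix(length=5):
    """Generate a short random suffix, similar in spirit to pod name suffixes."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choices(alphabet, k=length))


def reconcile(pods, desired_replicas, prefix="gitea-deployment"):
    """Return a new pod list with exactly `desired_replicas` entries."""
    pods = list(pods)
    # Too few pods (for example, one was deleted): create replacements.
    while len(pods) < desired_replicas:
        pods.append(f"{prefix}-{random_suffix()}")
    # Too many pods (for example, replicas was lowered): remove the extras.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


if __name__ == "__main__":
    pods = reconcile([], desired_replicas=1)
    print(pods)
    pods = []  # simulate `kubectl delete pod <pod_name>`
    print(reconcile(pods, desired_replicas=1))
```

Each replacement gets a fresh generated name, which is why pods managed this way should be treated as interchangeable rather than addressed by name.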

Git and Github: A Beginner’s Guide for Absolute Beginners

If you are a developer and you want to get started with Git and GitHub, then this article is made for you.

What is Git?

Git is free, open-source version control software. It was created by Linus Torvalds in 2005. It was initially developed to coordinate the work of the many developers contributing to the Linux kernel.

This basically means that Git is a content tracker. So Git can be used to store content — and it is mostly used to store code because of the other features it provides.

Real life projects generally have multiple developers working in parallel. So they need a version control system like Git to make sure that there are no code conflicts between them.

Also, the requirements in such projects change often. So a version control system allows developers to revert and go back to an older version of their code.

The branch system in Git allows developers to work individually on a task (For example: One branch -> One task OR One branch -> One developer). Basically think of Git as a small software application that controls your code base, if you’re a developer.

Git Repositories

If we want to start using Git, we need to know where to host our repositories.

A repository (or “Repo” for short) is a project that contains multiple files. In our case a repository will contain code-based files.

There are two ways you can host your repositories. One is online (on the cloud) and the second is offline (self-installed on your server).

There are three popular Git hosting services: GitHub (owned by Microsoft), GitLab (owned by GitLab) and BitBucket. We’ll use GitHub as our hosting service.

Before using Git we should know why we need it

Git makes it easy to contribute to open source projects

Nearly every open-source project uses GitHub to manage their projects. Using GitHub is free if your project is open source, and it includes a wiki and issue tracker that makes it easy to include more in-depth documentation and get feedback about your project.

If you want to contribute, you just fork (get a copy of) a project, make your changes, and then send the project a pull request using GitHub's web interface. This pull request is your way of telling the project you're ready for them to review your changes.


By using GitHub, you make it easier to get excellent documentation. Their help section and guides have articles for nearly any topic related to Git that you can think of.

Integration options

GitHub can integrate with common platforms such as Amazon and Google Cloud, with services such as Code Climate to track your feedback, and can highlight syntax in over 200 different programming languages.

Track changes in your code across versions

When multiple people collaborate on a project, it’s hard to keep track of revisions — who changed what, when, and where those files are stored.

GitHub takes care of this problem by keeping track of all the changes that have been pushed to the repository.

Much like using Microsoft Word or Google Drive, you can have a version history of your code so that previous versions are not lost with every iteration. It’s easy to come back to the previous version and contribute your work.

Showcase your work

Are you a developer who wishes to attract recruiters? GitHub is the best tool you can rely on for this.

Today, when searching for new recruits for their projects, most companies look at GitHub profiles. If your profile is available, you will have a higher chance of being recruited even if you are not from a great university or college.

Now we’ll learn how to use Git & GitHub

GitHub account creation

To create your account, you need to go to GitHub's website and fill out the registration form.

Git installation

Now we need to install Git's tools on our computer. We’ll use the CLI to communicate with GitHub.

For Ubuntu:

1. First, update your packages.

sudo apt update

2. Next, install Git with apt-get.

sudo apt-get install git

3. Finally, verify that Git is installed correctly.

git --version

4. Run the following commands with your own information to set the default username and email that will be recorded when you save your work.

git config --global user.name "MV Thanoshan"
git config --global user.email "you@example.com"

Working with GitHub projects

We’ll work with GitHub projects in two ways.

Type 1: Create the repository, clone it to your PC, and work on it. (Recommended)

Type 1 involves creating a totally fresh repository on GitHub, cloning it to our computer, working on our project, and pushing it back.

Create a new repository by clicking the “new repository” button on the GitHub web page.

Pick a name for your first repository, add a small description, check the ‘Initialize this repository with a README’ box, and click on the “Create repository” button.

Well done! Your first GitHub repository is created.

Your first mission is to get a copy of the repository on your computer. To do that, you need to “clone” the repository on your computer.

To clone a repository means that you're taking a repository that’s on the server and cloning it to your computer – just like downloading it. On the repository page, you need to get the “HTTPS” address.

Once you have the address of the repository, open your terminal and enter the following command:

git clone [HTTPS ADDRESS]

This command will make a local copy of the repository hosted at the given address.

Now your repository is on your computer. Move into it with the following command.

cd My-GitHub-Project

My repository is named “My-GitHub-Project”, so this command moves me into that directory.

NOTE: When you clone, Git creates a copy of the repository on your computer. If you prefer, you can open the project folder with your computer's file manager instead of using the above ‘cd’ command on the terminal.

Now, in that folder we can create files, work on them, and save them locally. To record them in the repository we make a “commit”, and to send them to a remote place like GitHub we then “push” that commit. To do this, get back to your terminal. If you closed it, reopen it and use the ‘cd’ command as before.


Now, in the terminal, you’re in your repository directory. There are four steps to recording and publishing your work: ‘status’, ‘add’, ‘commit’ and ‘push’. All of them must be run from within your project directory. Let's go through them one by one.

1. “status”: The first thing you need to do is check which files you have modified. The following command lists the changes.

git status

2. “add”: Using that list, stage the files you want to upload with the following command.

git add [FILENAME] [FILENAME] [...]

In our case, we’ll add a simple HTML file.

git add sample.html

3. “commit”: Now that we have added the files of our choice, we need to write a message to explain what we have done. This message may be useful later if we want to check the change history. Here is an example of what we can put in our case.

git commit -m "Added sample HTML file that contains basic syntax"

4. “push”: Now we can put our work on GitHub. To do that we have to ‘push’ our commits to the remote. A remote is a copy of our repository that lives on another server. To push, we must know the remote’s name (by default, the remote you cloned from is named origin). To check the name, type the following command.

git remote

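The output shows that our remote’s name is origin. Now we can safely ‘push’ our work with the following command. If your default branch is named main rather than master, as on repositories created on GitHub since late 2020, use that name instead.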

git push origin master

Now, if we go to our repository on the GitHub web page, we can see the sample.html file that we’ve pushed to remote — GitHub!
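To see how the four steps relate to each other, here is a deliberately tiny Python model of the areas involved. It is only a sketch for building intuition, not how Git is implemented: the class ToyRepo and all of its attribute names are made up for this illustration.

```python
class ToyRepo:
    """A toy model of Git's areas (hypothetical names, not Git's internals)."""

    def __init__(self):
        self.working = {}   # working directory: filename -> content
        self.staged = {}    # staging area: what the next commit will contain
        self.commits = []   # local repository: list of (message, snapshot)
        self.remote = []    # remote repository: commits that were pushed

    def status(self):
        """List files whose working copy differs from what is staged."""
        return sorted(name for name, content in self.working.items()
                      if self.staged.get(name) != content)

    def add(self, *names):
        """Copy the current content of the named files into the staging area."""
        for name in names:
            self.staged[name] = self.working[name]

    def commit(self, message):
        """Record a snapshot of the staging area in the local repository."""
        self.commits.append((message, dict(self.staged)))

    def push(self):
        """Send the local commits to the remote."""
        self.remote = list(self.commits)


repo = ToyRepo()
repo.working["sample.html"] = "<h1>Hello</h1>"
print(repo.status())      # ['sample.html']
repo.add("sample.html")
repo.commit("Added sample HTML file")
print(len(repo.remote))   # 0: the commit exists only locally so far
repo.push()
print(len(repo.remote))   # 1: now the remote has it too
```

The point to notice is that a commit only changes the local repository; nothing reaches the remote until the push.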

NOTE: Sometimes a Git command in the terminal will open the Vim text editor (a CLI-based text editor), for example to ask for a commit message. To leave it, press ESC, type :q and press ENTER.

Pulling is the act of receiving from GitHub.

Pushing is the act of sending to GitHub.

Type 2: Work on your project locally then create the repository on GitHub and push it to remote.

Type 2 lets you make a fresh repository from an existing folder on your computer and send it to GitHub. In a lot of cases you might have already made something on your computer that you now want to turn into a repository on GitHub.

I will explain this to you with a Survey form web project that I made earlier that wasn’t added to GitHub.

As I already mentioned, when executing any Git commands, we have to make sure that we are in the correct directory in the terminal.

By default, any directory on our computer is not a Git repository – but we can turn it into a Git repository by executing the following command in the terminal.

git init

After converting our directory to a Git repository, the first thing we need to do is to check the files we have by using the following command.

git status

So there are two files in that directory that we need to “add” to our Repo.

git add [FILENAME] [FILENAME] [...]

NOTE: To “add” all of the files in our Repository we can use the following command:

git add .

After staging (the add process) is complete, we can check whether the files were added successfully by running git status again.

If those particular files are in green like the below picture, you’ve done your work!

Then we have to “commit” with a description in it.

git commit -m "Adding web Survey form"

If my repository started on GitHub and I brought it down to my computer, a remote is already going to be attached to it (Type 1). But if I’m starting my repository on my computer, it doesn’t have a remote associated with it, so I need to add that remote (Type 2).

So to add that remote, we have to go to GitHub first. Create a new repository and name it whatever you want to store it in GitHub. Then click the “Create repository” button.

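NOTE: In Type 2, please don’t initialize the repository with a README file when creating it on the GitHub web page. A README would add a commit on GitHub that your local repository does not have, and the first push would be rejected.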

After clicking the “Create repository” button you’ll find the below image as a web page.

Copy the HTTPS address. Now we’ll create the remote for our repository.

git remote add origin [HTTPS ADDRESS]

After executing this command, we can check whether the remote was added successfully with the following command.

git remote

And if it outputs “origin” you’ve added the remote to your project.

NOTE: Just remember we can state any name for the remote by changing the name “origin”. For example:

git remote add [REMOTE NAME] [HTTPS ADDRESS]

Now, we can push our project to GitHub without any problems!

git push origin master

After completing these steps one by one, if you go to GitHub you can find your repository with the files!


Thank you everyone for reading. I just explained the basics of Git and GitHub. I strongly encourage you all to read more related articles on Git and GitHub. I hope this article helped you.

Happy Coding!

Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis

Originally published by David Herron (https://towardsdatascience.com/@7genblogger) at towardsdatascience.com (https://towardsdatascience.com/why-git-and-git-lfs-is-not-enough-to-solve-the-machine-learning-reproducibility-crisis-f733b49e96e8)

Some claim the machine learning field is in a crisis due to software tooling that’s insufficient to ensure repeatable processes. The crisis is about difficulty in reproducing results such as machine learning models. The crisis could be solved with better software tools for machine learning practitioners.

The reproducibility issue is so important that the annual NeurIPS conference plans to make this a major topic of discussion at NeurIPS 2019. The “Call for Papers” announcement has more information: https://medium.com/@NeurIPSConf/call-for-papers-689294418f43

The so-called crisis is because of the difficulty in replicating the work of co-workers or fellow scientists, threatening their ability to build on each other’s work or to share it with clients or to deploy production services. Since machine learning, and other forms of artificial intelligence software, are so widely used across both academic and corporate research, replicability or reproducibility is a critical problem.

We might think this can be solved with typical software engineering tools, since machine learning development is similar to regular software engineering. In both cases we generate some sort of compiled software asset for execution on computer hardware hoping to get accurate results. Why can’t we tap into the rich tradition of software tools, and best practices for software quality, to build repeatable processes for machine learning teams?

Unfortunately traditional software engineering tools do not fit well with the needs of machine learning researchers.

A key issue is the training data. Often this is a large amount of data, such as images, videos, or text, that is fed into machine learning tools to train an ML model. Often the training data is not under any kind of source control, if only because systems like Git do not deal well with large data files, and source control systems designed to generate deltas for text files do not deal well with changes to large binary files. Any experienced software engineer will tell you that a team without source control will be in a state of barely managed chaos. Changes won’t always be recorded and team members might forget what was done.

At the end of the day that means a model trained against the training data cannot be replicated, because the training data set will have changed in unknowable ways. If there is no software system to remember the state of the data set on any given day, then what mechanism is there to remember what happened when?

Git-LFS is your solution, right?

The first response might be to simply use Git-LFS (Git Large File Storage) because it, as the name implies, deals with large files while building on Git. The pitch is that Git-LFS “replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.” One can just imagine a harried machine learning team saying “sounds great, let’s go for it”. It handles multi-gigabyte files, speeds up checkout from remote repositories, and uses the same comfortable workflow. That sure ticks a lot of boxes, doesn’t it?

Not so fast. Didn’t your manager instruct you to evaluate carefully before jumping in with both feet? Another life lesson to recall is to look both ways before crossing the street.

The first thing your evaluation should turn up is that Git-LFS requires an LFS server, and that server is not available through every Git hosting service. The big three (GitHub, GitLab and Atlassian) all support Git-LFS, but maybe you have a DIY bone in your body. Instead of using a third-party Git hosting service, you might prefer to host your own Git service. Gogs, for example, is a competent Git service you can easily run on your own hardware, but it does not have built-in support for Git-LFS.

Depending on your data needs, this next point could be a killer: on GitHub, Git-LFS lets you store files up to 2 GB in size. That is a GitHub limitation rather than a Git-LFS limitation; however, all Git-LFS implementations seem to come with various limitations. GitLab and Atlassian both have their own lists of Git-LFS limitations. Consider this 2 GB limit from GitHub: one of the use cases in the Git-LFS pitch is storing video files, but isn’t it common for videos to be way beyond 2 GB in size? Therefore Git-LFS on GitHub is probably unsuitable for machine learning datasets.

It’s not just the 2 GB file size limit: GitHub places such a tight limit on the free tier of Git-LFS use that one must purchase a data plan covering both data and bandwidth usage.

An issue related to bandwidth is that when using a hosted Git-LFS solution, your training data is stored in a remote server and must be downloaded over the Internet. The time to download training data is a serious user experience problem.

Another issue is the ease of placing data files on a cloud storage system (AWS, GCP, etc), as is often required when running cloud-based AI software. This is not supported, since the main Git-LFS offerings from the big three Git services store your LFS files on their own servers. There is a DIY Git-LFS server that does store files on AWS S3 (https://github.com/meltingice/git-lfs-s3), but setting up a custom Git-LFS server of course requires additional work. And what if you need the files to be on GCP instead of AWS infrastructure? Is there a Git-LFS server which stores data on the cloud storage platform of your choice? Is there a Git-LFS server that utilizes a simple SSH server? In other words, Git-LFS limits your choices of where the data is stored.

Does using Git-LFS solve the so-called Machine Learning Reproducibility Crisis?

With Git-LFS your team has better control over the data, because it is now version controlled. Does that mean the problem is solved?

Earlier we said the “key issue is the training data”, but that was a lie. Sort of. Yes, keeping the data under version control is a big improvement. But is the lack of version control of the data files the entire problem? No.

What determines the results of training a model or other activities? The determining factors include the following, and perhaps more:

  • Training data — the image database or whatever data source is used in training the model
  • The scripts used in training the model
  • The libraries used by the training scripts
  • The scripts used in processing data
  • The libraries or other tools used in processing data
  • The operating system and CPU/GPU hardware
  • Production system code
  • Libraries used by production system code

Obviously the result of training a model depends on a variety of conditions. Since there are so many variables to this, it is hard to be precise, but the general problem is a lack of what’s now called Configuration Management. Software engineers have come to recognize the importance of being able to specify the precise system configuration used in deploying systems.
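As a small illustration of what "specifying the precise system configuration" can mean in practice, the following Python sketch records the interpreter version, platform and installed package versions as a JSON manifest that could be committed next to the code. The function name environment_manifest and the manifest layout are assumptions made for this example, and it covers only the Python side of the list above; it does not capture GPU hardware or system libraries.

```python
import json
import platform
from importlib import metadata


def environment_manifest():
    """Collect a snapshot of the Python environment as plain data."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the manifest; redirect it to a file to keep it under version control.
    print(json.dumps(environment_manifest(), indent=2))
```

Comparing two such manifests is a quick way to spot that a library version changed between two training runs.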

Solutions to machine learning reproducibility

Humans are an inventive lot, and there are many possible solutions to this “crisis”.

Environments like RStudio or Jupyter Notebook offer a kind of interactive Markdown document which can be configured to execute data science or machine learning workflows. This is useful for documenting machine learning work, and specifying which scripts and libraries are used. But these systems do not offer a solution to managing data sets.

Likewise, Makefiles and similar workflow scripting tools offer a method to repeatedly execute a series of commands. The executed commands are determined through file-system time stamps. These tools offer no solution for data management.

At the other end of the scale are companies like Domino Data Lab or C3 IoT offering hosted platforms for data science and machine learning. Both package together an offering built upon a wide swath of data science tools. In some cases, like C3 IoT, users are coding in a proprietary language and storing their data in a proprietary data store. It can be enticing to use a one-stop-shopping service, but will it offer the needed flexibility?

In the rest of this article we’ll discuss DVC. It was designed to closely match Git functionality, to leverage the familiarity most of us have with Git, but with features making it work well for both workflow and data management in the machine learning context.

DVC (https://dvc.org) takes on and solves a larger slice of the machine learning reproducibility problem than does Git-LFS or several other potential solutions. It does this by managing the code (scripts and programs), alongside large data files, in a hybrid between DVC and a source code management (SCM) system like Git. In addition DVC manages the workflow required for processing files used in machine learning experiments. The data files and commands-to-execute are described in DVC files which we’ll learn about in the following sections. Finally, with DVC it is easy to store data on many storage systems from the local disk, to an SSH server, or to cloud systems (S3, GCP, etc). Data managed by DVC can be easily shared with others using this storage system.

Image courtesy dvc.org

DVC uses a similar command structure to Git. Just as git push and git pull are used for sharing code and configuration with collaborators, dvc push and dvc pull are used for sharing data. All this is covered in more detail in the coming sections, or if you want to skip right to learning about DVC see the tutorial at https://dvc.org/doc/tutorial.

DVC remembers precisely which files were used at what point of time

At the core of DVC is a data store (the DVC cache) optimized for storing and versioning large files. The team chooses which files to store in the SCM (like Git) and which to store in DVC. Files managed by DVC are stored such that DVC can maintain multiple versions of each file, and can use file-system links to quickly change which version of each file is being used.

Conceptually the SCM (like Git) and DVC both have repositories holding multiple versions of each file. One can check out “version N” and the corresponding files will appear in the working directory, then later check out “version N+1” and the files will change around to match.

Image courtesy dvc.org

On the DVC side, this is handled in the DVC cache. Files stored in the cache are indexed by a checksum (MD5 hash) of the content. As the individual files managed by DVC change, their checksum will of course change, and corresponding cache entries are created. The cache holds all instances of each file.
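The idea of a cache indexed by content checksum can be sketched in a few lines of Python. This is a toy illustration, not DVC's code: the functions md5_of and store, and the directory layout (the first two hex characters of the checksum as a subdirectory), are assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store(cache_dir, path):
    """Copy a file into the cache under a name derived from its checksum."""
    checksum = md5_of(path)
    # Assumed layout for this sketch: <cache>/<first 2 hex chars>/<remaining chars>
    target = Path(cache_dir) / checksum[:2] / checksum[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
    return checksum


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    data = Path(tmp) / "data.xml"

    data.write_text("<rows>1</rows>")
    first = store(cache, data)

    data.write_text("<rows>1</rows><rows>2</rows>")
    second = store(cache, data)

    # Both versions stay in the cache, each addressed by its own checksum.
    cached_objects = sum(1 for p in cache.rglob("*") if p.is_file())
    print(first != second, cached_objects)  # True 2
```

Because the file name in the cache is derived from the content, changing a file never overwrites an older version; it simply adds a new entry.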

For efficiency, DVC uses several linking methods (depending on file system support) to insert files into the workspace without copying. This way DVC can quickly update the working directory when requested.
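A rough Python sketch of the "link if possible, copy otherwise" idea follows. The function link_into_workspace and the order in which methods are tried are assumptions for this illustration, and copy-on-write reflinks are left out because Python's standard library has no portable call for them.

```python
import os
import shutil
import tempfile


def link_into_workspace(cached_file, workspace_file):
    """Place a cached file in the workspace, preferring links over copies.

    Returns the name of the method that succeeded.
    """
    source = os.path.abspath(cached_file)  # absolute path keeps symlinks valid
    if os.path.lexists(workspace_file):
        os.remove(workspace_file)  # drop the previous version of the file
    attempts = (
        ("hardlink", os.link),
        ("symlink", os.symlink),
        ("copy", shutil.copyfile),
    )
    for name, method in attempts:
        try:
            method(source, workspace_file)
            return name
        except (OSError, NotImplementedError):
            continue  # this method is unsupported here; try the next one
    raise OSError(f"could not place {cached_file} at {workspace_file}")


with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, "cache_object")
    workspace = os.path.join(tmp, "data.xml")
    with open(cached, "wb") as handle:
        handle.write(b"<rows>1</rows>")

    method = link_into_workspace(cached, workspace)
    with open(workspace, "rb") as handle:
        restored = handle.read()
    print(method, restored == b"<rows>1</rows>")
```

Which method succeeds depends on the file system, which is why the sketch reports it instead of assuming one.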

DVC uses what are called “DVC files” to describe both the data files and the workflow steps. Each workspace will have multiple DVC files, with each describing one or more data files with the corresponding checksum, and each describing a command to execute in the workflow.

cmd: python src/prepare.py data/data.xml
deps:
- md5: b4801c88a83f3bf5024c19a942993a48
  path: src/prepare.py
- md5: a304afb96060aad90176268345e10355
  path: data/data.xml
md5: c3a73109be6c186b9d72e714bcedaddb
outs:
- cache: true
  md5: 6836f797f3924fb46fcfd6b9f6aa6416.dir
  metric: false
  path: data/prepared
wdir: .

This example DVC file comes from the DVC Getting Started example (https://github.com/iterative/example-get-started) and shows the initial step of a workflow. We’ll talk more about workflows in the next section. For now, note that this command has two dependencies, src/prepare.py and data/data.xml, and an output data directory named data/prepared. Everything has an MD5 hash, and as these files change the MD5 hash will change and a new instance of each changed data file is stored in the DVC cache.

DVC files are checked into the SCM managed (Git) repository. As commits are made to the SCM repository each DVC file is updated (if appropriate) with new checksums of each file. Therefore with DVC one can recreate exactly the data set present for each commit, and the team can exactly recreate each development step of the project.

DVC files are roughly similar to the “pointer” files used in Git-LFS.

The DVC team recommends using different SCM tags or branches for each experiment. Therefore accessing the data files, and code, and configuration, appropriate to that experiment is as simple as switching branches. The SCM will update the code and configuration files, and DVC will update the data files, automatically.

This means there is no more scratching your head trying to remember which data files were used for what experiment. DVC tracks all that for you.

DVC remembers the exact sequence of commands used at what point of time

The DVC files remember not only the files used in a particular execution stage, but the command that is executed in that stage.

Reproducing a machine learning result requires not only using the precise same data files, but the same processing steps and the same code/configuration. Consider a typical step in creating a model: preparing sample data for use in later steps. You might have a Python script, prepare.py, to perform that preparation, and input data in an XML file named data/data.xml.

$ dvc run -d data/data.xml -d src/prepare.py \
            -o data/prepared \
            python src/prepare.py data/data.xml

This is how we use DVC to record that processing step. The DVC “run” command creates a DVC file based on the command-line options.

The -d option defines dependencies; in this case we see an input file in XML format and a Python script. The -o option records output files; in this case there is one output data directory. Finally, the executed command is a Python script. Hence, we have input data, code and configuration, and output data, all dutifully recorded in the resulting DVC file, which corresponds to the DVC file shown in the previous section.

If prepare.py is changed from one commit to the next, the SCM will automatically track the change. Likewise any change to data.xml results in a new instance in the DVC cache, which DVC will automatically track. The resulting data directory will also be tracked by DVC if it changes.

A DVC file can also simply refer to a file, like so:

md5: 99775a801a1553aae41358eafc2759a9
outs:
- cache: true
  md5: ce68b98d82545628782c66192c96f2d2
  metric: false
  path: data/Posts.xml.zip
  persist: false
wdir: ..

This results from the “dvc add file” command, which is used when you simply have a data file, and it is not the result of another command. For example, https://dvc.org/doc/tutorial/define-ml-pipeline shows the following, which results in the immediately preceding DVC file:

$ wget -P data https://dvc.org/s3/so/100K/Posts.xml.zip
$ dvc add data/Posts.xml.zip

The file Posts.xml.zip is then the data source for a sequence of steps shown in the tutorial that derive information from this data.

Take a step back and recognize these are individual steps in a larger workflow, or what DVC calls a pipeline. With “dvc add” and “dvc run” you can string together several Stages, each being created with a “dvc run” command, and each being described by a DVC file. For a complete working example, see https://github.com/iterative/example-get-started and https://dvc.org/doc/tutorial

This means that each working directory will have several DVC files, one for each stage in the pipeline used in that project. DVC scans the DVC files to build up a Directed Acyclic Graph (DAG) of the commands required to reproduce the output(s) of the pipeline. Each stage is like a mini-Makefile in that DVC executes the command only if the dependencies have changed. It is also different because DVC does not consider only the file-system timestamps, like Make does, but whether the file content has changed, as determined by the checksum in the DVC file versus the current state of the file.
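Both ideas in this paragraph, deciding by checksum whether a stage must rerun and ordering stages as a DAG, can be sketched in Python. This is an illustration under assumptions rather than DVC's implementation: the functions is_stale and execution_order, the dictionary layout, and the three stage names are hypothetical, and graphlib requires Python 3.9 or later.

```python
import hashlib
from graphlib import TopologicalSorter  # standard library since Python 3.9


def md5_of_bytes(data):
    """Hex MD5 digest of in-memory content (stands in for hashing a file)."""
    return hashlib.md5(data).hexdigest()


def is_stale(recorded, current_files):
    """A stage must rerun if any dependency's content no longer matches
    the checksum recorded the last time the stage ran."""
    return any(md5_of_bytes(current_files[path]) != checksum
               for path, checksum in recorded.items())


def execution_order(stages):
    """Order stages so each runs after the stages that produce its inputs.

    `stages` maps a stage name to {"deps": [...], "outs": [...]}.
    """
    producer = {out: name for name, stage in stages.items()
                for out in stage["outs"]}
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-stage pipeline, deliberately listed out of order.
stages = {
    "train": {"deps": ["data/features", "src/train.py"], "outs": ["model.pkl"]},
    "prepare": {"deps": ["data/data.xml", "src/prepare.py"], "outs": ["data/prepared"]},
    "featurize": {"deps": ["data/prepared", "src/featurize.py"], "outs": ["data/features"]},
}
order = execution_order(stages)
print(order)  # ['prepare', 'featurize', 'train']

files = {"data/data.xml": b"<rows>1</rows>"}
recorded = {"data/data.xml": md5_of_bytes(files["data/data.xml"])}
unchanged = is_stale(recorded, files)
files["data/data.xml"] = b"<rows>1</rows><rows>2</rows>"
changed = is_stale(recorded, files)
print(unchanged, changed)  # False True
```

Note that rewriting a file with identical content would update its timestamp, which is enough for Make to rerun the step, while the checksum comparison above would still report the stage as up to date.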

Bottom line is that this means there is no more scratching your head trying to remember which version of what script was used for each experiment. DVC tracks all of that for you.

Image courtesy dvc.org

DVC makes it easy to share data and code between team members

A machine learning researcher is probably working with colleagues, and needs to share data and code and configuration. Or the researcher may need to deploy data to remote systems, for example to run software on a cloud computing system (AWS, GCP, etc), which often means uploading data to the corresponding cloud storage service (S3, GCP, etc).

The code and configuration side of a DVC workspace is stored in the SCM (like Git). Using normal SCM commands (like “git clone”) one can easily share it with colleagues. But how about sharing the data with colleagues?

DVC has the concept of remote storage. A DVC workspace can push data to, or pull data from, remote storage. The remote storage pool can exist on any of the cloud storage platforms (S3, GCP, etc) as well as an SSH server.

Therefore to share code, configuration and data with a colleague, you first define a remote storage pool. The configuration file holding remote storage definitions is tracked by the SCM. You next push the SCM repository to a shared server, which carries with it the DVC configuration file. When your colleague clones the repository, they can immediately pull the data from the remote cache.
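The push and pull steps can be pictured as copying missing cache objects between two directories. The Python sketch below is a toy model of that idea under stated assumptions: the functions sync, push and pull are hypothetical, a plain local directory stands in for the remote storage pool, and the object name a3/04afb9 is made up.

```python
import shutil
import tempfile
from pathlib import Path


def sync(source, destination):
    """Copy every object that exists in `source` but not in `destination`.

    Returns the relative paths of the objects that were transferred.
    """
    source, destination = Path(source), Path(destination)
    transferred = []
    for obj in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = obj.relative_to(source)
        target = destination / relative
        if not target.exists():  # same name means same content, so skip it
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(obj, target)
            transferred.append(relative.as_posix())
    return transferred


def push(local_cache, remote):
    """Upload objects the remote does not have yet."""
    return sync(local_cache, remote)


def pull(local_cache, remote):
    """Download objects the local cache does not have yet."""
    return sync(remote, local_cache)


with tempfile.TemporaryDirectory() as tmp:
    mine, remote, colleague = (Path(tmp) / n for n in ("mine", "remote", "colleague"))
    for directory in (mine, remote, colleague):
        directory.mkdir()
    (mine / "a3").mkdir()
    (mine / "a3" / "04afb9").write_bytes(b"training data")

    pushed = push(mine, remote)          # my cache -> shared remote
    pulled = pull(colleague, remote)     # shared remote -> colleague's cache
    pushed_again = push(mine, remote)    # nothing new to send
    content = (colleague / "a3" / "04afb9").read_bytes()
    print(pushed, pulled, pushed_again)  # ['a3/04afb9'] ['a3/04afb9'] []
```

Since cache objects are named after their checksums, checking whether a name already exists is enough to know the content is already there, which keeps repeated pushes cheap.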

This means your colleagues no longer have to scratch their heads wondering how to run your code. They can easily replicate the exact steps, and the exact data, used to produce the results.

Image courtesy dvc.org


The key to repeatable results is using good practices: keeping proper versioning of not only the data but also the code and configuration files, and automating processing steps. Successful projects sometimes require collaboration with colleagues, which is made easier through cloud storage systems. Some jobs require AI software running on cloud computing platforms, which in turn requires data files to be stored on cloud storage platforms.

With DVC a machine learning research team can ensure their data, configuration and code are in sync with each other. It is an easy-to-use system which efficiently manages shared data repositories alongside an SCM system (like Git) to store the configuration and code.