How to version control Jupyter Notebooks

Jupyter notebooks are fantastic in many ways, but collaborating on them is not easy. In this article we’ll look at the tools you can use to make notebooks play nicely with modern version control systems like git.

Why is Jupyter version control so hard?

The software world has converged on git as its version control tool of choice. Git is designed primarily for human-readable text files, whereas a Jupyter notebook is a rich JSON document with source code, markdown, HTML, and images all rolled into a single .ipynb file.

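Git doesn’t handle rich documents like notebooks very well. For example, resolving a git merge conflict by hand in a long, nested JSON document is close to impossible, and the git diff for an image, which the notebook stores as a long base64-encoded string, is unreadable (shown below).

[Image: git diff output for a notebook with an embedded image]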
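
To make the problem concrete, here is a minimal sketch in Python using only the standard library. It builds two tiny notebook-like JSON documents that differ only because the cell was re-run, then diffs the serialized text line by line, which is roughly what a plain git diff sees. The helper names (make_notebook, raw_text_diff) and the fake image strings are made up for illustration; the structure follows the nbformat 4 layout but is not a complete notebook.

```python
import difflib
import json


def make_notebook(source, execution_count, png_base64):
    """Build a minimal nbformat-4-style notebook dict with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "output_type": "display_data",
                        "data": {"image/png": png_base64},
                        "metadata": {},
                    }
                ],
                "source": source,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def raw_text_diff(old_nb, new_nb):
    """Diff the serialized JSON line by line, as a text-based tool would."""
    old_lines = json.dumps(old_nb, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new_nb, indent=1, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old.ipynb", tofile="new.ipynb", lineterm=""
        )
    )


# Same source code in both versions; the cell was only re-run, which bumped
# the execution count and re-rendered the plot (placeholder image strings).
old = make_notebook(["plt.plot(x, y)"], 1, "iVBORw0KGgoAAAANSUhEUgAA" * 40)
new = make_notebook(["plt.plot(x, y)"], 2, "iVBORw0KGgoAAAANSUhEUgAB" * 40)

diff = raw_text_diff(old, new)

# Keep only added/removed lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

for line in changed:
    print(line[:80])  # truncate the very long image lines for display
```

Every changed line here comes from the execution count or the image string; none of them touches the code the author actually wrote, which is why raw diffs of notebooks are so hard to review.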

What’s required from notebook version control?

Here’s what we need from a modern version control system:

  • Create checkpoints / commits
  • Quickly check out any past version of a notebook
  • See what changed from one version to another (a.k.a. visual diff for notebooks)
  • Let multiple people work on a single notebook, with easy merge conflict resolution
  • Give feedback and ask questions about a specific notebook cell

That’s our wishlist! This blog post introduces the important tools that can help you achieve each of these.

Disclaimer: I’m the author of two of the tools listed below (ReviewNB & GitPlus) but this is an unbiased review of all the useful tools in this space.

nbdime

nbdime is an open source library for diffing and merging notebooks locally. You can set it up to work with your local git client so that the git diff & git merge commands use nbdime for .ipynb files. With nbdime you can:

  • Run git diff to see how a notebook has changed before committing
  • Easily merge remote changes with your locally edited notebook
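
To illustrate the idea behind notebook-aware diffing, here is a rough Python sketch that compares two notebooks cell by cell, looking only at cell sources and ignoring outputs and execution counts. This is not nbdime's actual algorithm or API (for the real thing, use nbdime's own tools such as nbdiff); the function names cell_sources and changed_cells and the sample notebooks are hypothetical.

```python
import difflib


def cell_sources(notebook):
    """Return each cell's source as one string, ignoring outputs and execution counts."""
    sources = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        # The notebook format allows source to be a string or a list of strings.
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def changed_cells(old_nb, new_nb):
    """Return (operation, old_cells, new_cells) for every non-equal run of cells."""
    old_src, new_src = cell_sources(old_nb), cell_sources(new_nb)
    matcher = difflib.SequenceMatcher(a=old_src, b=new_src, autojunk=False)
    return [
        (tag, old_src[i1:i2], new_src[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_nb = {"cells": [
    {"cell_type": "code", "execution_count": 1, "outputs": [],
     "source": ["import pandas as pd"]},
    {"cell_type": "code", "execution_count": 2, "outputs": [],
     "source": ["df = pd.read_csv('data.csv')"]},
]}
new_nb = {"cells": [
    # Re-run, so the execution count differs, but the source is unchanged.
    {"cell_type": "code", "execution_count": 5, "outputs": [],
     "source": ["import pandas as pd"]},
    # Edited cell: one line added.
    {"cell_type": "code", "execution_count": 6, "outputs": [],
     "source": ["df = pd.read_csv('data.csv')\n", "df.head()"]},
    # Newly added markdown cell.
    {"cell_type": "markdown", "source": ["## Results"]},
]}

for operation, before, after in changed_cells(old_nb, new_nb):
    print(operation, before, after)
```

Because only the sources are compared, simply re-running a notebook produces no reported changes, while edited and added cells still show up.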

JupyterLab Git Extensions

The following JupyterLab extensions are useful for notebook version control. You can install them in your local JupyterLab.

  • jupyterlab-git can be used to browse GitHub repositories, look at visual diffs of changed files, and push your commits
  • GitPlus can be used to push commits and create pull requests on GitHub directly from the JupyterLab UI
