In the great talk “I Don’t Like Notebooks” (video and slides), Joel Grus lays out numerous criticisms of Jupyter Notebooks, perhaps the most popular environment for doing data science. I found the talk instructive — when everyone thinks something is great, you need people who are willing to criticize it so we don’t become complacent. However, I think the problem isn’t the notebook itself, but how it’s used: like any other tool, the Jupyter Notebook can be (and is) frequently abused.
Thus, I would like to amend Grus’ title and state “I Don’t Like Messy, Untitled, Out-of-Order Notebooks With No Explanations or Comments.” The Jupyter Notebook was designed for literate programming: mixing code, text, results, figures, and explanations together into one seamless document. From what I’ve seen, this notion is often completely ignored, resulting in awful notebooks flooding repositories on GitHub:
Don’t let notebooks like this get onto GitHub.
The problems are clear:
The Jupyter Notebook can be an incredibly useful device for learning, teaching, exploration, and communication (here is a good example). However, notebooks like the above fail on all these counts. When these problems appear, it’s nearly impossible to debug someone else’s work or even figure out what they are trying to do. At the very least, anyone should be able to name a notebook something helpful, write a brief introduction, explanation, and conclusion, run the cells in order, and make sure there are no errors before posting the notebook to GitHub.
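The minimum checks above (some explanation, cells run in order, no errors) can be sketched as a small pre-posting script. This is only an illustration and not part of the Setup extension: the function name check_notebook is hypothetical, and it assumes the standard .ipynb JSON layout, where each cell has a cell_type and code cells carry an execution_count and a list of outputs.

```python
import json


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON, into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_notebook(nb):
    """Return a list of human-readable problems found in a notebook dict."""
    problems = []
    cells = nb.get("cells", [])

    # A notebook with no markdown has no introduction or explanation.
    if not any(c.get("cell_type") == "markdown" for c in cells):
        problems.append("no markdown cells")

    # Consider only code cells that actually contain code.
    # "source" may be a string or a list of strings; join handles both.
    code_cells = [
        c for c in cells
        if c.get("cell_type") == "code" and "".join(c.get("source", "")).strip()
    ]

    # After a fresh top-to-bottom run, execution counts are 1, 2, 3, ...
    counts = [c.get("execution_count") for c in code_cells]
    if counts != list(range(1, len(code_cells) + 1)):
        problems.append("cells were not run in order from a fresh kernel")

    # Any saved output of type "error" means a cell raised an exception.
    if any(
        o.get("output_type") == "error"
        for c in code_cells
        for o in c.get("outputs", [])
    ):
        problems.append("at least one cell raised an error")

    return problems
```

Calling check_notebook(load_notebook("analysis.ipynb")) (a placeholder filename) before pushing returns an empty list when the notebook passes all three checks.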
Rather than just complaining about the problem (it’s easy to be a critic but a lot harder to do something positive), I decided to see what could be done with Jupyter Notebook extensions. The result is an extension that, on opening a new notebook, automatically:
The extension running when a new notebook is opened
The benefit of this extension is that it changes the defaults. By default, the Jupyter Notebook has no markdown cells, is unnamed, and has no imports. We know that humans are notoriously bad at changing default settings, so why not make the defaults encourage better practices? Think of the Setup extension as a nudge: one that gently pushes you to write better notebooks.
To use this extension:
1. Download the <a href="https://github.com/WillKoehrsen/Data-Analysis/tree/master/setup" target="_blank">setup</a> folder (it has 3 files).
2. Run pip show jupyter_contrib_nbextensions to find where notebook extensions are installed. On my Windows machine (with anaconda) they are at C:\users\willk\anaconda3\lib\site-packages\jupyter_contrib_nbextensions and on my mac (without anaconda) they are at /usr/local/lib/python3.6/site-packages/jupyter_contrib_nbextensions
3. Place the setup folder in nbextensions/ under the above path.
4. Run jupyter contrib nbextensions install to install the new extension.
5. Run a Jupyter Notebook and enable Setup on the nbextensions tab (if you don’t see this tab, open a notebook and go to edit > nbextensions config).
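If you would rather not hunt for the install path by hand, the locate-and-copy part of the steps above can be sketched in Python. This is a rough helper under stated assumptions, not something shipped with the extension: find_package_dir and copy_extension are hypothetical names, the script assumes jupyter_contrib_nbextensions is importable from the Python you run it with, and dirs_exist_ok requires Python 3.8 or later.

```python
import importlib.util
import shutil
from pathlib import Path


def find_package_dir(package="jupyter_contrib_nbextensions"):
    """Return the directory a package is installed in, or None if it is missing."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def copy_extension(setup_dir, package_dir):
    """Copy the downloaded setup folder into <package_dir>/nbextensions/."""
    target = Path(package_dir) / "nbextensions" / Path(setup_dir).name
    # dirs_exist_ok lets the copy be re-run after editing the extension files.
    shutil.copytree(setup_dir, target, dirs_exist_ok=True)
    return target
```

A typical call would be copy_extension("setup", find_package_dir()), after checking that find_package_dir() did not return None. You still need to run jupyter contrib nbextensions install afterwards, as described above.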
Enable the Setup extension on the nbextensions tab
Now open a new notebook and you’re good to go! You can change the default template in main.js (see my article on writing a Jupyter Notebook extension for more details on how to write your own). The default template and imports are relatively plain, but you can customize them to whatever you want.
Default template and imports
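The extension itself sets up the template in JavaScript inside main.js. To illustrate the idea outside the browser, here is a rough Python sketch that builds a new notebook file with the same kind of starting layout. Everything specific in it is an assumption for illustration: the function name new_notebook, the placeholder template text, and the example imports are not taken from the extension. It relies only on the version 4 notebook JSON format.

```python
import json

# Hypothetical template text and imports; swap in whatever you prefer.
INTRO = "# Introduction\nState the purpose of the notebook."
IMPORTS = "import numpy as np\nimport pandas as pd"
CONCLUSION = "# Conclusions\nSummarize the results and next steps."


def markdown_cell(text):
    """Build a markdown cell in notebook format version 4."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(text):
    """Build an unexecuted code cell in notebook format version 4."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": text,
    }


def new_notebook(intro=INTRO, imports=IMPORTS, conclusion=CONCLUSION):
    """Return a notebook dict that starts with explanation cells and imports."""
    return {
        "cells": [markdown_cell(intro), code_cell(imports), markdown_cell(conclusion)],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def to_ipynb_text(nb):
    """Serialize a notebook dict to the JSON text stored in an .ipynb file."""
    return json.dumps(nb, indent=1)
```

Writing to_ipynb_text(new_notebook()) to a file with a descriptive name ending in .ipynb gives you a notebook that already has an introduction, an imports cell, and a conclusion to fill in.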
If you open an old notebook, you won’t get the default template, but you will be prompted to change the name from Untitled every time you run a cell:
The Setup extension will continue prompting until the notebook name is changed from Untitled.
Sometimes, a little bit of persistence is what you need to change your ways.
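The same naming check is easy to apply to files already on disk, for example before committing. The sketch below is an assumption about what counts as an unnamed notebook (Jupyter's default names Untitled.ipynb, Untitled1.ipynb, and so on), not a description of how the extension detects it; needs_rename and unnamed_notebooks are hypothetical helper names.

```python
import re
from pathlib import Path


def needs_rename(filename):
    """Return True if the notebook still has a default Untitled name."""
    # Matches Untitled, Untitled1, Untitled23, ... but not Untitled-analysis.
    return re.fullmatch(r"Untitled\d*", Path(filename).stem) is not None


def unnamed_notebooks(folder):
    """List the notebooks under a folder that still need a real name."""
    return sorted(p for p in Path(folder).rglob("*.ipynb") if needs_rename(p.name))
```

Running unnamed_notebooks(".") in a repository gives a quick list of notebooks to rename before they reach GitHub.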
From now on, let’s strive to create better notebooks. It doesn’t take much extra effort and it pays off greatly as others (and your future self) will be able to learn from your notebooks or use the results to make better decisions. Here are a few simple rules for writing effective notebooks:
The Setup extension will not solve all notebook-related problems, but hopefully, the small nudges will encourage you to adopt better habits. It takes a while to build up best practices, but, once you have them down, they tend to stick. With a little bit of extra effort, we can make sure that the next talk someone gives about notebooks is: “I like effective Jupyter Notebooks.”