Here’s how we used the hundreds of thousands of publicly accessible repos on GitHub to learn more about the current state of data science.

Inspired by research carried out two years ago by the Design Lab team at UC San Diego, the JetBrains Datalore team downloaded all the Jupyter notebooks that were publicly accessible on GitHub in October 2019 and October 2020 to gather statistics on the tools the global data science community has been using in recent years.

[Image: word cloud]

Two years ago there were 1,230,000 Jupyter notebooks published on GitHub. By October 2020 this number had grown roughly eightfold, and we were able to download 9,720,000 notebooks. We made this dataset publicly available, and you can find the instructions for accessing it at the bottom of the post. Feel free to play with it and share your insights with us by mentioning @JBDatalore on Twitter, or write to us at contact@datalore.jetbrains.com.

All the statistics mentioned below were calculated using this notebook in Datalore, an online Jupyter notebook environment with smart coding assistance, hosted by JetBrains.
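To give a feel for the kind of processing such statistics involve, here is a minimal sketch of how one might tally the packages imported across a set of notebooks. It is not the Datalore team's actual code. It assumes the notebooks follow the nbformat 4 layout (a top-level `cells` list), and the names `imported_packages`, `tally_packages`, and `SAMPLE` are hypothetical, introduced only for illustration.

```python
import json
import re
from collections import Counter

# Captures the top-level package name from lines that start with
# "import x" or "from x import y". Simplification: for "import os, sys"
# only the first name is captured, and relative imports are skipped.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)


def imported_packages(notebook_json):
    """Return the set of top-level packages imported in one notebook's code cells."""
    nb = json.loads(notebook_json)
    packages = set()
    # Assumption: nbformat 4 layout, where cells live in a top-level "cells" list.
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        packages.update(IMPORT_RE.findall(source))
    return packages


def tally_packages(notebooks):
    """Count, for each package, how many notebooks import it at least once."""
    counts = Counter()
    for notebook_json in notebooks:
        counts.update(imported_packages(notebook_json))
    return counts


# A tiny hand-written notebook used only to exercise the functions above.
SAMPLE = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "code",
         "source": ["import numpy as np\n", "from pandas import DataFrame\n"]},
        {"cell_type": "markdown", "source": "import this_is_not_code"},
    ],
})

if __name__ == "__main__":
    print(tally_packages([SAMPLE]).most_common())
```

Counting each package once per notebook, rather than once per import statement, keeps a single notebook with many repeated imports from skewing the totals.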


We Downloaded 10,000,000 Jupyter Notebooks From GitHub – This Is What We Learned