As a long time user and participant in open source communities, I’ve always known that documentation is far from being a solved problem. At least, that’s the impression we get from many developers: “writing docs is boring”; “it’s a chore, nobody likes to do it”. I have come to realize I’m one of those rare people who likes to write both code and documentation.

Nobody will argue against documentation. It is clear that for an open-source software project, documentation is the public face of the project. The docs influence how people interact with the software and with the community. It sets the tone about inclusiveness, how people communicate and what users and contributors can do. Looking at the results of a “NumPy Tutorial” search on any search engine also gives an idea of the demand for this kind of content - it is possible to find documentation about how to read the NumPy documentation!

I’ve started working at Quansight in January, and I have started doing work related to the NumPy CZI Grant. As a former professor in mathematics, this seemed like an interesting project both because of its potential impact on the NumPy (and larger) community and because of its relevance to me, as I love writing educational material and documentation. Having official high-level documentation written using up-to-date content and techniques will certainly mean more users (and developers/contributors) are involved in the NumPy community.

So, if everybody agrees on its importance, why is it so hard to write good documentation?

Why do we lack documentation?

In a recent article about documentation for open source data analytics libraries, Geiger et al. point out that “In a 2017 GitHub survey of OSS contributors, 93% reported that incomplete or outdated documentation is a pervasive problem but 60% of contributors say they rarely or never contribute to documentation (Zlotnick et al. 2017).” At the same time, still in the words of Geiger et al., “Many interviewees who regularly contribute documentation to such projects stated that they did not feel like they received same levels of positive community feedback for documentation work as they did for adding new features or fixing bugs.” The authors in this paper go as far as saying that documentation is the invisible work behind these projects.

It doesn’t help that many of these projects have been around for some time and have developed in decentralized, community-based ways. This is both positive and negative since it leaves the choice of the type of contribution to the new contributor. Most of the time, people who arrive want to make a big impact, and they perceive implementing a new feature or solving a pending bug, not writing documentation, as the way to do that.

Coming from an academic background, the same kind of dynamics seems to apply. Writing a paper or doing an experiment is far more appealing than writing a textbook or developing high-quality educational materials. But educational materials can have a huge impact and effectively bring people into the community. For me, realizing this was a turning point on the way I looked at documentation.

#labs #numpy

Documentation as a way to build Community
1.20 GEEK