No matter how many thousands of large data sets you may be crunching with TensorFlow, or how much you use PyTorch to accelerate tensor computation with GPUs, at some point you’ll want to represent your results with cross-platform charts and figures. And for that, you’re almost certainly going to want to get to know Matplotlib, an essential Python 2D plotting library for data visualization. Though Matplotlib is beloved by data scientists, its roots are in physical science, oceanography, and climatology. Data science folks came later, borrowed the core libraries, and have applied them to more corporate uses.

Though today there is an ever-growing universe of Python-based data science tools and libraries, for years Matplotlib was the only way to make plots in Python, and it remains the default. At the heart of the Matplotlib development community is project lead Thomas Caswell, who found his way to leadership almost by accident, moving from answering Matplotlib questions on Stack Overflow to submitting bug fixes and then authoring patches.

In a recent web conference, Caswell walked me through his journey to Matplotlib leadership and why he contributes.

A Contribution Evolution

Caswell wasn’t the founder of Matplotlib — that honor goes to John Hunter, an epilepsy researcher at the University of Chicago Medical Center in the early 2000s. Hunter grew tired of fighting for access to the hardware key dongle that allowed him to use a proprietary software program for doing electrocorticography analysis. Hunter first tried to replace this program with MATLAB but found it unsuitable for his needs, so he set out to build what became Matplotlib.

During this time, Caswell had his own struggles with MATLAB related to memory management and was looking for options to further his academic work at the University of Chicago. As he dove into Python, he naturally ran into Matplotlib and, as mentioned, first contributed insight and eventually code. That code was made all the better under the tutelage of Mike Droettboom, who assumed project leadership after Hunter’s unfortunate passing in 2012. As Caswell remembers, “Droettboom taught me almost everything I know about programming.” Caswell worked closely with Droettboom and, over time, became Matplotlib’s lead maintainer.

How Caswell’s Matplotlib contributions evolved is worth noting, because this evolution is a useful guide for others who may want to start contributing to an open source project.

Caswell notes that answering Stack Overflow questions turns out to be an exceptional way to learn a library, because it puts you in a position to encounter others’ use cases. It was also an ideal way to start “fixing” bugs without touching the code. Caswell says that eventually he was given commit rights so that he could apply pressure on the bug backlog from the other direction, by changing the code itself.

At the same time, Caswell’s experience surfaces another facet of community-driven open source projects: you can’t force it. Caswell says that over the past several years, Matplotlib’s development has been entirely volunteer-driven — by a combination of people from industry who do it either on their discretionary time at work or on nights and weekends, and a collection of professors and students. He says this makes for an interesting management problem, because you can’t tell anyone to do anything. There is “no coercion” in the community — just persuasion.

Add to this the interesting conflicts that arise when you have primarily text communication between people from different cultural backgrounds, he says, and managing an open source community ends up offering MBA-level experience to people who likely have zero interest in an MBA.


Open Source Builders: Why Data Scientists Love Matplotlib