Motivation

“What the hell is this?”

That is the reaction beginners often have when they look at the output of some hard pandas function. It always baffles me how many tutorials and courses introduce such functions in a single sentence and move on to other topics in a heartbeat. Even if you read the documentation from top to bottom, understanding them is going to take more than a minute, guaranteed.

As a beginner, I was always frustrated in such situations. One time, while doing a nanodegree at Udacity, I was learning about Matplotlib’s heatmaps. As you may know, Matplotlib cannot create annotations for heatmaps automatically like Seaborn does, so you had to create them by hand. To do that, you would use qcut or cut (the topic of this article) to bin your data into categories, and I was completely new to these functions.

The instructor, in a single sentence, briefly ‘explained’ the functions and a link to the documentation appeared on the screen🤦‍♂️. He even fast-forwarded through the parts where he was typing out the syntax.

Lately, I have been writing a little series explaining the hardest functions of pandas in the hopes that other people won’t face the same difficulties. This is part four and it is about binning numeric data using qcut and cut.
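To give a first taste of what binning means before the details, here is a minimal sketch. The `ages` values, the bin edges, and the labels are made up for illustration and are not taken from the article's notebook; only `pd.cut` and `pd.qcut` themselves come from pandas.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
ages = pd.Series([3, 17, 25, 38, 52, 70, 81, 96])

# cut: you choose the bin edges by value.
# With the default right=True, the bins are (0, 18], (18, 65], (65, 100].
by_value = pd.cut(ages, bins=[0, 18, 65, 100], labels=["minor", "adult", "senior"])

# qcut: pandas chooses the edges from quantiles,
# so each bin ends up with roughly the same number of rows.
by_quantile = pd.qcut(ages, q=4, labels=["q1", "q2", "q3", "q4"])

print(by_value.value_counts().sort_index())
print(by_quantile.value_counts().sort_index())
```

In short, cut answers "which value range does this row fall into?", while qcut answers "which slice of the sorted data does this row fall into?".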

Clickable Table of Contents

∘ Motivation

∘ Setup and basic exploration

∘ Interval Notation Refresher

∘ Pandas qcut()

∘ Pandas cut()

The notebook used in the article can be downloaded from this GitHub repo.


How I customarily bin data with Pandas