It’s hard to know what to do with null values in our data. Most of the time, it’s easier to drop them and carry on with what’s left.
But missing values can carry meaning of their own and are worth investigating. Taking some time to look at them closely can often give us a better understanding of how the data was collected and even reveal patterns in it.
Null values matrix
In this article, we’ll explore how to visualize all the NULLs in our datasets and get a look at what insights we can extract from doing so.
The dataset will be the California Jail Profile Survey, which contains monthly county-level data from 1995 to 2018.
import pandas as pd
f = 'data/california_jail_county_monthly_1995_2018.csv'
df = pd.read_csv(f)
After loading the dataset into Pandas, we can have a look at one of its convenient methods for dealing with nulls.
We can chain .isnull() with .sum() to get the number of missing values in each column.
Null values count by column
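As a minimal sketch of that count, the snippet below uses a small made-up DataFrame instead of the jail survey, so the column names and values are purely illustrative assumptions. It chains .isnull() and .sum() to count the missing values per column, and uses .mean() to express the same thing as a share of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real survey;
# the column names and values are invented for illustration.
toy = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "population": [120.0, np.nan, 95.0, np.nan],
    "bookings": [30.0, 28.0, np.nan, 25.0],
})

# .isnull() marks every missing cell as True; .sum() adds them up per column.
null_counts = toy.isnull().sum()

# .mean() on the same boolean frame gives the fraction of missing rows per column.
null_share = toy.isnull().mean()

print(null_counts)
print(null_share)
```

On the real dataset the same two lines would be applied to df instead of toy.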
That’s already useful, since it gives us an idea of which fields we can rely on, but there are better ways of visualizing this. Let’s try using Missingno.
Missingno is a library for visualizing incompleteness in a dataset. It works on top of Matplotlib and Seaborn, and it’s effortless to use.
import missingno as msno
We’ll start with a simple bar chart. Instead of comparing that big list of numbers, we’ll compare rectangles and their sizes.