Everybody loves a good data visualization. Still, a visualization shouldn't leave the interpretation up to the viewer, as is the case with histograms. Today we'll look at how binning bias can mislead your analysis and how to prevent the issue with ECDF (empirical cumulative distribution function) plots.
The article answers the following questions:
Without further ado, let's get started!
The problem with histograms is what Justin Bois from DataCamp calls binning bias, and I couldn't agree more. It means that the same data can look differently distributed depending on the bin size you choose for the histogram. Don't take my word for it; the example below speaks for itself.
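A minimal sketch of the effect, assuming a synthetic two-group (bimodal) sample instead of the Titanic data so it runs offline; the sample and its parameters are purely illustrative. The same 500 values are binned three ways with NumPy's `np.histogram`, and only the bin count changes. Typically the two humps are hard to make out with 3 bins, clear with 10, and noisy with 40, even though every binning accounts for all of the points.

```python
import numpy as np

# Hypothetical synthetic sample: two overlapping groups (a bimodal mixture),
# used instead of the Titanic data so the sketch runs without a download
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(25, 5, 300), rng.normal(45, 5, 200)])

# Bin the *same* values three different ways
results = {n_bins: np.histogram(data, bins=n_bins) for n_bins in (3, 10, 40)}

for n_bins, (counts, edges) in results.items():
    # Each binning counts all 500 points; only the apparent shape changes
    width = edges[1] - edges[0]
    print(f"bins={n_bins:>2}, width={width:.1f}: {counts.tolist()}")
```

Nothing about the data changed between the three lines of output; the differences in shape come entirely from the binning choice.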
To start, we’ll import a couple of libraries for data analysis and visualization, and load the Titanic dataset straight from the web:
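import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Apply seaborn's default plot styling (sns.set() is an older alias)
sns.set_theme()

# Load the Titanic dataset directly from GitHub
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')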
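Since ECDF plots are the proposed fix, here is a minimal sketch of how an ECDF is computed, assuming plain NumPy; the helper name `ecdf` is hypothetical and not taken from any library. Each sorted value is paired with the fraction of observations less than or equal to it, so no binning decision is involved at all. Missing values are dropped first, which matters for columns with gaps such as `Age`.

```python
import numpy as np


def ecdf(values):
    """Hypothetical helper: return (x, y) points of the empirical CDF.

    x holds the sorted non-missing values, and y[i] is the fraction of
    observations that are less than or equal to x[i].
    """
    arr = np.asarray(values, dtype=float)
    # Drop NaNs so missing entries don't distort the proportions
    x = np.sort(arr[~np.isnan(arr)])
    n = x.size
    # The i-th smallest value (1-based) has i/n of the data at or below it
    y = np.arange(1, n + 1) / n
    return x, y


x, y = ecdf([3, 1, 2, np.nan, 4])
print(x, y)
```

With the data loaded above, `x, y = ecdf(df['Age'])` followed by `plt.plot(x, y, marker='.', linestyle='none')` draws the ECDF of passenger ages; seaborn 0.11 and later also provides `sns.ecdfplot`, which does the same computation for you.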