Data visualization is a crucial step in any exploratory data analysis or report. A good chart is easy to read and can give us insight into a dataset at a glance. Business intelligence tools such as Tableau, Google Data Studio, and Power BI let us create graphs easily, but a data analyst or data scientist will often build visualizations in a Jupyter Notebook using Python. Luckily, there are dozens of Python libraries that produce great graphs. The million-dollar question is: which one is the best?

Whether you are a student or a professional, you should know a few of the options out there. No library is perfect, so you should know the pros and cons of each one. I will go over four of the most popular Python libraries for data visualization: Matplotlib, Seaborn, Plotly Express, and Altair. To do so, I will create a simple bar plot with each library and analyze how easy it is to use. For this blog, I will use a city dataset. You can find the notebook here.

Categories

This blog will rate each library in three categories: how easy it is to set up a bar plot, how easy it is to customize the graph to make it minimally presentable, and the quality of the library documentation.

Setting up the dataset

First, let’s import all the important libraries. It’s very likely that you already have Matplotlib and Seaborn installed on your computer. However, you might not have Plotly Express and Altair. You can easily install them using pip install plotly==4.14.3 and pip install altair vega_datasets.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
import plotly.express as px

# Jupyter magic: render Matplotlib figures inline in the notebook
%matplotlib inline
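If one of the imports above fails, a quick check like the following can tell you which libraries still need to be installed. This is only a minimal sketch: the `missing_packages` helper and the `REQUIRED` list are names made up for this example, and the check relies on the standard library's `importlib.util.find_spec`.

```python
from importlib import util

# Import names of the libraries used in this post.
REQUIRED = ["pandas", "matplotlib", "seaborn", "plotly", "altair"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All libraries are available.")
```

Anything listed as not installed can then be added with pip before you continue.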

Now we will import the dataset. For demonstration purposes, let’s create a data frame with only the 15 most populated cities in the US. I will also fix the capitalization of the city names, which will save us some editing later when we create the visualizations.

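df = pd.read_csv('worldcitiespop.csv')

# Keep only US cities; .copy() avoids a SettingWithCopyWarning on the assignment below
us = df[df['Country'] == 'us'].copy()
us['City'] = us['City'].str.title()
# 15 most populated cities, largest first
cities = us[['City', 'Population']].nlargest(15, 'Population', keep='first')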
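If you don't have the CSV file at hand, the same filter, title-case, and `nlargest` steps can be tried on a tiny data frame. The rows below are made up purely for illustration and are not taken from the dataset; the names `sample`, `us_sample`, and `top3` are likewise only for this sketch.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from worldcitiespop.csv.
sample = pd.DataFrame({
    "Country": ["us", "us", "us", "ca", "us"],
    "City": ["new york", "los angeles", "chicago", "toronto", "houston"],
    "Population": [800, 400, 300, 250, 200],
})

# .copy() gives an independent frame, so the assignment below does not
# trigger pandas' SettingWithCopyWarning.
us_sample = sample[sample["Country"] == "us"].copy()
us_sample["City"] = us_sample["City"].str.title()

# Keep the three largest cities, sorted from most to least populated.
top3 = us_sample[["City", "Population"]].nlargest(3, "Population")
print(top3)
```

Because `nlargest` returns the rows already sorted in descending order, the resulting frame can go straight into a bar plot without an extra sort.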

Now we should be ready to analyze each library. Are you ready?

Category: Difficulty to Set Up and Initial Result

**Winner:** Plotly Express

**Losers:** Matplotlib, Altair, and Seaborn

In this category, all the libraries performed well. They are all easy to set up, and the results with basic editing are good enough for most analyses. Still, we need to have winners and losers, right?

Matplotlib is very easy to set up, and its code is easy to remember. However, the chart doesn’t look good. It will probably do the job for data analysis, but the result is not polished enough for a business meeting.
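To give a sense of how little code a basic Matplotlib bar plot needs, here is a minimal sketch. It is not necessarily the exact code from the notebook: the data frame `cities_sample` holds placeholder values, and the output file name `bar_plot.png` is an assumption made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values for illustration only, not real population figures.
cities_sample = pd.DataFrame({
    "City": ["City A", "City B", "City C"],
    "Population": [300, 200, 100],
})

fig, ax = plt.subplots()
bars = ax.bar(cities_sample["City"], cities_sample["Population"])
ax.set_title("Population by city (placeholder data)")
ax.set_xlabel("City")
ax.set_ylabel("Population")
ax.tick_params(axis="x", labelrotation=45)  # rotated labels stay readable with many cities
fig.tight_layout()
fig.savefig("bar_plot.png")
```

With the real data, you would pass the `City` and `Population` columns of the `cities` data frame to `ax.bar` instead. The title, axis labels, and label rotation are the kind of basic editing the chart needs before it is readable.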


Which Python Library is the Best One for Data Visualization?