R-style Visualizations in Python

Use plotnine, a library modeled on R’s ggplot2, to create visualizations

Visualizations are a great way to quickly understand a new dataset. They make it easier to spot correlations between columns and to identify informative patterns in the data. Several visualization libraries are available to Python users, such as matplotlib, seaborn, plotly, and graphviz.

Since both R and Python are commonly used in data science and analytics, you may find yourself switching between the two languages. Maybe your organization is converting projects from R to Python, or you are an R user who has joined a team that works exclusively in Python. Or perhaps you have come across something done in R and simply wondered whether it could be implemented in Python.

ggplot2 is a commonly used R library for data visualization. Its closest Python equivalent is plotnine, which implements the same grammar-of-graphics approach. This article explores using plotnine for basic visualizations and concludes with the pros and cons of adopting it. It assumes a basic knowledge of Python and of frequently used libraries such as pandas for data manipulation.

Understand and load the data

This exploration is based on the 2014 Uber dataset hosted on Kaggle. The four columns in the data are:

  • Date/Time : The date and time of the Uber pickup
  • Lat : The latitude of the Uber pickup
  • Lon : The longitude of the Uber pickup
  • Base : The TLC (Taxi & Limousine Commission) base company code affiliated with the Uber pickup
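As a rough sketch of the loading step, the snippet below reads data with these four columns into a pandas DataFrame and converts Date/Time to a proper datetime type. The helper name load_pickups, the sample rows, and the month/day/year timestamp format are assumptions made for illustration, not details confirmed by the dataset; with the real Kaggle download you would pass the path of the CSV file instead of the in-memory sample.

```python
import io

import pandas as pd

# Illustrative rows only: made up for this sketch, not copied from the dataset.
SAMPLE_CSV = io.StringIO(
    '"Date/Time","Lat","Lon","Base"\n'
    '"4/1/2014 0:11:00",40.7690,-73.9549,"B02512"\n'
    '"4/1/2014 0:17:00",40.7267,-74.0345,"B02512"\n'
    '"4/2/2014 8:05:00",40.7316,-73.9873,"B02598"\n'
)


def load_pickups(source):
    """Read pickup records and parse the Date/Time column (hypothetical helper)."""
    df = pd.read_csv(source)
    # Assumption: timestamps are formatted as month/day/year hour:minute:second.
    df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%m/%d/%Y %H:%M:%S")
    return df


df = load_pickups(SAMPLE_CSV)
print(df.dtypes)
print(df.head())
```

Parsing Date/Time up front makes it straightforward to derive fields such as hour or weekday later, which are common grouping variables when plotting pickups over time.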

