If you like to cook, you know this very well: turning on the stove and cooking the food is a tiny part of the whole process. Much of your sweat and tears actually goes into preparing the right ingredients.

It is a cliché, but worth saying again: data preparation is 80% of the work in any data science project. Whether you are building a dashboard, running a simple statistical analysis, or fitting an advanced machine learning model, it all starts with finding the data and transforming it into the right format so the algorithm can take care of the rest.

If you are a Python fan, then Pandas is your best friend in your data science journey. Equipped with tools for loading, cleaning, and reshaping data, it helps you get through the most difficult parts of a project.

That said, as with any new tool, you first need to learn its functionality and how to put it to use. Many beginners in data science still struggle to make the best use of Pandas and instead spend much of their time on Stack Overflow. The principal reason, I'd say, is not being able to match Pandas functionality with their analytics needs.

Much of this struggle can be overcome simply by making an inventory of typical data preparation problems and matching them with the appropriate Pandas tools. Below I present a typical data preparation and exploratory analysis workflow and match each step with the Pandas functions it needs. I am not trying to document everything under the sun about Pandas, but rather to demonstrate the process of creating your own data wrangling cheatsheet.
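To make the idea of an inventory concrete, here is a minimal sketch of what a personal cheatsheet might look like in code. The toy DataFrame, its column names (`city`, `sales`), and the `cheatsheet` dictionary are all hypothetical and invented for illustration. The Pandas calls themselves (`isna`, `fillna`, boolean indexing, `groupby`) are standard. This is not a complete workflow, just one way to pair a recurring question with the tool that answers it.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Pune"],
        "sales": [10.0, None, 30.0, 20.0],
    }
)

# A personal cheatsheet: each recurring problem is matched with
# the Pandas expression that solves it.
cheatsheet = {
    "How much data is missing?": lambda d: d.isna().sum(),
    "Fill missing numbers": lambda d: d.fillna({"sales": 0.0}),
    "Keep rows matching a condition": lambda d: d[d["sales"] > 15],
    "Summarize by group": lambda d: d.groupby("city")["sales"].sum(),
}

# Look up a problem by name and apply its solution to the data.
missing = cheatsheet["How much data is missing?"](df)
filled = cheatsheet["Fill missing numbers"](df)
large = cheatsheet["Keep rows matching a condition"](df)
by_city = cheatsheet["Summarize by group"](df)

print(missing)
print(by_city)
```

Whether you keep such an inventory as a dictionary, a notebook, or a plain text file matters less than the habit of writing down the problem first and the function second.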


Essential commands for data preparation with Pandas