If you’ve been following my writing, chances are you’ve already read my previous article on Why and How to Use Pandas with Large Data.

As a data scientist, I consider Pandas one of the best Python tools for data cleaning and analysis.

It’s _seriously_ a game changer when it comes to cleaning, transforming, manipulating, and analyzing data.

No doubt about it.

In fact, I’ve even created my own toolbox for data cleaning with Pandas. The toolbox is simply a compilation of common tricks for dealing with messy data.
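
To give a flavor of what goes into such a toolbox, here’s a minimal sketch of one common trick: normalizing text, dropping duplicates, and filling in missing values. The DataFrame and column names here are hypothetical, not taken from my actual toolbox.

```python
import pandas as pd

# A hypothetical messy dataset: inconsistent casing, stray whitespace,
# duplicate rows, and missing ages
df = pd.DataFrame({
    "name": ["Alice", "alice ", "Bob", None],
    "age": [29, 29, None, 41],
})

# Normalize the text column (strip whitespace, fix casing)
df["name"] = df["name"].str.strip().str.title()

# Drop exact duplicate rows
df = df.drop_duplicates()

# Fill missing numeric values with the column median
df["age"] = df["age"].fillna(df["age"].median())
```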


My Love-Hate Relationship with Pandas

Don’t get me wrong.

Pandas is great. It’s powerful.

_Figure: Stack Overflow traffic to questions about selected Python packages_

It’s still one of the most popular data science tools for data cleaning and analytics.

However, after being in the data science field for some time, the volume of data I deal with has grown from 10MB to 10GB, 100GB, 500GB, and sometimes even more than that.

My PC suffered either **low performance or long runtimes** due to inefficient local memory usage once the data grew larger than 100GB.
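
This is exactly the gap Dask is meant to fill. As a quick preview of where this article is heading, here’s a minimal sketch of how Dask handles data that doesn’t fit in memory; the file pattern and column names are hypothetical.

```python
import dask.dataframe as dd

# Dask reads the CSVs lazily as partitions instead of loading
# everything into RAM at once (hypothetical file pattern)
df = dd.read_csv("transactions_*.csv")

# Operations build a task graph; nothing is computed yet
daily_totals = df.groupby("date")["amount"].sum()

# .compute() triggers execution, processing partitions in parallel
result = daily_totals.compute()
```

Because each partition is loaded, processed, and released independently, peak memory stays bounded by the partition size rather than the total dataset size.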

#big data & cloud #big data #dask #pandas #data analysis

Why and How to Use Dask with Big Data
1.10 GEEK