Simplify Your Dataset Cleaning with Pandas

I’ve heard a lot of analysts and data scientists say they spend most of their time cleaning data. You’ve probably seen plenty of tutorials on cleaning a dataset, but you probably know this already: it will never be 100% clean, and you need to accept that before reading on. So I’ll be honest with you: I won’t give you a magic recipe that gets rid of every data issue in your dataset. The cleaning rules depend on the domain you work in and the context of your project. The examples in this article come from my own experience with real-world data manipulation, and I’ve dealt with every issue and process I describe here. Sometimes the problem comes from the data source itself and you have to clean it; sometimes it’s just a colleague or your manager requesting specific fields in the final file. Feel free to share the main issues you’ve run into in the comments.

