A guideline and checklist of what to do for Data Cleansing/Wrangling

It turns out that Data Scientists and Data Analysts often spend more of their time on data preprocessing and EDA than on training machine learning models. Data Cleansing is therefore one of the most important parts of the job.

Most of us know that we need to clean the data. But where do we start? In this article, I will provide a generic guide/checklist that we can follow whenever we start working on a new dataset.

Methodology (C-R-A-I)

If we ask ourselves “why do we need to clean the data?”, the answer is that we want our data to follow certain standards so that it can be fed into an algorithm or visualised on a consistent scale. Therefore, let’s first summarise the “standards” that we want our data to meet.

Here, I summarise 4 major criteria/standards that a cleansed dataset should meet. I call them “CRAI”.

