The performance of a machine learning model depends not only on the model and its hyperparameters but also on how we process and feed different types of variables to it. Since most machine learning models accept only numerical variables, preprocessing categorical variables is a necessary step. We need to convert these categorical variables to numbers so that the model can understand them and extract valuable information.
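To make "converting categorical variables to numbers" concrete, here is a minimal sketch using only the Python standard library. The `cities` values are made up for illustration, and the two schemes shown (integer encoding and one-hot encoding) are simply two common examples, not a claim about which method suits a given dataset.

```python
# Hypothetical toy data: a categorical "city" feature for four samples.
cities = ["Delhi", "Mumbai", "Delhi", "Chennai"]

# Sort the distinct categories so the mapping is deterministic.
categories = sorted(set(cities))

# Integer (label) encoding: each category is replaced by its index.
to_index = {cat: i for i, cat in enumerate(categories)}
label_encoded = [to_index[c] for c in cities]

# One-hot encoding: one 0/1 column per category, in `categories` order.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in cities]

print(categories)
print(label_encoded)
print(one_hot)
```

Integer encoding implies an ordering between categories that may not exist in the data, whereas one-hot encoding avoids that at the cost of one extra column per category. That trade-off is one reason the choice of encoding method matters.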
A typical data scientist is often said to spend 70–80% of their time cleaning and preparing data, and converting categorical data is an unavoidable part of that work. Done well, it not only improves model quality but also supports better feature engineering. Now the question is, how do we proceed? Which categorical data encoding method should we use?
In this article, I will explain various categorical data encoding methods, with implementations in Python.
_In case you want to learn concepts of data science in video format, check out our course, Introduction to Data Science._