In machine learning we encounter many types of data: strings, numbers, dates and more. String data must be converted into a numerical form before a model can be trained on it. Some string data can be identified directly as categorical data.
What is categorical data? It is simply data whose values are groups or labels. For example, a country variable might take the values ‘Germany’, ‘South Africa’ or ‘Peru’, and a gender variable might take ‘Male’ or ‘Female’.
Categorical data comes in two types: nominal and ordinal. Nominal variables are like the examples above, such as country or gender, where the values have no inherent order. For ordinal variables, each value represents a specific level. For example, education has levels such as primary school, junior high school, high school, bachelor's degree, master's degree and so on.
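To make the nominal versus ordinal distinction concrete, here is a minimal sketch in plain Python. The sample values, the helper names `encode_ordinal` and `encode_one_hot`, and the exact ordering of `EDUCATION_LEVELS` are assumptions made for illustration; they are not taken from any particular library. Ordinal values are mapped to integers that preserve their rank, while nominal values are turned into one-hot vectors so that no artificial order is implied.

```python
# Hypothetical sample data mirroring the examples in the text.
countries = ["Germany", "South Africa", "Peru", "Germany"]
education = ["high school", "primary school", "master's degree", "bachelor's degree"]

# Assumed ordering of the education levels, lowest to highest.
EDUCATION_LEVELS = [
    "primary school",
    "junior high school",
    "high school",
    "bachelor's degree",
    "master's degree",
]


def encode_ordinal(values, ordered_levels):
    """Map each value to its rank in ordered_levels (0 = lowest level)."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]


def encode_one_hot(values):
    """Return the sorted categories and one 0/1 vector per value."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


# Ordinal encoding keeps the order: a larger code means a higher level.
education_codes = encode_ordinal(education, EDUCATION_LEVELS)

# Nominal encoding: one column per country, with no ranking between them.
country_categories, country_vectors = encode_one_hot(countries)

print(education_codes)
print(country_categories)
print(country_vectors)
```

Using integers for a nominal variable such as country would suggest to a model that one country is "greater" than another, which is why the sketch uses separate 0/1 columns there instead.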

#categorical-data #machine-learning #data-preprocessing

Basic Encoding for Categorical Data in Machine Learning