Background

I recently worked on a project where the target was multi-class. It was a simple prediction task, and the dataset had both categorical and numerical features.

For those of you wondering what multi-class classification is: if the answer is one of two options, such as ‘0 vs 1’, ‘clicked vs not-clicked’ or ‘cat vs dog’, your classification problem is binary; if the answer is one of three or more options, such as ‘red vs green vs blue vs yellow’ or ‘sedan vs hatch vs SUV’, the problem is multi-class.

So I started researching suitable ways to encode the categorical features. No points for guessing: I landed on Medium articles listing the benefits of mean target encoding, how it outperforms other methods, and how you can use the category_encoders library to do the job in just 2 lines of code. To my surprise, however, none of the articles demonstrated this on a multi-class target. I went to the category_encoders documentation and found that it says nothing about supporting multi-class targets. I dug deeper, read through the source code, and realized that the library only works for binary or continuous targets.

So I thought: “Inside of every problem lies an opportunity.” (Robert Kiyosaki)

Digging further, I went straight to the original paper by Daniele Micci-Barreca that introduced mean target encoding. The paper gives a solution not only for regression problems but also for binary and multi-class classification. This is the same paper that category_encoders cites for target encoding.

While there are several articles explaining target encoding for regression and binary classification problems, my aim is to implement target encoding for multi-class targets. Before that, however, we need to understand how it’s done for binary targets. In this article, I give an overview of the paper that introduced target encoding and show by example how target encoding works for binary problems.
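To make the binary case concrete before going further, here is a minimal sketch in plain Python, not the category_encoders implementation. Each category is replaced by a blend of its own share of positives and the overall share of positives, with a sigmoid weight that trusts the category more as it gets more rows, in the spirit of the paper. The function name, the toy data, and the defaults for `k` and `f` are assumptions made purely for illustration.

```python
from collections import defaultdict
from math import exp


def target_encode_binary(categories, targets, k=1.0, f=1.0):
    """Map each category to a smoothed estimate of P(target = 1 | category).

    k and f (illustrative defaults) control where and how sharply the
    weight shifts from the overall prior to the per-category mean.
    """
    prior = sum(targets) / len(targets)  # overall share of positives

    counts = defaultdict(int)     # rows per category
    positives = defaultdict(int)  # positive rows per category
    for cat, y in zip(categories, targets):
        counts[cat] += 1
        positives[cat] += y

    encoding = {}
    for cat, n in counts.items():
        posterior = positives[cat] / n  # share of positives in this category
        # Sigmoid weight: approaches 1 as the category gets more rows.
        weight = 1.0 / (1.0 + exp(-(n - k) / f))
        encoding[cat] = weight * posterior + (1.0 - weight) * prior
    return encoding


# Toy data (made up): one categorical feature and a binary target.
colors = ["red", "red", "red", "blue", "blue", "green"]
clicked = [1, 1, 0, 0, 0, 1]

encoding = target_encode_binary(colors, clicked)
encoded_column = [encoding[c] for c in colors]
print(encoding)
```

Rare categories are pulled toward the overall mean, so a category seen only once does not get an extreme value of 0 or 1. In practice the mapping should be learned on the training split only and then applied to validation and test data, to avoid leaking the target.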


All About Target Encoding For Classification Tasks