Target Encoding For Multi-Class Classification

This article is in continuation of my previous article that explained how target encoding actually works. The article explained the encoding method on a binary classification task through theory and an example, and how category-encoders library gives incorrect results for multi-class target. This article shows when _TargetEncoder of category_encoders fails, gives a snip of the theory behind encoding multi-class target, _and provides the correct code, along with an example.

When does the TargetEncoder fail?

Look at this data. Color is a feature, and Target is well… target. Our aim is to encode Color based on Target.

Image for post

Let’s do the usual target encoding on this.

import category_encoders as ce

ce.TargetEncoder(smoothing=0).fit_transform(df.Color,df.Target)

Image for post

Hmm, that doesn’t look right, does it? All the colors were replaced with 1. Why? Because TargetEncoder takes mean of all the Target values for each color, instead of probability.

#multi-class #categorical-variable #categorical-data #target-encoding #multiclass-classification #big data

When does the TargetEncoder fail?

towardsdatascience.com

Target Encoding For Multi-Class Classification