This article is in continuation of my previous article that explained how target encoding actually works. The article explained the encoding method on a binary classification task through theory and an example, and how category-encoders library gives incorrect results for multi-class target. This article shows when _TargetEncoder of category_encoders fails, gives a snip of the theory behind encoding multi-class target, _and provides the correct code, along with an example.
Look at this data. Color is a feature, and Target is well… target. Our aim is to encode Color based on Target.
Let’s do the usual target encoding on this.
import category_encoders as ce
ce.TargetEncoder(smoothing=0).fit_transform(df.Color,df.Target)
Hmm, that doesn’t look right, does it? All the colors were replaced with 1. Why? Because TargetEncoder takes mean of all the Target values for each color, instead of probability.
#multi-class #categorical-variable #categorical-data #target-encoding #multiclass-classification #big data