In this post, we will build up the idea of the Shapley value and then cover why the order of features matters, how to move from the Shapley value to SHAP, the story of the Observational and Interventional Conditional Distributions when filling in absent features, whether to use the train set or the test set to explain the model, and so forth.

SHAP is based on the Shapley value, so we first need to understand what the Shapley value is.

Let’s say we have 3 players, namely L, M, and N, playing a basketball game with machines. If L plays alone, he can earn 10 points. M and N playing alone earn 20 and 25 points, respectively. If L and M play together, they somehow know how to collaborate and end up with 40 points. However, when L and N team up, they get only 30 points. The details are shown in the following table, where *v(S)* is the total contribution that the members of coalition *S* can get by cooperating.

Contribution of each coalition.
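As a minimal sketch, assuming the coalition values *v(S)* are stored in a Python dict keyed by frozensets of player names, the numbers stated in the text can be encoded as below. The values for {M, N} and {L, M, N} appear only in the table, so they are deliberately left out here, and v(∅) = 0 is the usual convention rather than something the text states. Even this partial data lets us compute some marginal contributions, i.e. how much a coalition gains when one more player joins it. The helper `marginal_contribution` is a hypothetical name used only for illustration.

```python
# Coalition values stated in the text; keys are frozensets of player names.
# v({M, N}) and v({L, M, N}) are omitted: their values are only in the table.
v = {
    frozenset(): 0,               # assumption: the empty coalition earns nothing
    frozenset({"L"}): 10,
    frozenset({"M"}): 20,
    frozenset({"N"}): 25,
    frozenset({"L", "M"}): 40,
    frozenset({"L", "N"}): 30,
}

def marginal_contribution(v, player, coalition):
    """Hypothetical helper: gain in v when `player` joins `coalition`."""
    coalition = frozenset(coalition)
    return v[coalition | {player}] - v[coalition]

print(marginal_contribution(v, "M", {"L"}))  # 40 - 10 = 30
print(marginal_contribution(v, "N", {"L"}))  # 30 - 10 = 20
print(marginal_contribution(v, "L", {"M"}))  # 40 - 20 = 20
print(marginal_contribution(v, "L", {"N"}))  # 30 - 25 = 5
```

The same player can add very different amounts depending on who is already in the coalition: L adds 20 points when joining M but only 5 when joining N. A fair split therefore has to look at more than one coalition.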

*How can we determine each player's contribution to the team from the information above? Which of the three players is the best?*
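This is the question the Shapley value answers: a player's payoff is their marginal contribution v(S ∪ {i}) − v(S), averaged over every coalition S of the other players and weighted by |S|!(n − |S| − 1)!/n!. Below is a minimal brute-force sketch of that formula, assuming v is a dict keyed by frozensets that covers every coalition, including the empty one. Because the table's values for {M, N} and {L, M, N} are not reproduced in the text, the example runs on a hypothetical two-player toy game (players A and B) rather than on L, M, and N.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating every coalition (2^(n-1) per player)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # coalitions of size 0 .. n-1 drawn from the other players
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v[s | {i}] - v[s])  # weighted marginal contribution
        phi[i] = total
    return phi

# Hypothetical toy game (not from the post): A alone 10, B alone 20, together 50.
toy_v = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 50,
}
print(shapley_values(["A", "B"], toy_v))  # {'A': 20.0, 'B': 30.0}
```

The payoffs add up to the value of the full team (20 + 30 = 50), which is the efficiency property of the Shapley value. With a dict covering all eight coalitions from the table, calling `shapley_values(["L", "M", "N"], ...)` would give each player's share in the basketball example. Enumerating all coalitions is exponential in the number of players, so this brute-force approach only suits small games like this one.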

