Article Outline

  • Why SHAP (SHapley Additive exPlanations)
  • About Dataset
  • Loading Dataset
  • Model Fitting
  • Shapley values estimation
  • Variable Importance plot
  • Summary plot
  • Dependence Plot
  • Force Plot
  • Tutorial DataSet

Why SHAP (SHapley Additive exPlanations)?

A very common problem with machine learning models is their interpretability. Most algorithms (tree-based ones in particular) provide an aggregate, global feature importance, but this lacks interpretability because it does not indicate the direction of a feature's impact.

Many methods have been used to compute variable importance. The drop-column method is one of the simplest techniques for this goal, but it is computationally expensive because the number of models to train grows with the number of features. Another approach is the **permutation** method, in which the values of a particular feature are permuted to measure the resulting change in model accuracy. It has an advantage over the drop-column method (fewer models to train), but it fails when correlated features exist in the training dataset. For example, in medical data, if you train a model on both systolic and diastolic blood pressure (which are correlated), the permutation method cannot distinguish their importance. To cope with this problem, more advanced methods were introduced. One of them is SHAP (SHapley Additive exPlanations), proposed by Lundberg et al. [1], which is reliable, fast, and computationally inexpensive.
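To make the permutation idea concrete, here is a minimal sketch of permutation importance. The dataset, model choice, and random seeds are illustrative assumptions, not part of the article; any fitted scikit-learn estimator would work the same way.

```python
# Sketch: permutation importance with an illustrative synthetic dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)  # accuracy before permuting anything

rng = np.random.default_rng(0)
importances = []
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    # Shuffle one column to break its link with the target.
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    # Importance = how much accuracy drops when this feature is scrambled.
    importances.append(baseline - model.score(X_perm, y_test))

print(importances)
```

Note that if two columns are strongly correlated (like systolic and diastolic pressure), permuting one leaves the model able to lean on the other, so both drops look small; this is exactly the failure mode that motivates SHAP.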


Machine Learning Model Explanation using Shapley Values