2022-11-09
There once was a package called lime,
Whose models were simply sublime,
It gave explanations for their variations,
one observation at a time.
lime-rick by Mara Averick
This is an R port of the Python lime package (https://github.com/marcotcr/lime) developed by the authors of the lime (Local Interpretable Model-agnostic Explanations) approach for black-box model explanations. All credit for the invention of the approach goes to the original developers.
The purpose of lime is to explain the predictions of black box classifiers. What this means is that for any given prediction and any given classifier it is able to determine a small set of features in the original data that has driven the outcome of the prediction. To learn more about the methodology of lime, read the paper and visit the repository of the original implementation.
The lime package for R does not aim to be a line-by-line port of its Python counterpart. Instead it takes the ideas laid out in the original code and implements them in an API that is idiomatic to R.
Out of the box lime supports a wide range of models, e.g. those created with caret, parsnip, and mlr. Support for unsupported models is easy to achieve by adding a predict_model and a model_type method for the given model.
The following shows how a random forest model is trained on the iris data set and how lime is then used to explain a set of new observations:
library(caret)
library(lime)
# Split up the data set
iris_test <- iris[1:5, 1:4]
iris_train <- iris[-(1:5), 1:4]
iris_lab <- iris[[5]][-(1:5)]
# Create Random Forest model on iris data
model <- train(iris_train, iris_lab, method = 'rf')
# Create an explainer object
explainer <- lime(iris_train, model)
# Explain new observations
explanation <- explain(iris_test, explainer, n_labels = 1, n_features = 2)
# The output is provided in a consistent tabular format and includes the
# output from the model.
explanation
#> # A tibble: 10 × 13
#>    model_type   case  label label_prob model_r2 model_intercept model_prediction
#>    <chr>        <chr> <chr>      <dbl>    <dbl>           <dbl>            <dbl>
#>  1 classificat… 1     seto…          1    0.695           0.118            0.991
#>  2 classificat… 1     seto…          1    0.695           0.118            0.991
#>  3 classificat… 2     seto…          1    0.680           0.123            0.974
#>  4 classificat… 2     seto…          1    0.680           0.123            0.974
#>  5 classificat… 3     seto…          1    0.668           0.134            0.972
#>  6 classificat… 3     seto…          1    0.668           0.134            0.972
#>  7 classificat… 4     seto…          1    0.668           0.132            0.980
#>  8 classificat… 4     seto…          1    0.668           0.132            0.980
#>  9 classificat… 5     seto…          1    0.691           0.125            0.980
#> 10 classificat… 5     seto…          1    0.691           0.125            0.980
#> # … with 6 more variables: feature <chr>, feature_value <dbl>,
#> #   feature_weight <dbl>, feature_desc <chr>, data <list>, prediction <list>
# And can be visualised directly
plot_features(explanation)
lime also supports explaining image and text models. For image explanations the relevant areas in an image can be highlighted:
explanation <- .load_image_example()
plot_image_explanation(explanation)
Here we see that the second most probable class is hardly true, but is due to the model picking up the waxy areas of the produce and interpreting them as a wax-like surface.
For text the explanation can be shown by highlighting the important words. It even includes a shiny application for interactively exploring text models.
lime is available on CRAN and can be installed using the standard approach:
install.packages('lime')
To get the development version, install from GitHub instead:
# install.packages('devtools')
devtools::install_github('thomasp85/lime')
Please note that the ‘lime’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Author: thomasp85
Source Code: https://github.com/thomasp85/lime
License: MIT
2021-07-05
In this example I will show you localization: a Laravel localization example.
Laravel’s localization features provide a convenient way to retrieve text in different languages, allowing you to easily support multiple languages within your application. So here I will show you how to set up localization, i.e. dynamic language support, in Laravel.
#localization - laravel localization example #localization tutorial #localization #laravel multi languag #laravel documentation #laravel localization
2020-07-31
**The trade-off between predictive power and interpretability** is a common issue to face when working with black-box models, especially in business environments where results have to be explained to non-technical audiences. Interpretability is crucial to being able to question, understand, and trust AI and ML systems. It also provides data scientists and engineers better means for debugging models and ensuring that they are working as intended.
This tutorial aims to present different techniques for approaching model interpretation in black-box models.
_Disclaimer:_ this article seeks to introduce some useful techniques from the field of interpretable machine learning to the average data scientist and to motivate their adoption. Most of them have been summarized from this highly recommendable book by Christoph Molnar: Interpretable Machine Learning.
The entire code used in this article can be found on my GitHub.
The dataset used for this article is the Adult Census Income from UCI Machine Learning Repository. The prediction task is to determine whether a person makes over $50K a year.
Since the focus of this article is not on the modelling phase of the ML pipeline, minimal feature engineering was performed in order to model the data with an XGBoost.
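The article's actual preprocessing and training code lives in the linked GitHub repository; the following is only a rough sketch of the setup the later snippets assume. The CSV path, preprocessing, and hyperparameters are illustrative, and the names clf_xgb_df, X_train, X_test, y_train, y_test are simply chosen to match the snippets below.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Adult Census Income: predict whether income exceeds $50K a year
# (file name and target encoding below are assumptions, not the article's exact code)
df = pd.read_csv('adult.csv')
X = pd.get_dummies(df.drop(columns='income'))   # minimal feature engineering
y = (df['income'] == '>50K').astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1984)

clf_xgb_df = XGBClassifier(n_estimators=300, max_depth=4,
                           learning_rate=0.1, random_state=1984)
clf_xgb_df.fit(X_train, y_train)

print('Train AUC:', roc_auc_score(y_train, clf_xgb_df.predict_proba(X_train)[:, 1]))
print('Test AUC:', roc_auc_score(y_test, clf_xgb_df.predict_proba(X_test)[:, 1]))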
The performance metrics obtained for the model are the following:
Fig. 1: Receiver Operating Characteristic (ROC) curves for Train and Test sets.
Fig. 2: XGBoost performance metrics
The model’s performance seems to be pretty acceptable.
The techniques used to evaluate the global behavior of the model will be:
3.1 - Feature Importance (evaluated by the XGBoost model and by SHAP)
3.2 - Summary Plot (SHAP)
3.3 - Permutation Importance (ELI5)
3.4 - Partial Dependence Plot (PDPBox and SHAP)
3.5 - Global Surrogate Model (Decision Tree and Logistic Regression)
feat_importances = pd.Series(clf_xgb_df.feature_importances_, index=X_train.columns).sort_values(ascending=True)
feat_importances.tail(20).plot(kind='barh')
Fig. 3: XGBoost Feature Importance
When working with XGBoost, one must be careful when interpreting feature importances, since the results might be misleading. This is because the model calculates several importance metrics with different interpretations. It creates an importance matrix, a table whose first column contains the names of all the features actually used in the boosted trees, and whose remaining columns contain the resulting ‘importance’ values calculated with different metrics (Gain, Cover, Frequency). A more thorough explanation of these can be found here.
The **Gain** is the most relevant attribute to interpret the relative importance (i.e. improvement in accuracy) of each feature.
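For reference, both the raw gain scores and a gain-based plot can be pulled straight from the fitted classifier; a minimal sketch, assuming the clf_xgb_df model from above:
from xgboost import plot_importance

# Importance by gain, as stored in the fitted booster
gain = clf_xgb_df.get_booster().get_score(importance_type='gain')
print(sorted(gain.items(), key=lambda kv: kv[1], reverse=True)[:10])

# Equivalent built-in plot, limited to the top 20 features
plot_importance(clf_xgb_df, importance_type='gain', max_num_features=20)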
In general, the SHAP library is considered to be a model-agnostic tool for addressing interpretability (we will cover SHAP’s intuition in the Local Importance section). However, the library has a model-specific method for tree-based machine learning models such as decision trees, random forests and gradient boosted trees.
import shap

# Model-specific explainer for tree ensembles
explainer = shap.TreeExplainer(clf_xgb_df)
shap_values = explainer.shap_values(X_test)
# Mean absolute SHAP value per feature, shown as a bar chart
shap.summary_plot(shap_values, X_test, plot_type = 'bar')
Fig. 4: SHAP Feature Importance
The XGBoost feature importance was used to evaluate the relevance of the predictors in the model’s outputs for the Train dataset, and the SHAP one to evaluate it for the Test dataset, in order to assess whether the most important features were similar in both approaches and sets.
It is observed that the most important variables of the model are maintained, although in a different order of importance (age seems to take on much more relevance in the test set under the SHAP approach).
The SHAP Summary Plot is a very interesting plot to evaluate the features of the model, since it provides more information than the traditional Feature Importance. Each point is the SHAP value of one feature for one observation, so the plot shows not only the magnitude of each feature’s effect but also its direction and how it varies with the feature’s value:
shap.summary_plot(shap_values, X_test)
Fig. 5: SHAP Summary Plot
Another way to assess the global importance of the predictors is to randomly permute the values of each feature across the instances of the dataset and predict with the trained model. If this disturbance does not change the evaluation metric substantially, then the feature is not very relevant. If instead the evaluation metric is affected, then the feature is considered important in the model. This process is done individually for each feature, as sketched below.
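The procedure is short enough to write out by hand. A minimal from-scratch sketch of the idea, assuming the fitted clf_xgb_df and the test split from above, before turning to ELI5's implementation:
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1984)
baseline = roc_auc_score(y_test, clf_xgb_df.predict_proba(X_test)[:, 1])
for col in X_test.columns:
    X_perm = X_test.copy()
    X_perm[col] = rng.permutation(X_perm[col].values)   # shuffle one feature at a time
    drop = baseline - roc_auc_score(y_test, clf_xgb_df.predict_proba(X_perm)[:, 1])
    print(f'{col}: AUC drop = {drop:.4f}')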
To evaluate the trained XGBoost model, the Area Under the Curve (AUC) of the ROC Curve will be used as the performance metric. Permutation Importance will be analyzed in both Train and Test:
import eli5
from eli5.sklearn import PermutationImportance

# Train
perm = PermutationImportance(clf_xgb_df, scoring = 'roc_auc', random_state=1984).fit(X_train, y_train)
eli5.show_weights(perm, feature_names = X_train.columns.tolist())
# Test
perm = PermutationImportance(clf_xgb_df, scoring = 'roc_auc', random_state=1984).fit(X_test, y_test)
eli5.show_weights(perm, feature_names = X_test.columns.tolist())
Fig. 6: Permutation Importance for Train and Test sets.
Even though the order of the most important features changes, it looks like the most relevant ones remain the same. It is interesting to note that, unlike in the XGBoost Feature Importance, the age variable in the Train set has a fairly strong effect (as shown by the SHAP Feature Importance in the Test set). Furthermore, the 6 most important variables according to the Permutation Importance are kept in Train and Test (the difference in order may be due to the distribution of each sample).
The coherence between the different approaches to approximate the global importance generates more confidence in the interpretation of the model’s output.
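The excerpt ends before techniques 3.4 and 3.5 are demonstrated. For completeness, here is a minimal sketch of the global surrogate idea from 3.5: an interpretable model fitted to the black-box predictions. The max_depth and the fidelity check are illustrative choices, not the article's code, and the snippet assumes clf_xgb_df and the splits defined above.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# The surrogate learns to mimic the XGBoost predictions, not the true labels
surrogate = DecisionTreeClassifier(max_depth=4, random_state=1984)
surrogate.fit(X_train, clf_xgb_df.predict(X_train))

# Fidelity: how often the surrogate reproduces the black-box predictions on unseen data
fidelity = accuracy_score(clf_xgb_df.predict(X_test), surrogate.predict(X_test))
print(f'Surrogate fidelity on the test set: {fidelity:.3f}')
A high fidelity means the tree's splits can be read as an approximate global description of what the XGBoost model does; a low fidelity means the surrogate should not be trusted as an explanation.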
#model-interpretability #model-fairness #interpretability #machine-learning #shapley-values #deep learning
2020-08-22
In the supervised machine learning world, there are two types of algorithmic task often performed. One is called regression (predicting continuous values) and the other is called classification (predicting discrete values). Black-box algorithms such as SVM, random forest, boosted trees, and neural networks often provide better prediction accuracy than conventional algorithms. The problem starts when we want to understand the impact (magnitude and direction) of the different variables. In this article, I present an example of a Random Forest binary classification algorithm and its interpretation at the global and local level using Local Interpretable Model-agnostic Explanations (LIME).
In this example, we are going to use the Pima Indian Diabetes 2 data set obtained from the UCI Repository of machine learning databases (Newman et al. 1998).
This data set is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the data set is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the data set. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
The Pima Indian Diabetes 2 data set is the refined version (all missing values were assigned as NA) of the Pima Indian diabetes data. The data set contains the following independent and dependent variables.
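As a sketch of how this typically looks with the Python lime package: the train/test split, class names, and random forest settings below are placeholders (X_train, X_test, y_train stand for the prepared Pima data), not the article's exact code.
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Assumes X_train/X_test are DataFrames of the diagnostic measurements and
# y_train is the binary diabetes outcome (placeholder names)
rf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=['neg', 'pos'],
    mode='classification')

# Local explanation: the top 5 features behind the prediction for one patient
exp = explainer.explain_instance(X_test.values[0], rf.predict_proba, num_features=5)
print(exp.as_list())   # (feature rule, weight) pairs; exp.show_in_notebook() renders the usual plot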
#data-science #python #machine-learning #model-explanation #lime
2021-03-24
The Blockchain App Factory offers a Local Bitcoin clone platform for its clients, with an impressive outcome that attracts many users quickly. It allows traders to buy and sell cryptocurrency and pay a particular party. The platform comes with peer-to-peer (P2P) trading with escrow for secure transactions, which helps build trust and comfort through the feedback mechanism.
#local bitcoin clone script #buy & sell bitcoins with local currency #local bitcoin clone #best local bitcoin clone #local bitcoin exchange script #local bitcoin clone scripts