The MLxtend (Machine Learning Extensions) library provides many useful functions for everyday data analysis and machine learning tasks. Although Python already has many machine learning libraries, such as scikit-learn, TensorFlow, Keras, and PyTorch, MLxtend offers additional functionality and can be a valuable addition to your data science toolbox.

In this post, I will go over several tools from the library. In particular, I will cover:

  • Counterfactuals (for model interpretability)
  • PCA correlation circle
  • Bias-variance decomposition
  • Decision regions of classification models
  • Matrix of scatter plots
  • Bootstrapping

A link to a free one-page summary of this post is available at the end of the article.

For a list of all functionalities this library offers, you can visit MLxtend’s documentation [1].

MLxtend Library

The MLxtend library was developed by Sebastian Raschka (a professor of statistics at the University of Wisconsin-Madison). The library has nice API documentation as well as many examples.

You can install the MLxtend package through the Python Package Index (PyPi) by running pip install mlxtend.

Dataset

In this post, I’m using the wine dataset obtained from Kaggle. The data contains 13 chemical attributes for three types of wine. This is a multiclass classification dataset, and you can find the description of the dataset here.

First, let’s import the data and prepare the input variables X (feature set) and the output variable y (target).
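The article loads a wine CSV from Kaggle; as a runnable stand-in, scikit-learn ships the same 13-attribute, 3-class wine dataset, so a minimal version of this step might look like:

```python
# Stand-in for the Kaggle wine CSV: scikit-learn's built-in wine dataset
# has the same 13 attributes and 3 target classes.
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target  # X: feature matrix, y: class labels 0-2
print(X.shape, y.shape)        # 178 samples, 13 features
```

If you are working from the Kaggle CSV instead, the same `X`/`y` split can be produced with pandas by separating the target column from the feature columns.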

MLxtend Functionalities

Create Counterfactual (for model interpretability)

To create counterfactual records (in the context of machine learning), we modify the features of some records from the training set in order to change the model's prediction [2]. This can help explain the behavior of a trained model. The algorithm the library uses to create counterfactual records was developed by Wachter et al. [3].

You can create counterfactual records using create_counterfactual() from the library. Note that this implementation works with any scikit-learn estimator that supports the predict() method. Below is an example of creating a counterfactual record for an ML model. The counterfactual record is highlighted as a red dot within the classifier’s decision regions (we will go over how to draw decision regions of classifiers later in the post).

The code to create a counterfactual record in a classifier’s decision regions (Source code: author)

A counterfactual record is highlighted within a classifier’s decision region (Image by author)

PCA Correlation Circle

An interesting and different way to look at PCA results is through a correlation circle, which can be plotted using plot_pca_correlation_graph(). We compute the correlation between the original dataset columns and the principal components (PCs), then plot these correlations as vectors on a unit circle. The axes of the circle are the selected dimensions (a.k.a. PCs). You can specify the PCs you’re interested in by passing them as a tuple to the dimensions argument. The correlation circle’s axis labels show the percentage of explained variance for the corresponding PC [1].


MLxtend: A Python Library with interesting tools for data science tasks