The MLxtend (Machine Learning Extensions) library offers many useful functions for everyday data analysis and machine learning tasks. Although Python already has many machine learning libraries, such as scikit-learn, TensorFlow, Keras, and PyTorch, MLxtend provides additional functionality and can be a valuable addition to your data science toolbox.
In this post, I will go over several tools from the library. In particular, I will cover:
A link to a free one-page summary of this post is available at the end of the article.
For a list of all functionalities this library offers, you can visit MLxtend’s documentation [1].
The MLxtend library was developed by Sebastian Raschka (a professor of statistics at the University of Wisconsin-Madison). It has good API documentation as well as many examples.
You can install the MLxtend package from the Python Package Index (PyPI) by running `pip install mlxtend`.
In this post, I'm using the wine dataset obtained from Kaggle. The data contains 13 attributes measured for three types of wine, making it a multiclass classification dataset; you can find a description of the dataset here.
First, let’s import the data and prepare the input variables X (feature set) and the output variable y (target).
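The original post loads the data from a Kaggle CSV; as a stand-in, the following minimal sketch uses scikit-learn's built-in copy of the same wine dataset to prepare `X` and `y`:

```python
# A minimal sketch: substituting scikit-learn's built-in wine dataset
# for the Kaggle CSV used in the post.
from sklearn.datasets import load_wine

data = load_wine()
X, y = data.data, data.target  # X: 13 features, y: 3 wine classes

print(X.shape)  # (178, 13)
print(set(y))   # {0, 1, 2}
```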
To create counterfactual records (in the context of machine learning), we modify the features of some records from the training set so that the model's prediction changes [2]. This can help explain the behavior of a trained model. The algorithm the library uses to create counterfactual records was developed by Wachter et al. [3].
You can create counterfactual records using create_counterfactual() from the library. Note that this implementation works with any scikit-learn estimator that supports the predict() method. Below is an example of creating a counterfactual record for an ML model. The counterfactual record is highlighted as a red dot within the classifier's decision regions (we will go over how to draw decision regions of classifiers later in the post).
The code to create a counterfactual record in a classifier’s decision regions (Source code: author)
A counterfactual record is highlighted within a classifier’s decision region (Image by author)
An interesting and different way to look at PCA results is through a correlation circle, which can be plotted using plot_pca_correlation_graph(). We compute the correlation between the original dataset columns and the PCs (principal components), and these correlations are then plotted as vectors on a unit circle. The axes of the circle are the selected dimensions (a.k.a. PCs); you can specify the PCs you're interested in by passing them as a tuple to the dimensions function argument. The correlation circle's axis labels show the percentage of explained variance for the corresponding PC [1].
#data-analysis #machine-learning #towards-data-science #python