MLxtend: A Python Library with interesting tools for data science tasks

The MLxtend library (Machine Learning extensions) has many interesting functions for everyday data analysis and machine learning tasks. Although many machine learning libraries are available for Python, such as scikit-learn, TensorFlow, Keras, and PyTorch, MLxtend offers additional functionality and can be a valuable addition to your data science toolbox.

In this post, I will go over several tools of the library. In particular, I will cover:

  • Create counterfactual (for model interpretability)
  • PCA correlation circle
  • Bias-variance decomposition
  • Decision regions of classification models
  • Matrix of scatter plots
  • Bootstrapping

A link to a free one-page summary of this post is available at the end of the article.

For a list of all functionalities this library offers, you can visit MLxtend’s documentation [1].

MLxtend Library

The MLxtend library was developed by Sebastian Raschka (a professor of statistics at the University of Wisconsin-Madison). The library has nice API documentation as well as many examples.

You can install the MLxtend package from the Python Package Index (PyPI) by running pip install mlxtend.


In this post, I’m using the wine data set obtained from Kaggle. The data contains 13 attributes for three types of wine. This is a multiclass classification dataset, and you can find the description of the dataset here.

First, let’s import the data and prepare the input variables X (feature set) and the output variable y (target).
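The original loading code is not reproduced here. As a minimal sketch, and assuming that scikit-learn's bundled copy of the wine data (load_wine) is an acceptable stand-in for the Kaggle file, X and y could be prepared like this:

```python
import numpy as np
from sklearn.datasets import load_wine

# Assumption: scikit-learn's bundled wine data stands in for the Kaggle CSV.
wine = load_wine()

X = wine.data                        # feature matrix: 178 samples x 13 attributes
y = wine.target                      # integer class label for each of the three wine types
feature_names = wine.feature_names   # column names, useful for plot labels later

print(X.shape)
print(np.unique(y))
```

If you work from the Kaggle CSV instead, the same two variables would come from reading the file and separating the target column from the rest.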

MLxtend Functionalities

Create Counterfactual (for model interpretability)

To create counterfactual records (in the context of machine learning), we modify the features of some records from the training set in order to change the model prediction [2]. This can help explain the behavior of a trained model. The algorithm the library uses to create counterfactual records was developed by Wachter et al. [3].

You can create counterfactual records using create_counterfactual() from the library. Note that this implementation works with any scikit-learn estimator that supports the predict() function. Below is an example of creating a counterfactual record for an ML model. The counterfactual record is shown as a red dot within the classifier’s decision regions (we will go over how to draw decision regions of classifiers later in the post).

The code to create a counterfactual record in a classifier’s decision regions (Source code: author)

A counterfactual record is highlighted within a classifier’s decision region (Image by author)

PCA Correlation Circle

An interesting and different way to look at PCA results is through a correlation circle, which can be plotted using plot_pca_correlation_graph(). We compute the correlation between the original dataset columns and the PCs (principal components). These correlations are then plotted as vectors on a unit circle. The axes of the circle are the selected dimensions (a.k.a. PCs). You can specify the PCs you’re interested in by passing them as a tuple to the dimensions argument. The axis labels of the correlation circle show the percentage of explained variance for the corresponding PC [1].
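To make the computation concrete, here is a small sketch of the numbers behind such a plot, written with scikit-learn and NumPy rather than MLxtend itself. It assumes the load_wine data and standardized features, and it is not meant to mirror the library's internals.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's wine data, standardized before PCA.
wine = load_wine()
X_std = StandardScaler().fit_transform(wine.data)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)   # coordinates of each sample on PC1 and PC2

# Pearson correlation between every original column and every PC.
# Each row is the (x, y) tip of one feature's vector in the correlation circle.
corr = np.array([
    [np.corrcoef(X_std[:, j], scores[:, k])[0, 1] for k in range(scores.shape[1])]
    for j in range(X_std.shape[1])
])

for name, (c1, c2) in zip(wine.feature_names, corr):
    print(f"{name:30s} PC1={c1:+.2f}  PC2={c2:+.2f}")

# These percentages are what the axis labels of the circle report.
print("explained variance (%):", np.round(100 * pca.explained_variance_ratio_, 1))
```

Because the PCs are uncorrelated with each other, every vector falls inside or on the unit circle. A vector that nearly reaches the circle belongs to a feature that the two selected PCs represent well. With MLxtend, the figure itself would come from a call along the lines of plot_pca_correlation_graph(X_std, wine.feature_names, dimensions=(1, 2)); check the exact arguments against the documentation [1].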
