by Jason Brownlee on September 9, 2020 in Python Machine Learning
Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement.
TPOT is an open-source library for performing AutoML in Python. It uses the popular Scikit-Learn machine learning library for data transforms and machine learning algorithms, and it uses genetic programming, a stochastic global search procedure, to efficiently discover a top-performing model pipeline for a given dataset.
In this tutorial, you will discover how to use TPOT for AutoML with Scikit-Learn machine learning algorithms in Python.
After completing this tutorial, you will know:
Let’s get started.
TPOT for Automated Machine Learning in Python
Photo by Gwen, some rights reserved.
This tutorial is divided into four parts; they are:
Tree-based Pipeline Optimization Tool, or TPOT for short, is a Python library for automated machine learning.
TPOT uses a tree-based structure to represent a model pipeline for a predictive modeling problem, including data preparation steps, modeling algorithms, and model hyperparameters.
… an evolutionary algorithm called the Tree-based Pipeline Optimization Tool (TPOT) that automatically designs and optimizes machine learning pipelines.
— Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science, 2016.
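To make the idea of a tree-shaped pipeline concrete, here is a minimal sketch written directly in Scikit-Learn. It is not produced by TPOT; the dataset, the two feature branches, and every hyperparameter value are assumptions chosen only for illustration. Two branches transform the raw features, their outputs are combined, and a model sits at the root of the tree.

```python
# a hand-built, tree-shaped scikit-learn pipeline (illustrative only, not TPOT output)
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import StandardScaler

# synthetic dataset, assumed here purely for demonstration
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=1)

# branch A: scale the inputs, then project them onto 3 principal components
branch_a = make_pipeline(StandardScaler(), PCA(n_components=3))
# branch B: keep the 4 features with the highest ANOVA F-score
branch_b = SelectKBest(score_func=f_classif, k=4)

# root: the model consumes the concatenated output of both branches
pipeline = make_pipeline(
    make_union(branch_a, branch_b),
    LogisticRegression(C=1.0, max_iter=1000),
)

# evaluate the whole tree as a single model with 5-fold cross-validation
scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=5)
mean_accuracy = scores.mean()
print("Mean accuracy: %.3f" % mean_accuracy)
```

Each choice in this sketch (which transforms to use, how they are wired together, which model sits at the root, and values such as `n_components`, `k`, and `C`) is the kind of decision that an AutoML search would make automatically.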
An optimization procedure is then performed to find the tree structure that performs best for a given dataset. Specifically, TPOT uses a genetic programming algorithm, which performs a stochastic global optimization over programs represented as trees.
TPOT uses a version of genetic programming to automatically design and optimize a series of data transformations and machine learning models that attempt to maximize the classification accuracy for a given supervised learning data set.
— Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science, 2016.
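The flavor of this kind of search can be sketched with a much simpler evolutionary loop. The example below is not TPOT's algorithm: it uses mutation and selection only (no crossover and no tree-shaped programs), and the search space, the helper names (`random_candidate`, `mutate`, `build`, `evolve`), and the dataset are all hypothetical. It only shows the general pattern of generating candidate pipelines, scoring them with cross-validation, and keeping the fittest.

```python
# simplified evolutionary search over scikit-learn pipelines (a sketch, not TPOT's algorithm)
import random

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# hypothetical, deliberately tiny search space
SCALERS = {"standard": StandardScaler, "minmax": MinMaxScaler, "none": None}
MODELS = {
    "tree": lambda p: DecisionTreeClassifier(max_depth=p, random_state=1),
    "knn": lambda p: KNeighborsClassifier(n_neighbors=p),
}
PARAM_VALUES = [1, 2, 3, 5, 8]

def random_candidate(rng):
    # a candidate is a (scaler name, model name, hyperparameter value) tuple
    return (rng.choice(list(SCALERS)), rng.choice(list(MODELS)), rng.choice(PARAM_VALUES))

def mutate(candidate, rng):
    # resample one randomly chosen gene; the new value may equal the old one
    genes = list(candidate)
    position = rng.randrange(3)
    options = [list(SCALERS), list(MODELS), PARAM_VALUES][position]
    genes[position] = rng.choice(options)
    return tuple(genes)

def build(candidate):
    # turn a candidate tuple into a scikit-learn pipeline
    scaler_name, model_name, param = candidate
    steps = []
    if SCALERS[scaler_name] is not None:
        steps.append(SCALERS[scaler_name]())
    steps.append(MODELS[model_name](param))
    return make_pipeline(*steps)

def evolve(X, y, generations=3, population_size=6, seed=1):
    rng = random.Random(seed)
    cache = {}

    def score(candidate):
        # fitness is mean 3-fold cross-validated accuracy, cached per candidate
        if candidate not in cache:
            cache[candidate] = cross_val_score(
                build(candidate), X, y, scoring="accuracy", cv=3
            ).mean()
        return cache[candidate]

    population = [random_candidate(rng) for _ in range(population_size)]
    for _ in range(generations):
        # keep the better half, then refill the population with mutated copies
        parents = sorted(population, key=score, reverse=True)[: population_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    best = max(population, key=score)
    return best, score(best)

# synthetic dataset, assumed here purely for demonstration
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=1)
best_candidate, best_score = evolve(X, y)
print(best_candidate, round(best_score, 3))
```

Because the search is stochastic, a different `seed` can lead to a different best pipeline. The small `generations` and `population_size` values keep the sketch fast and are not recommended settings.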
The figure below, taken from the TPOT paper, shows the elements involved in the pipeline search: data cleaning, feature selection, feature processing, feature construction, model selection, and hyperparameter optimization.
Overview of the TPOT Pipeline Search
Taken from: Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science, 2016.
Now that we are familiar with what TPOT is, let’s look at how we can install and use TPOT to find an effective model pipeline.