Building State Of Art Machine Learning Models With AutoGluon

AutoGluon and AutoML

AutoGluon is an open-source AutoML framework built by AWS that is designed to be easy to use and easy to extend. It lets you reach state-of-the-art predictive accuracy with state-of-the-art deep learning techniques, without requiring machine learning expertise. It is also a quick way to prototype what you can achieve with your dataset and to get an initial baseline for your machine learning project. AutoGluon currently supports tabular data, text prediction, image classification, and object detection.
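To make the "initial baseline" idea concrete, here is a minimal sketch of a tabular workflow using AutoGluon's `TabularDataset` and `TabularPredictor`. The helper name `train_baseline`, the CSV file names, the label column, and the ten-minute time limit are all assumptions made for illustration; exact behavior may vary between AutoGluon versions.

```python
def train_baseline(train_csv, test_csv, label, time_limit=600):
    """Fit AutoGluon on a CSV file and return the predictor and its leaderboard.

    train_baseline is a hypothetical helper, not part of AutoGluon itself.
    """
    # Imported lazily so this sketch can be loaded without AutoGluon installed.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset(train_csv)
    test_data = TabularDataset(test_csv)

    # fit() takes care of preprocessing, model training, and ensembling.
    # time_limit is a training budget in seconds (assumed value).
    predictor = TabularPredictor(label=label).fit(train_data, time_limit=time_limit)

    # The leaderboard scores each trained model on the held-out data.
    leaderboard = predictor.leaderboard(test_data)
    return predictor, leaderboard


# Hypothetical usage; the file names and label column are placeholders.
# predictor, leaderboard = train_baseline("train.csv", "test.csv", label="target")
```

The leaderboard is a convenient first look at which model families work on a dataset before investing in manual tuning.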

AutoML frameworks exist to lower the bar for getting started with machine learning. They take care of heavy lifting such as data preprocessing, feature engineering, algorithm selection, and hyperparameter tuning. In other words, given a dataset and a machine learning problem, they keep training different models with different combinations of hyperparameters until they find the best combination of model and hyperparameters. This problem is also referred to as CASH (combined algorithm selection and hyperparameter optimization). Existing AutoML frameworks include SageMaker Autopilot, Auto-WEKA, and Auto-sklearn.
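The CASH search described above can be sketched as an exhaustive loop over (algorithm, hyperparameter) pairs scored on a validation set. This is a toy illustration only: the dataset, the two model types, and the candidate hyperparameter values are invented for the example, and real AutoML frameworks use far smarter search strategies than a plain grid.

```python
# Toy 1-D dataset of (feature, label) pairs; label is 1 for large features.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
valid = [(0.8, 0), (1.8, 0), (2.4, 0), (2.9, 1), (3.2, 1), (4.2, 1)]


def fit_threshold(data, threshold):
    """Rule-based model: predict 1 when x exceeds a fixed threshold."""
    return lambda x: int(x > threshold)


def fit_knn(data, k):
    """k-nearest-neighbours model: majority label among the k closest points."""
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > k)
    return predict


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


# The joint search space: every (algorithm, hyperparameter) pair.
search_space = (
    [("threshold", fit_threshold, t) for t in (1.0, 2.0, 2.5, 3.5)]
    + [("knn", fit_knn, k) for k in (1, 3, 5)]
)

results = []
for name, fit, hp in search_space:
    model = fit(train, hp)
    results.append((accuracy(model, valid), name, hp))

# Keep the single best combination, as a CASH-style search would.
best_score, best_name, best_hp = max(results, key=lambda r: r[0])
print(f"best: {best_name}({best_hp}) with validation accuracy {best_score:.2f}")
```

Note that the search keeps only one winner and discards every other trained model.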

AutoGluon differs from other (traditional) AutoML frameworks in that it does more than CASH.

Ensemble Machine Learning and Stacking

Before diving into AutoGluon, it is useful to revisit ensemble machine learning and stacking. Ensemble learning is a machine learning technique that trains many (purposefully) weak models in parallel to solve the same problem. An ensemble consists of a set of individually trained classifiers, such as neural networks or decision trees, whose predictions are combined when classifying new instances. The basic idea is that many models are better than few, and that models that learn differently can boost accuracy even if they perform worse in isolation.
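A small worked example shows why combining weak models can help. The labels and predictions below are made up, and the three models are deliberately constructed so that their mistakes never overlap, which is the idealized best case for an ensemble.

```python
from collections import Counter

# True labels for ten examples and the predictions of three weak models.
# Illustrative data: each model is wrong on a different set of examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]  # wrong on indices 3, 7, 9
model_b = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0]  # wrong on indices 0, 5, 8
model_c = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]  # wrong on indices 1, 2, 4


def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def majority_vote(*model_preds):
    """Combine predictions by taking the most common label per example."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]


ensemble = majority_vote(model_a, model_b, model_c)

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, y_true))
```

Each model alone is right on 7 of 10 examples, yet the vote is right on all 10, because on every example at least two models agree on the correct label. When models make the same mistakes, the gain shrinks or disappears.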

In most cases, a single base algorithm is selected to build multiple models, whose results are then aggregated. This is known as the homogeneous method of ensemble learning. Random forest is one of the most common and popular homogeneous ensemble learning techniques: multiple trees are trained on the same problem, and a majority vote is then taken among them. Other examples of homogeneous methods include bagging, rotation forest, and random subspace.
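As a rough sketch of the homogeneous approach, the code below bags depth-1 decision trees (stumps) on bootstrap samples and aggregates them by majority vote. It is a simplified stand-in for a random forest, not an implementation of one: there is a single feature, so there is no random feature selection, and the dataset, seed, and model count are assumptions chosen for the example.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Depth-1 decision tree on one feature: predict 1 when x > threshold.

    Returns the threshold with the fewest errors on the sample.
    """
    def errors(threshold):
        return sum(int(x > threshold) != y for x, y in sample)

    # Candidate thresholds are the feature values observed in the sample.
    return min(sorted({x for x, _ in sample}), key=errors)


def fit_bagged_stumps(data, n_models, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) rows with replacement.
        sample = [rng.choice(data) for _ in data]
        stumps.append(fit_stump(sample))
    return stumps


def bagged_predict(stumps, x):
    """Majority vote over all stumps (an odd model count avoids ties)."""
    votes = Counter(int(x > threshold) for threshold in stumps)
    return votes.most_common(1)[0][0]


# Toy data: class 0 for x in 0..4, class 1 for x in 6..10.
data = [(float(x), 0) for x in range(0, 5)] + [(float(x), 1) for x in range(6, 11)]
stumps = fit_bagged_stumps(data, n_models=25)
print(bagged_predict(stumps, 1.0), bagged_predict(stumps, 9.0))
```

Every model comes from the same algorithm; the diversity comes only from the different bootstrap samples each one sees.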

In contrast, heterogeneous methods use different base algorithms, such as decision trees and artificial neural networks, to create the models in the ensemble. Stacking is a common heterogeneous ensemble learning technique.
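Stacking can be sketched in a few steps: train different base algorithms, collect their out-of-fold predictions, and train a meta-learner on those predictions. The version below is a single-layer toy under stated assumptions: the three base learners, the lookup-table meta-learner, the fold scheme, and the dataset are all chosen for illustration and are not taken from any particular library.

```python
from collections import Counter, defaultdict


def fit_centroid(data):
    """Nearest-centroid classifier: pick the class whose mean is closest."""
    means = {}
    for label in {y for _, y in data}:
        xs = [x for x, y in data if y == label]
        means[label] = sum(xs) / len(xs)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def fit_knn(data, k=3):
    """k-nearest-neighbours classifier with a majority vote."""
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def fit_stump(data):
    """Depth-1 decision tree: predict 1 when x exceeds the best threshold."""
    def errors(t):
        return sum(int(x > t) != y for x, y in data)
    threshold = min(sorted({x for x, _ in data}), key=errors)
    return lambda x: int(x > threshold)


BASE_LEARNERS = [fit_centroid, fit_knn, fit_stump]


def fit_meta(rows):
    """Meta-learner: map each pattern of base predictions to its most common label."""
    table = defaultdict(Counter)
    for features, y in rows:
        table[features][y] += 1
    lookup = {f: c.most_common(1)[0][0] for f, c in table.items()}

    def predict(features):
        if features in lookup:
            return lookup[features]
        # Unseen pattern: fall back to a majority vote of the base predictions.
        return Counter(features).most_common(1)[0][0]
    return predict


def fit_stack(data, n_folds=5):
    # 1) Out-of-fold predictions: each row is scored by models that never saw it.
    meta_rows = []
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        rest = [row for i, row in enumerate(data) if i % n_folds != fold]
        models = [fit(rest) for fit in BASE_LEARNERS]
        for x, y in held_out:
            meta_rows.append((tuple(m(x) for m in models), y))
    # 2) Train the meta-learner on the base models' predictions.
    meta = fit_meta(meta_rows)
    # 3) Refit the base models on all data for use at prediction time.
    base = [fit(data) for fit in BASE_LEARNERS]
    return lambda x: meta(tuple(m(x) for m in base))


# Toy data: class 0 for x in 0..4, class 1 for x in 6..10.
data = [(float(x), 0) for x in range(0, 5)] + [(float(x), 1) for x in range(6, 11)]
stacked = fit_stack(data)
print(stacked(1.0), stacked(9.0))
```

Using out-of-fold predictions matters: if the meta-learner were trained on predictions the base models made for their own training rows, it would learn to trust them more than it should.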

This table lists examples of homogeneous and heterogeneous ensemble learning methods.

AutoGluon uses a multi-layer stack ensemble, and we will look at how that works next.

towardsdatascience.com