The word "ensemble" refers to a group of objects viewed as a whole. The same definition carries over to Ensemble modeling in machine learning, where a group of models is considered together to make predictions. As soon as we hear about Ensemble modeling, we tend to think of one of the most popular Ensemble models, Random Forests, which is based on the Bagging technique. In this article, we will not be discussing Random Forests in depth; instead, we will focus on the ideas surrounding Ensembles and the different popular Ensemble techniques.

That’s enough of an introduction; let’s start looking at the topics involved in Ensembles. Rather than diving directly into the different popular Ensemble techniques, we will first understand the conditions that the individual models need to satisfy in order to form an Ensemble, and then discuss why, once those conditions are met, an Ensemble performs better than any individual model.

Conditions or Criteria to Be Satisfied by Individual Models to Form an Ensemble

An Ensemble can be formed from different kinds of models that all perform the same classification or regression task. For a classification task, for example, we can form an Ensemble from a Logistic Regression model, a Decision Tree Classifier, a KNN Classifier, a Support Vector Classifier, etc., and the same goes for a regression task. But in order to form an Ensemble from different (or identical) individual models, the models need to satisfy two conditions: **Diversity** and **Acceptability**.
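To make this concrete, here is a minimal sketch of such a heterogeneous Ensemble using scikit-learn’s `VotingClassifier` with a hard majority vote. The dataset is synthetic and all parameter choices are illustrative, not prescriptive:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, just for illustration.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Heterogeneous Ensemble: four different model classes, one shared task.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("svc", SVC()),
])  # default voting="hard": majority vote over predicted labels

ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```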

The term **Diversity** means that the individual models considered must be complementary to each other: their strengths and weaknesses should cancel each other out. In machine learning terms, if one individual model overfits while another performs well, the well-performing model offsets the overfitting effect, and the Ensemble as a whole performs better. When Diversity holds, the predictions made by the models are independent of each other, meaning the predictions made by one model are not affected by the predictions made by another, and vice versa. As a result, the overall variance of the Ensemble is reduced, making it more resistant to overfitting. A toy sketch of this variance-reduction effect follows.
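The sketch below is a pure-NumPy toy, assuming ten unbiased estimators whose errors are independent Gaussian noise (an idealization of perfectly diverse models). Averaging their outputs shrinks the variance roughly by the number of models:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0

# 10 "models", each an unbiased but noisy estimator of true_value.
predictions = true_value + rng.normal(0.0, 1.0, size=(100_000, 10))

single_model = predictions[:, 0]          # one model's predictions
ensemble_avg = predictions.mean(axis=1)   # ensemble of 10 models

print("single-model variance:", single_model.var())  # ~1.0
print("ensemble variance:   ", ensemble_avg.var())   # ~0.1, i.e. variance / 10
```

In practice the models’ errors are never fully independent, so the reduction is smaller, but the direction of the effect is the same: more diverse models, less variance.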

After hearing the entire story about Diversity, a natural question arises: how do we achieve Diversity between the models? It can be achieved through some of the following practices:

  1. Considering subsets of the training data and building a different individual model on each subset, or on bootstrap samples of the data.
  2. Building different individual models of the same model class by tuning them with different hyperparameter combinations.
  3. Considering entirely distinct models, e.g., Logistic Regression, KNN, SVM, and a neural network for a classification task. The same naturally applies to a regression task.
  4. Considering a different subset of features for building each individual model.
  5. Finally, considering both a different subset of the data and a different subset of the features for each individual model.

These are some of the many methods by which Diversity can be achieved among the models used to form an Ensemble; the sketch below illustrates a few of them.
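As one hedged example, scikit-learn’s `BaggingClassifier` combines methods 1, 4, and 5 above in a single estimator: each base model sees a bootstrap sample of the rows and a random subset of the features. All numbers here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Each tree is trained on a bootstrap sample of rows (methods 1 and 5)
# and a random subset of the features (methods 4 and 5).
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base model, passed positionally
    n_estimators=50,
    max_samples=0.8,    # fraction of rows each tree sees
    max_features=0.6,   # fraction of features each tree sees
    bootstrap=True,     # sample rows with replacement
    random_state=42,
)
bagging.fit(X, y)
print("Training accuracy:", bagging.score(X, y))
```

Method 2 (same model class, different hyperparameters) and method 3 (entirely distinct model classes) can be realized with a `VotingClassifier`, as in the earlier sketch.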

The term **Acceptability** means that each individual model considered for the Ensemble should be acceptable at the task on its own. In simple statistical terms, the probability of an individual model making a correct prediction should be better than that of a random model. For a binary task, this can be quantified by saying that the probability of a correct prediction by each individual model should be greater than 0.5.
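To see why the 0.5 threshold matters, here is a small worked sketch in plain Python, assuming n independent models that are each correct with probability p and combined by majority vote. The vote is correct when more than half the models are correct, which follows a binomial distribution:

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent models are correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.6, 0.4):
    print(f"p = {p}:",
          [round(majority_vote_accuracy(p, n), 3) for n in (1, 11, 101)])

# p = 0.6: the vote's accuracy rises toward 1 as the Ensemble grows.
# p = 0.4: the vote's accuracy falls toward 0 -- models worse than
#          random guessing actively hurt the Ensemble.
```

This is exactly the Acceptability condition: combining models only helps when each model is better than chance, and independence between their errors (Diversity) is what lets the majority vote amplify that small edge.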

#ensemble-learning #machine-learning #ai #ensemble-method
