AdaBoost, short for Adaptive Boosting, is a machine learning classification algorithm. It is an ensemble method that combines many weak learners (typically very shallow decision trees) into one strong learner. In other words, it uses **boosting** to build an enhanced predictor out of a sequence of simple models.
If these words are confusing to you, don’t worry. In this article, we’ll go through a simple example to show how AdaBoost works and the math behind it.
AdaBoost is similar to Random Forests in the sense that the predictions are taken from many decision trees. However, three main differences make AdaBoost unique:

1. The trees are usually **stumps**: trees with a single node and two leaves, splitting on just one feature.
2. Each stump gets a different amount of say in the final prediction, based on how well it classifies the data.
3. The stumps are built sequentially, with each new stump taking the errors of the previous one into account through the sample weights.
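To make the contrast concrete, here is a minimal sketch using scikit-learn (the library, dataset, and parameters below are my own illustration, not part of the article's example). By default, scikit-learn's AdaBoost uses depth-1 trees, i.e. stumps, while a Random Forest grows full trees:

```python
# A minimal sketch, assuming a recent scikit-learn is installed.
# The breast_cancer dataset is a stand-in; it is NOT the small
# example table discussed later in this article.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# AdaBoost: sequential stumps (depth-1 trees), each weighted by how well it classifies.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)

# Random Forest: full trees built independently, each with an equal vote.
rf = RandomForestClassifier(n_estimators=50, random_state=0)

print("AdaBoost CV accuracy:     ", cross_val_score(ada, X, y, cv=5).mean())
print("Random Forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```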
Let’s look at an example now. Suppose we have the sample data below, with three features (x1, x2, x3) and an output (Y). Note that T = True and F = False.
The first step is to assign a sample weight to each sample. For the first round, every sample gets the same weight: 1 divided by the total number of samples. With six samples in this example, each sample weight is 1/6.
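In code form (a short sketch; the variable names are my own, not from the article), the initial weights are simply 1 over the number of samples:

```python
import numpy as np

n_samples = 6  # the example table has six rows

# Every sample starts with the same weight: 1 / (number of samples).
sample_weights = np.full(n_samples, 1 / n_samples)
print(sample_weights)  # each entry is 0.1666..., i.e. 1/6
```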
The next step is to calculate the Gini Impurity for each variable. This is done to determine which variable to use to create the first stump. For a single node, the Gini Impurity is one minus the sum of the squared class proportions; with a True/False output that is:

Gini Impurity = 1 − (proportion of True)² − (proportion of False)²
Once you calculate the Gini Impurity of each node, the total Gini Impurity for a variable is the weighted average of the node impurities, weighted by the fraction of samples that fall into each node.
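Here is a small sketch of both steps (node impurity, then the weighted total for a variable), assuming a binary True/False output encoded as "T" and "F"; the helper names are hypothetical, not from the article:

```python
import numpy as np

def gini_impurity(labels):
    """Gini Impurity of one node: 1 - sum of squared class proportions."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    p_true = np.mean(labels == "T")
    p_false = np.mean(labels == "F")
    return 1.0 - p_true**2 - p_false**2

def total_gini_for_variable(feature_values, labels):
    """Weighted average of node impurities for a stump split on one variable."""
    feature_values = np.asarray(feature_values)
    labels = np.asarray(labels)
    total = len(labels)
    impurity = 0.0
    for value in np.unique(feature_values):
        node_labels = labels[feature_values == value]
        impurity += (len(node_labels) / total) * gini_impurity(node_labels)
    return impurity
```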
To show an example, let’s calculate the Gini Impurity of x2.
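Continuing the sketch above, the call would look like the following. The values here are placeholders only, since the article's table is not reproduced in this section; substitute the actual x2 and Y columns:

```python
# Hypothetical stand-in values -- NOT the article's data.
x2 = ["T", "T", "F", "F", "T", "F"]
y  = ["T", "F", "T", "F", "T", "F"]

print(total_gini_for_variable(x2, y))
```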