The random forest algorithm is one of my favorites. It can be used for both classification and regression. Put simply, a random forest collects predictions from many decision trees and combines them: for regression it returns the average of those predictions, and for classification it returns the majority vote. Because the errors of individual trees partly cancel out when combined, the final prediction tends to land closer to the true value. Each decision tree is trained on only a sample of the data. The algorithm builds this sample by picking an observation at random, putting it back into the dataset, and then picking another observation at random, repeating until the sample is complete. This is sampling with replacement, commonly known as bootstrapping. Since every observation is put back before the next draw, a single observation can appear several times in the sample used to train one tree. The process is repeated for every tree in the model. Together, these decision trees are known as a random forest, and now you know exactly why the words random and forest are used.
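To make bootstrapping concrete, here is a minimal sketch using NumPy. The helper name `bootstrap_sample` and the toy data are my own assumptions for illustration; they are not part of any library.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw len(X) rows at random WITH replacement (hypothetical helper)."""
    n = len(X)
    # Each draw picks any row index from 0..n-1, so repeats are allowed.
    idx = rng.integers(0, n, size=n)
    return X[idx], y[idx], idx

rng = np.random.default_rng(0)

# Toy dataset: 10 observations, one feature, target = 2 * feature.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) * 2.0

X_boot, y_boot, idx = bootstrap_sample(X, y, rng)

# Some rows usually show up more than once and others not at all.
n_unique = len(np.unique(idx))
print("sampled row indices:", idx)
print("distinct rows in the sample:", n_unique, "out of", len(X))
```

The sample has the same size as the original data, but because rows are drawn with replacement it will usually contain duplicates and leave some rows out.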
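The basic idea here is to train each tree on a different sample of the data and use the average of their predictions as the final output. This output has low variance, and the intuition is simple: each tree overfits its own sample in a slightly different way, so when we average many trees those individual quirks tend to cancel out.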
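The idea of training each tree on its own sample and averaging the results can be sketched by hand with scikit-learn's `DecisionTreeRegressor`. The synthetic sine data, the number of trees, and the variable names are assumptions chosen for illustration. Note that this sketch only does the bootstrap-and-average part; real random forest implementations typically also consider a random subset of features at each split, which is omitted here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic regression data (assumed for illustration): noisy sine curve.
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrap: sample row indices with replacement for this tree.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor(random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Predict with every tree, then average across trees.
X_new = np.linspace(0, 6, 50).reshape(-1, 1)
per_tree = np.stack([t.predict(X_new) for t in trees])  # shape (n_trees, 50)
forest_pred = per_tree.mean(axis=0)                     # shape (50,)

# How much individual trees disagree with each other, on average.
tree_spread = per_tree.std(axis=0).mean()
print("average spread between individual trees:", round(tree_spread, 3))
```

The spread printed at the end shows how much single trees disagree at the same input; the averaged prediction smooths over that disagreement.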
I can confidently say that a random forest is usually better than a single decision tree. Why? Because the results are more robust. Each decision tree sees a different sample of the data, so it picks up slightly different patterns and predicts accordingly. When we combine all of these trees, the result is expected to be more accurate and, on average, closer to the true value.
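One way to check this claim yourself is to fit a single tree and a forest on the same data and compare their errors on held-out observations. The sketch below uses scikit-learn's `DecisionTreeRegressor` and `RandomForestRegressor`; the synthetic dataset, split size, and number of trees are assumptions for illustration, not the dataset from this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): noisy sine curve.
X = rng.uniform(0, 6, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)

# Hold out a quarter of the data to measure error on unseen observations.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_mse = mean_squared_error(y_test, tree.predict(X_test))
forest_mse = mean_squared_error(y_test, forest.predict(X_test))

print(f"single tree test MSE:   {tree_mse:.3f}")
print(f"random forest test MSE: {forest_mse:.3f}")
```

On noisy data like this the forest will typically show the lower test error, although the exact numbers depend on the data and the random seed.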
#machine-learning #regression #random-forest #python
Baby Steps Towards Data Science: Random Forest Regression in Python. Understand the intuition behind random forest regression and implement it in Python. Source code and dataset provided.