**Random Forest Regression** is a supervised learning algorithm that uses **ensemble learning** method for regression. Ensemble learning method is a technique that combines predictions from multiple machine learning algorithms to make a more accurate prediction than a single model.

The diagram above shows the structure of a Random Forest. You can notice that the trees run in parallel with no interaction amongst them. A Random Forest operates by constructing several decision trees during training time and outputting the mean of the classes as the prediction of all the trees. To get a better understanding of the Random Forest algorithm, let’s walk through the steps:

- Pick at random
*k*data points from the training set. - Build a decision tree associated to these _k _data points.
- Choose the number _N _of trees you want to build and repeat steps 1 and 2.
- For a new data point, make each one of your
*N*-tree trees predict the value of*y*for the data point in question and assign the new data point to the average across all of the predicted _y _values.

A Random Forest Regression model is powerful and accurate. It usually performs great on many problems, including features with non-linear relationships. Disadvantages, however, include the following: there is no interpretability, overfitting may easily occur, we must choose the number of trees to include in the model.

Now that we have a basic understanding of how the Random Forest Regression model works, we can assess its performance on a real-world dataset. Similar to my previous posts, I will be using data on House Sales in King County, USA.

After importing the libraries, importing the dataset, addressing null values, and dropping any necessary columns, we are ready to create our Random Forest Regression model!

**_Step 1: Identify your dependent (y) and independent variables (_X)**

Our dependent variable will be *prices* while our independent variables are the remaining columns left in the dataset.

#random-forest-regressor #machine-learning #model #trees #data-science #data analysis

15.25 GEEK