The Random Forest algorithm is an ensemble of Decision Trees, generally trained via the bagging method (each tree is fit on a sample of the training set drawn with replacement), typically with max_samples set to the size of the training set. Instead of building a BaggingClassifier and passing it a DecisionTreeClassifier, you can use the RandomForestClassifier class, which is more convenient and optimized for Decision Trees. In this article, I will take you through the Random Forest algorithm in Machine Learning.
I will train a RandomForestClassifier with 500 trees, using all available CPU cores. But first, let's import the necessary libraries and prepare the data:
import sys
assert sys.version_info >= (3, 5)
# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"
# Common imports
import numpy as np
import os
# to make this notebook's output stable across runs
np.random.seed(42)
# To plot pretty figures (the % line is a Jupyter magic; drop it in a plain Python script)
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
Now, I will generate the moons dataset and split it into training and test sets:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
Next, I will train a BaggingClassifier made of 500 Decision Trees, each limited to 16 leaf nodes and each trained on a bootstrap sample as large as the training set:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
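# 500 trees, each fit on a bootstrap sample (bootstrap=True) the size of the training set (max_samples=1.0);
# n_jobs=-1 trains the trees on all available CPU cores
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1, random_state=42)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)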
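To make concrete what BaggingClassifier does with bootstrap=True and max_samples=1.0, here is a minimal hand-rolled sketch: each tree is fit on a bootstrap sample (row indices drawn with replacement, as many as the training set has rows), and the ensemble predicts by majority vote. The helper name bagged_trees_predict is hypothetical, and the sketch uses hard voting on the 0/1 moons labels, whereas scikit-learn's BaggingClassifier averages predicted class probabilities when the base estimator supports predict_proba.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagged_trees_predict(X_train, y_train, X_test, n_estimators=51, seed=42):
    """Hypothetical helper: bag Decision Trees by hand and return hard-vote predictions."""
    rng = np.random.default_rng(seed)
    n_samples = len(X_train)
    votes = np.empty((n_estimators, len(X_test)), dtype=int)
    for i in range(n_estimators):
        # Bootstrap sample: n_samples row indices drawn with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(max_leaf_nodes=16, random_state=i)
        tree.fit(X_train[idx], y_train[idx])
        votes[i] = tree.predict(X_test)
    # Hard majority vote for 0/1 labels (an odd n_estimators avoids ties)
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
y_pred_manual = bagged_trees_predict(X_train, y_train, X_test)
print("accuracy:", np.mean(y_pred_manual == y_test))
```

Because every tree sees a slightly different bootstrap sample, the trees make different errors, and voting averages much of that variance away; BaggingClassifier automates exactly this loop.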
With a few exceptions, a RandomForestClassifier has all the hyperparameters of a DecisionTreeClassifier (to control how the trees are grown), plus all the hyperparameters of a BaggingClassifier (to control the ensemble itself). Here is how to train one with the same settings:
from sklearn.ensemble import RandomForestClassifier
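# Same ensemble size and tree size as the bagging model; n_jobs=-1 uses all CPU cores
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)
# Fraction of test instances on which the two ensembles make the same prediction
np.sum(y_pred == y_pred_rf) / len(y_pred)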
0.976
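The claim that a RandomForestClassifier combines the hyperparameters of a DecisionTreeClassifier and a BaggingClassifier can be checked by comparing the keys returned by get_params(). This is a small sketch assuming scikit-learn 0.20 or later; the exact key sets differ slightly between versions.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rf_params = set(RandomForestClassifier().get_params())
tree_params = set(DecisionTreeClassifier().get_params())
bag_params = set(BaggingClassifier().get_params())

# Hyperparameters that control how each tree is grown
shared_with_tree = sorted(rf_params & tree_params)
# Hyperparameters that control the ensemble itself
# (max_features appears in both sets but means different things: features per split
#  in the forest vs. features drawn per base estimator in BaggingClassifier)
shared_with_bagging = sorted(rf_params & bag_params)
# Tree hyperparameters the forest does not expose (e.g. "splitter")
missing_from_forest = sorted(tree_params - rf_params)

print("from DecisionTreeClassifier:", shared_with_tree)
print("from BaggingClassifier:", shared_with_bagging)
print("not exposed by the forest:", missing_from_forest)
```

The "few exceptions" show up in the last list: for example, the forest has no splitter parameter, and it has no estimator parameter because its base estimator is always a Decision Tree.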
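The Random Forest algorithm introduces extra randomness when growing trees: instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features. This produces more diverse trees, trading a slightly higher bias for a lower variance, which generally yields a better model overall.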
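The per-split random feature subset can be sketched in NumPy. The helper best_split_random_subset below is hypothetical and simplified (Gini impurity, every midpoint between sorted feature values as a candidate threshold, and a fresh random subset drawn on each call, i.e. at each node). In scikit-learn, the subset size is set by RandomForestClassifier's max_features parameter, which defaults to the square root of the number of features for classifiers.

```python
import numpy as np


def gini(y):
    """Gini impurity of an array of non-negative integer labels."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)


def best_split_random_subset(X, y, n_sub_features, rng):
    """Hypothetical helper: return (feature, threshold, weighted_gini) of the best split,
    searching only a random subset of n_sub_features features."""
    n_samples, n_features = X.shape
    # Random subset of candidate features, drawn without replacement
    candidates = rng.choice(n_features, size=n_sub_features, replace=False)
    best = (None, None, np.inf)
    for f in candidates:
        # Candidate thresholds: midpoints between consecutive unique values
        values = np.unique(X[:, f])
        thresholds = (values[:-1] + values[1:]) / 2
        for t in thresholds:
            left = X[:, f] <= t
            right = ~left
            # Impurity of the two children, weighted by their sizes
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n_samples
            if score < best[2]:
                best = (int(f), float(t), float(score))
    return best


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] > 0).astype(int)  # only feature 0 carries signal

# Searching all 6 features: feature 0 gives a pure split (weighted Gini 0.0)
print(best_split_random_subset(X_demo, y_demo, n_sub_features=6, rng=rng))
# Searching a random subset of 2 features: feature 0 may not even be a candidate
print(best_split_random_subset(X_demo, y_demo, n_sub_features=2, rng=rng))
```

When the informative feature is left out of a node's subset, that node settles for a weaker split. That is the source of the extra tree diversity described above.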
#by aman kharwal #algorithms