The Naive Bayes algorithm is a classification technique based on Bayes' theorem. It assumes that the presence of a feature in a class is unrelated to the presence of any other feature. The algorithm relies on the posterior probability of the class given a predictor, as shown in the following formula:

P(y|X) = P(X|y) * P(y) / P(X)

where:
- P(y|X) is the posterior probability of class y given predictor X,
- P(X|y) is the likelihood of the predictor given the class,
- P(y) is the prior probability of the class,
- P(X) is the prior probability of the predictor.
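To make the formula concrete, here is a small numeric sketch (the spam-filter numbers below are invented purely for illustration):

# Illustrative numbers only: suppose 60% of emails are spam (prior), and
# the word "offer" appears in 80% of spam and 10% of non-spam (likelihoods).
p_spam = 0.6                      # P(y)
p_offer_given_spam = 0.8          # P(X|y)
p_offer_given_ham = 0.1           # P(X|not y)

# Evidence P(X): total probability of seeing the word "offer"
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

# Posterior P(y|X): probability that an email is spam given it contains "offer"
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(round(p_spam_given_offer, 3))  # 0.923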
The Naive Bayes classifier is easy to implement and performs well even with a small training data set, which makes it one of the fastest dependable choices for predicting the class of a data point. Scikit-learn offers different Naive Bayes algorithms for various types of problems. One of them is Gaussian Naive Bayes: it is used when the features are continuous variables, and it assumes that within each class the features follow a Gaussian (normal) distribution. It is straightforward to apply the open-source model to data, but a good analyst has to understand how the model is built in order to apply it to the appropriate data.
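For reference, this is roughly how the ready-made scikit-learn version is used (a minimal sketch; the toy data set is invented for illustration):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous features (two columns) and binary class labels
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

model = GaussianNB()
model.fit(X, y)                     # estimates per-class means, variances, and priors
print(model.predict([[1.1, 2.0]]))  # [0]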
The best way to understand a model is to build one from scratch. All the following methods are defined in a GaussianNBClassifier class. Let's have some fun!
We will use only the numpy library for numerical operations.
import numpy as np

class GaussianNBClassifier:

    def __init__(self):
        pass
According to Bayes' theorem, we need to know the prior probability of each class. To calculate it, we have to assign the feature values to their specific class. We can do this by separating the classes and saving them into a dictionary.
def separate_classes(self, X, y):
    """Group the samples of X by class label.

    Returns a dictionary mapping each class label to the list of
    feature vectors that belong to that class.
    """
    separated_classes = {}
    for i in range(len(X)):
        feature_values = X[i]
        class_name = y[i]
        if class_name not in separated_classes:
            separated_classes[class_name] = []
        separated_classes[class_name].append(feature_values)
    return separated_classes
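A quick check on a toy input shows what the resulting dictionary looks like (the values are made up for illustration):

clf = GaussianNBClassifier()
X = np.array([[1.0, 2.0], [3.0, 4.0], [1.5, 2.5]])
y = [0, 1, 0]  # plain list, so the dictionary keys print as plain ints

# Rows 0 and 2 share label 0; row 1 has label 1
print(clf.separate_classes(X, y))
# {0: [array([1., 2.]), array([1.5, 2.5])], 1: [array([3., 4.])]}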