Naive Bayes classification is one of the simplest and most popular algorithms in data mining and machine learning (it is listed among the top 10 data mining algorithms in the CRC Press reference [1]). The basic idea behind Naive Bayes classification is very simple.


The Basic Intuition:

Let’s say we have books of two categories: one category is Sports and the other is Machine Learning. In each book, I count the frequency of the word “match” (Attribute 1) and the frequency of the word “algorithm” (Attribute 2). Let’s assume I have a total of 6 books from each of these two categories, and the counts of these words across the six books look like the figure below.


Figure 1: Count of words across the books

We see clearly that the word “algorithm” appears more in Machine Learning books and the word “match” appears more in Sports books. Armed with this knowledge, let’s say we have a book whose category is unknown. If Attribute 1 has a value of 2 and Attribute 2 has a value of 10, we can say the book belongs to the Machine Learning category, since the word “algorithm” dominates.
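This comparison can be sketched by hand. The per-category word totals below are made up to mimic the pattern in Figure 1 (they are not the actual numbers from the figure), and the sketch assumes equal priors with add-one smoothing:

```python
# Hand-rolled sketch of the comparison. The per-category word totals
# are illustrative only, chosen to mimic the pattern in Figure 1.
match_counts = {"Sports": 51, "Machine Learning": 5}   # total "match" occurrences
algo_counts = {"Sports": 5, "Machine Learning": 51}    # total "algorithm" occurrences

def score(category, n_match, n_algo):
    # Relative likelihood of observing these counts under each category,
    # using simple word frequencies (equal priors, add-one smoothing).
    total = match_counts[category] + algo_counts[category] + 2
    p_match = (match_counts[category] + 1) / total
    p_algo = (algo_counts[category] + 1) / total
    return (p_match ** n_match) * (p_algo ** n_algo)

# Unknown book: "match" appears 2 times, "algorithm" appears 10 times
scores = {c: score(c, 2, 10) for c in match_counts}
print(max(scores, key=scores.get))  # → Machine Learning
```

With 10 occurrences of “algorithm” against only 2 of “match”, the Machine Learning score dominates, matching the intuition above.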

Basically, we want to find out which category is more likely, given the values of Attribute 1 and Attribute 2.
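With scikit-learn, this per-category comparison is handled by `MultinomialNB`, which is suited to word counts. The training data below is made up for illustration, shaped to mimic the pattern in Figure 1:

```python
# Minimal scikit-learn sketch; the word-count rows are illustrative,
# mimicking Figure 1 (more "match" in Sports, more "algorithm" in ML).
from sklearn.naive_bayes import MultinomialNB

# Each row is one book: [count of "match", count of "algorithm"]
X = [
    [9, 1], [8, 0], [10, 2], [7, 1], [9, 0], [8, 1],  # Sports
    [1, 9], [0, 8], [2, 10], [1, 7], [0, 9], [1, 8],  # Machine Learning
]
y = ["Sports"] * 6 + ["Machine Learning"] * 6

clf = MultinomialNB()
clf.fit(X, y)

# Unknown book: "match" appears 2 times, "algorithm" appears 10 times
print(clf.predict([[2, 10]])[0])  # → Machine Learning
```

Under the hood, `MultinomialNB` estimates per-category word probabilities from the training counts and picks the category with the highest posterior, which is exactly the "which category is more likely" question above.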


A short tutorial on Naive Bayes Classification with implementation