Hello world! Hope you’re doing great today. In this article I would like to work on a project related to Natural Language Processing (NLP). The project itself is not very complicated, since what we are going to do is just a simple binary classification task.

As we know, the coronavirus is still around at the time of writing. It’s therefore likely that plenty of fake news about the topic is also circulating in society. The objective of this project is to build a machine learning model that can detect whether a news item is fake or real.

Note: the full code is available at the end of this article.

Let’s start with some imports. I will explain them later on.

import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
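The re, word_tokenize and stopwords imports point to a text-cleaning step before vectorization. The snippet below is a minimal sketch of such a step, not the author's code. It uses only the standard library: a plain whitespace split stands in for word_tokenize, and the small STOPWORDS set is a hypothetical stand-in for nltk.corpus.stopwords.words("english"), which is much longer and requires nltk.download("stopwords"). The function name clean_text is likewise hypothetical.

```python
import re
from typing import List

# Hypothetical stand-in for nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "for", "it", "this", "that"}


def clean_text(text: str) -> List[str]:
    """Lowercase, strip URLs and non-letters, tokenize, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    tokens = text.split()                       # simple stand-in for word_tokenize
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_text("Drinking hot water cures the virus, see https://example.com"))
print(clean_text("The WHO is monitoring 2 new cases."))
```

Dropping digits and punctuation keeps the vocabulary small, which matters for a count-based model. The trade-off is that tokens such as "COVID-19" collapse to "covid".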

Data collection & analysis

Before I go any further, I want to let you know that the project I’m going to explain here is inspired by this article. The author of that article uses logistic regression for the classification and achieves 93% accuracy on the test data. Here in my project, I would like to use a Naïve Bayes classifier instead and see whether this approach can achieve a higher accuracy on the exact same dataset. You can download the COVID-19 news dataset from here.
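To make the switch from logistic regression concrete, here is a minimal from-scratch sketch of multinomial Naïve Bayes with Laplace (add-alpha) smoothing, the variant that scikit-learn’s MultinomialNB implements. It picks the class c that maximises log P(c) plus the sum of log P(word | c) over the words in the document. This is an illustration, not the article’s implementation. The function names train_nb and predict_nb and the four-document toy corpus are hypothetical and are not taken from the COVID-19 dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha smoothing.

    docs: list of token lists; labels: list of class labels such as "fake"/"real".
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> frequency of each token
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # Log prior: fraction of training documents in each class.
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    # Smoothed log likelihood: (count(w, c) + alpha) / (total(c) + alpha * |V|).
    log_likelihood = {}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab}
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Return the class maximising log P(c) + sum of log P(token | c)."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Tokens unseen in training are skipped, like CountVectorizer ignores them.
        scores[c] = log_prior[c] + sum(log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c])
    return max(scores, key=scores.get)


# Hypothetical toy corpus, already tokenized.
train_docs = [
    ["miracle", "cure", "kills", "virus"],
    ["garlic", "cure", "covid", "miracle"],
    ["health", "ministry", "reports", "new", "cases"],
    ["vaccine", "trial", "reports", "results"],
]
train_labels = ["fake", "fake", "real", "real"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["miracle", "garlic", "cure"]))
print(predict_nb(model, ["ministry", "reports", "cases"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together. The smoothing term alpha keeps a word that never appeared with a class from forcing that class’s probability to zero.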


COVID-19 Fake News Detection using Naïve Bayes Classifier