In this digital era of smartphones and the internet, fake news spreads like wildfire. It looks just like real news and can do serious damage to a community. So in this tutorial we are going to build a Fake News Classifier and deploy it to the cloud as a web app so that anyone can access it. It won't be as good as Google's or Facebook's fake news detection systems, but given the dataset obtained from Kaggle, it will be pretty decent.

_Before we get started, to keep you motivated, let me show you the web app you will be able to build by the end of this tutorial: Fake News Classifier._ Now that you've seen the end product, let's get started.

Note: I am assuming that you are familiar with basic machine learning techniques, algorithms, and packages.

I’ve divided this tutorial into three parts:

  1. Exploratory Data Analysis
  2. Preprocessing and Model Training
  3. Building and Deploying Web App on Heroku

Now, if you are a beginner, I'd recommend installing the Anaconda distribution, as it comes with all the necessary packages for data science, and setting up a virtual environment.

If you want to follow along with this tutorial, here is the link to source code on my GitHub: https://github.com/eaofficial/fake-news-classifier.

You can obtain the dataset here or you can clone my GitHub repository.


1. Exploratory Data Analysis


Create a file named eda.ipynb or eda.py in your project directory.

We will first import all the required packages.

#Importing all the libraries
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
import re
from wordcloud import WordCloud
import os

We will first read the fake news dataset using pd.read_csv() and then explore it.
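As a minimal sketch of this loading-and-first-look step (using a tiny inline stand-in for Fake.csv, since the real file ships with the Kaggle dataset):

```python
import io

import pandas as pd

# Tiny inline stand-in for the Kaggle Fake.csv file; with the real data you
# would call pd.read_csv('Fake.csv') instead.
sample_csv = io.StringIO(
    "title,text,subject,date\n"
    "Some headline,Some body text,News,December 31 2017\n"
    "Another headline,More body text,politics,January 1 2018\n"
)
fake = pd.read_csv(sample_csv)

# First look at the data: shape, columns, and a few rows
print(fake.shape)              # (rows, columns)
print(fake.columns.tolist())
print(fake.head())
```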

In cell 4 of the above notebook, we count the number of fake news samples in each subject. We will also plot the distribution using a seaborn count plot, sns.countplot() .
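A sketch of that step on an illustrative stand-in DataFrame (the subject column name matches the Kaggle dataset; the counts here are made up):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative stand-in for the fake news DataFrame
fake = pd.DataFrame({'subject': ['News', 'politics', 'News', 'News', 'politics']})

# Count samples per subject, then visualize the distribution
counts = fake['subject'].value_counts()
print(counts)

sns.countplot(x='subject', data=fake)
plt.title('Fake news samples per subject')
plt.tight_layout()
plt.savefig('subject_counts.png')  # or plt.show() in a notebook
```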

We will now plot a word cloud by first concatenating all the news into a single string, then generating tokens and removing stopwords. A word cloud is a very good way to visualize text data.

As you can see in the next cell, we now import true.csv as the real news dataset and perform the same steps as we did on fake.csv. One difference you'll notice in the real news dataset is that the **text** column contains a publication name, like _WASHINGTON (Reuters)_, separated by a hyphen (-).

It seems that the real news is credible as it comes from a publication house, so we will separate the publication from the news part to make the dataset uniform in the preprocessing part of this tutorial. For now, we’ll just explore the dataset.
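As a sketch of the split we will perform later, on hypothetical rows mirroring the True.csv format (splitting only on the first " - " so hyphens inside the article body survive):

```python
import pandas as pd

# Hypothetical rows mirroring the True.csv text format
real = pd.DataFrame({'text': [
    'WASHINGTON (Reuters) - The U.S. Congress on Thursday passed a bill',
    'LONDON (Reuters) - British lawmakers voted on the measure',
]})

# Split on the first " - " only (n=1); expand=True yields two columns,
# which we assign back as a publication column and the cleaned text
real[['publication', 'text']] = real['text'].str.split(' - ', n=1, expand=True)
print(real['publication'].tolist())
print(real['text'].tolist())
```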

If you are following along, you can see that the news subject column has a non-uniform distribution across the real and fake news datasets, so we will drop this column later. That concludes our EDA.

Now we can get our hands dirty with what you have been waiting for. I know this part can feel tedious, but EDA and preprocessing are among the most important steps in any data science lifecycle.

