This is a small Python project that uses machine learning to detect whether a given text is spam or ham (not spam). I built it after completing the “Applied Text Mining in Python” course by the University of Michigan on Coursera; a link to the course is given at the end of the blog. Here, I’ll try to explain each step of my code, and if you want the whole code, the GitHub link is also available at the end.


I used my local machine for this project; its specification is listed below. It’s a very small project, though, so you won’t need much computing power for it.

Specification

Name: Acer Predator Helios 300 (2019)

Graphics Card: NVIDIA GeForce GTX 1660 Ti

Processor Name: Intel Core i7-9750H

RAM: 16 GB


Importing The Libraries

I used Pandas and NumPy for data manipulation, matplotlib for plotting graphs, and scikit-learn (sklearn) for preprocessing, model creation, and model evaluation. I’ll explain what each library does as I use it in the code.

P.S. %matplotlib notebook is a Jupyter Notebook magic function that renders plots interactively inside the notebook.
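
As a rough sketch, the import cell for a project like this could look like the one below. Only Pandas, NumPy, matplotlib and sklearn are named in this post; the specific sklearn classes and functions imported here (CountVectorizer, train_test_split, MultinomialNB, roc_auc_score) are my assumptions for illustration and may not be the ones used in the actual code.

```python
# In a Jupyter notebook, the magic below would go first.
# It is left as a comment here because it is not valid plain Python.
# %matplotlib notebook

import numpy as np                # numerical arrays
import pandas as pd               # loading and manipulating the CSV data
import matplotlib.pyplot as plt   # plotting graphs

# The scikit-learn imports below are assumptions chosen for illustration;
# the original project may use different classes.
from sklearn.feature_extraction.text import CountVectorizer  # preprocessing: text -> token counts
from sklearn.model_selection import train_test_split         # splitting data into train/test sets
from sklearn.naive_bayes import MultinomialNB                # model creation
from sklearn.metrics import roc_auc_score                    # model evaluation
```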

Analysing The Data

I used df.head() to view the data, where df is the Pandas DataFrame into which I loaded the CSV. There are only two columns: text, which contains the text to classify, and target, the label that says whether the text is spam or not.
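
Here is a minimal sketch of that step. The real CSV file is not reproduced here, so the example reads a tiny in-memory stand-in with made-up rows, and it assumes the target column uses 1 for spam and 0 for ham; with the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; these rows are made up for illustration.
# Assumption: target is 1 for spam and 0 for ham.
csv_data = io.StringIO(
    "text,target\n"
    "Win a free prize now,1\n"
    "Are we still meeting for lunch?,0\n"
    "Claim your cash reward today,1\n"
)

# Load the data into a DataFrame, as you would with pd.read_csv("<path to your csv>")
df = pd.read_csv(csv_data)

# View the first rows (head() shows up to five by default)
print(df.head())

# Confirm the two columns and check how many rows fall under each label
print(df.columns.tolist())
print(df["target"].value_counts())
```

Checking the label counts early is useful because spam datasets are often imbalanced, which affects how the model should be evaluated later.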
