Lenora Hauck

1598494320

Beginner Spam Detection

This is a small Python project that uses machine learning to detect whether a given text is spam or ham (not spam). I made it after completing the course “Applied Text Mining in Python” by the University of Michigan on Coursera; the link to the course is given at the end of the blog. Here, I’ll try to explain each step in my code, and if you want the whole code, the GitHub link is also available at the end.


I used my local machine for this project; its specification is listed below. It’s a very small project, though, so you won’t need much computing power for it.

Specification

Name: Acer Predator Helios 300 (2019)

Graphics Card: NVIDIA GeForce GTX 1660 Ti

Processor Name: Intel Core i7-9750H

RAM: 16 GB


Importing The Libraries

I used Pandas and NumPy for data manipulation, matplotlib for plotting graphs, and sklearn for preprocessing, model creation, and model evaluation. I’ll explain what each library does as I use it in the code.

P.S. %matplotlib notebook is a Jupyter Notebook magic function.
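
As a rough sketch, the import cell might look something like the following. The Pandas, NumPy and matplotlib imports follow the description above; the specific sklearn classes are only an assumption about what a small spam classifier typically needs, since this excerpt does not list them.

```python
# Data manipulation
import numpy as np
import pandas as pd

# Plotting. In a Jupyter notebook the magic
#   %matplotlib notebook
# would go here; it is left as a comment because it is not valid plain Python.
import matplotlib.pyplot as plt

# sklearn pieces for preprocessing, model creation and evaluation.
# ASSUMPTION: this particular selection is illustrative, not taken from the article.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score
```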

Analysing The Data

I used df.head() to view the data, where df is the Pandas DataFrame object into which I loaded the data from the CSV. Here we see that there are only two columns: text, containing the text to classify, and target, the label that tells whether the text is spam or not.
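
Here is a minimal sketch of that step. The real CSV file is not part of this excerpt, so the rows below are made-up stand-ins loaded from an in-memory string, and the "spam"/"ham" label values are an assumption (the actual file may encode target differently, for example as 0 and 1).

```python
import io

import pandas as pd

# HYPOTHETICAL sample data standing in for the project's CSV file.
csv_text = """text,target
"Free entry! Text WIN to claim your prize now",spam
"Are we still meeting for lunch today?",ham
"URGENT: your account has been selected for a reward",spam
"I'll call you when I get home",ham
"""

# Load the data into a DataFrame, as one would with pd.read_csv("<file>.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Show the first rows and the two columns, text and target.
print(df.head())

# A quick look at how many rows fall under each label.
print(df["target"].value_counts())
```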

#projects #beginner #machine-learning #python #spam

Chando Dhar

1619799996

Deep Learning Project: Real Time Object Detection in Python & OpenCV

Real Time Object Detection in Python And OpenCV

Github Link: https://github.com/Chando0185/Object_Detection

Blog Link: https://knowledgedoctor37.blogspot.com/#

I’m on Instagram as @knowledge_doctor.

Follow Me On Instagram :
https://www.instagram.com/invites/contact/?i=f9n3ongbu8ma&utm_content=jresydt

Like My Facebook Page:

https://www.facebook.com/Knowledge-Doctor-Programming-114082097010409/

#python project #object detection #python opencv #opencv object detection #object detection in python #python opencv for object detection

Wanda Huel

1601280960

Statistical techniques for anomaly detection

Anomaly and fraud detection is a multi-billion-dollar industry. According to a Nilson Report, the amount of global credit card fraud alone was USD 7.6 billion in 2010. In the UK, fraudulent credit card transaction losses were estimated at more than USD 1 billion in 2018. To counter these kinds of financial losses, a huge amount of resources is employed to identify frauds and anomalies in every single industry.

In data science, “outlier”, “anomaly” and “fraud” are often used synonymously, but there are subtle differences. An “outlier” generally refers to a data point that somehow stands out from the rest of the crowd. However, when this outlier is completely unexpected and unexplained, it becomes an anomaly. That is to say, all anomalies are outliers, but not all outliers are necessarily anomalies. In this article, however, I am using these terms interchangeably.

There are numerous reasons why understanding and detecting outliers is important. As data scientists, when we prepare data we take great care to check whether any data point is unexplained and may have entered erroneously. Sometimes we filter out completely legitimate outlier data points and remove them to ensure greater model performance.

Anomaly detection also has huge industrial applications. Credit card fraud detection is the most cited one, but in numerous other cases anomaly detection is an essential part of doing business, such as detecting network intrusion, identifying instrument failure, and detecting tumor cells.

A range of tools and techniques are used to detect outliers and anomalies, from simple statistical techniques to complex machine learning algorithms, depending on the complexity of the data and the sophistication needed. The purpose of this article is to summarise some simple yet powerful statistical techniques that can be readily used for initial screening of outliers. While complex algorithms are sometimes unavoidable, simple techniques are often more than enough to serve the purpose.

Below is a primer on five statistical techniques.
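
The five techniques themselves are not reproduced in this excerpt. To give a flavour of the kind of simple screening meant, here is a minimal sketch of one widely used rule, the interquartile range (IQR) rule. Whether it is among the five is an assumption, and the numbers are made up for illustration.

```python
import numpy as np

# HYPOTHETICAL sample: nine ordinary readings and one suspicious value.
values = np.array([10, 12, 11, 13, 12, 11, 10, 14, 13, 100])

# First and third quartiles and the interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Anything outside the fences is flagged for a closer look.
outliers = values[(values < lower) | (values > upper)]
print("Fences:", lower, upper)
print("Flagged values:", outliers)
```

The 1.5 multiplier is a convention rather than a law; a larger value flags fewer points and a smaller one flags more.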

#anomaly-detection #machine-learning #outlier-detection #data-science #fraud-detection

Arno Bradtke

1601334000

Anomaly detection with Local Outlier Factor (LOF)

Today’s article is my fifth in a series of “bite-size” articles I am writing on different techniques used for anomaly detection. If you are interested, the following are the previous four articles:

Today I am going beyond statistical techniques and stepping into machine learning algorithms for anomaly detection.
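
As a minimal sketch of what LOF looks like in practice, the snippet below uses scikit-learn's LocalOutlierFactor on synthetic data. The data, the planted point at (10, 10) and the choice of n_neighbors=20 are assumptions made for illustration, not the article's own example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# SYNTHETIC data: 100 points around the origin plus one planted far-away point.
rng = np.random.RandomState(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
planted = np.array([[10.0, 10.0]])
X = np.vstack([inliers, planted])

# LOF compares the local density of each point with that of its neighbours.
lof = LocalOutlierFactor(n_neighbors=20)

# fit_predict returns 1 for inliers and -1 for outliers.
labels = lof.fit_predict(X)

# negative_outlier_factor_ holds the negated LOF scores; flip the sign so that
# larger means "more anomalous" (values near 1 are typical inliers).
scores = -lof.negative_outlier_factor_

print("Label of the planted point:", labels[-1])
print("LOF score of the planted point:", scores[-1])
```

Because LOF judges each point against its own neighbourhood, it can flag points that are unusual locally even when they are not extreme globally; n_neighbors controls how large that neighbourhood is.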

#outlier-detection #fraud-detection #data-science #machine-learning #anomaly-detection

DBSCAN — a density-based unsupervised algorithm for fraud detection

According to a recent report, financial losses due to fraudulent transactions have reached about USD 17 billion, with as many as 5% of consumers experiencing fraud incidents of some kind.

In light of such a large volume of financial losses, every industry is taking fraud detection seriously. It’s not just the financial industry that is susceptible: anomalies are prevalent in every single industry and can take many different forms, such as network intrusion, disturbances in business performance, and abrupt changes in KPIs.

Fraud/anomaly/outlier detection has long been the subject of intense research in data science. In the ever-changing landscape of fraud detection, new tools and techniques are being tested and employed every day to screen out abnormalities. In this series of articles, so far I’ve discussed six different techniques for fraud detection:

Today I’m going to introduce another technique called DBSCAN — short for Density-Based Spatial Clustering of Applications with Noise.

As the name suggests, DBSCAN is a density-based and unsupervised machine learning algorithm. It takes multi-dimensional data as inputs and clusters them according to the model parameters — e.g. epsilon and minimum samples. Based on these parameters, the algorithm determines whether certain values in the dataset are outliers or not.

Below is a simple demonstration in Python.
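
The article's own demonstration is not included in this excerpt. As a stand-in, here is a minimal sketch using scikit-learn's DBSCAN on synthetic data. The data and the parameter values (eps=0.5, min_samples=5) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# SYNTHETIC data: one dense blob of 50 points plus one isolated point.
rng = np.random.RandomState(0)
blob = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
isolated = np.array([[5.0, 5.0]])
X = np.vstack([blob, isolated])

# eps is the neighbourhood radius (epsilon); min_samples is the number of
# points required within that radius for a point to count as a core point.
model = DBSCAN(eps=0.5, min_samples=5)
labels = model.fit_predict(X)

# DBSCAN labels noise points, i.e. candidate outliers, with -1.
outlier_mask = labels == -1
print("Cluster labels found:", sorted(set(labels)))
print("Points flagged as noise:")
print(X[outlier_mask])
```

Both parameters depend on the scale of the data, so features are usually standardised first, and eps in particular tends to need tuning for each dataset.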

#fraud-detection #machine-learning #anomaly-detection #outlier-detection #data-science