Natural Language Processing with Python, Scikit-Learn

What is Natural Language Processing?

Natural Language Processing (NLP) refers to building applications that understand human language. NLP has more use cases than ever, because people generate thousands of gigabytes of text data every day through blogs, social media comments, product reviews, news archives, official reports, and more. Search engines are the biggest example of NLP. I don't think you will find many people around you who have never used one.

Project Overview

In my experience, the best way to learn is by doing a project, so in this article I will explain NLP with a real one. I have also provided links to some more NLP projects at the bottom of this page. The dataset I will use is called 'people_wiki.csv'. I found it on Kaggle. Feel free to download the dataset from here. It contains the names of some famous people, their Wikipedia URLs, and the text of their Wikipedia pages, so the dataset is very big. The goal of this project is to find people with related backgrounds. In the end, if you give the algorithm the name of a famous person, it will return the names of a predefined number of people who have a similar background according to their Wikipedia text. Sounds a bit like a search engine, right?

Step By Step Implementation

Import the necessary packages and the dataset. I will explain what each function does as we use it, since this article focuses primarily on how to use these functions.

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer

# Load the dataset (assumes the CSV sits in the working directory)
df = pd.read_csv('people_wiki.csv')
# Preview the first five rows
df.head()
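
To give a feel for how the two scikit-learn imports could fit together, here is a minimal sketch on a tiny made-up table rather than the real dataset. The column names 'name' and 'text', the toy rows, the helper `similar_people`, and the choice of cosine distance are all assumptions for illustration, not necessarily what the full project uses.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for people_wiki.csv; the column names 'name' and 'text'
# are assumptions about the real file.
toy = pd.DataFrame({
    "name": ["Physicist A", "Physicist B", "Footballer A", "Footballer B"],
    "text": [
        "physicist who studied quantum mechanics and relativity",
        "physicist known for work on quantum theory and relativity",
        "footballer who played as a striker for the national team",
        "footballer and striker who captained the national team",
    ],
})

# Turn each text into a sparse vector of word counts.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(toy["text"])

# Fit a nearest-neighbour index on the count vectors.
# Cosine distance is an assumed choice for this sketch.
nn = NearestNeighbors(metric="cosine", algorithm="brute")
nn.fit(counts)

def similar_people(name, k=1):
    """Hypothetical helper: return the k people closest to the named person."""
    row = toy.index[toy["name"] == name][0]
    # Ask for k + 1 neighbours because the closest match is the person itself.
    _, idx = nn.kneighbors(counts[row:row + 1], n_neighbors=k + 1)
    return [toy["name"].iloc[i] for i in idx[0] if i != row][:k]

print(similar_people("Physicist A"))
```

On this toy table, the person returned for a physicist is the other physicist, because their texts share the most words. With the real dataset, the same idea would apply to `df` once the actual column names are confirmed.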


towardsdatascience.com
