Wondering what NLTK is? The Natural Language Toolkit, more commonly called NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English, written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania.

The basic tasks in NLP are:

1. convert text to lower case

2. word tokenize

3. sent tokenize

4. stop words removal

5. lemmatize

6. stem

7. get word frequency

8. POS (part-of-speech) tags

9. NER (named entity recognition)

Prerequisites:

Install Python

Install nltk and its corpora (data packages)
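For the second prerequisite, the library itself usually comes from `pip install nltk`, and the data packages are then fetched from inside Python. The snippet below is a minimal sketch of that second step. `ensure_resource` is a hypothetical helper name, not part of NLTK, and which package the tokenizer needs depends on the installed release (older releases look for `punkt`, newer ones for `punkt_tab`).

```python
import nltk


def ensure_resource(path, package):
    """Return True if the NLTK data at `path` is available, downloading `package` if not.

    `ensure_resource` is an illustrative helper, not an NLTK function.
    """
    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find(path)
        return True
    except LookupError:
        # nltk.download returns True on success and False on failure
        return bool(nltk.download(package, quiet=True))
```

Calling `ensure_resource("tokenizers/punkt", "punkt")` once before the examples would fetch the tokenizer data on the first run and do nothing afterwards. Running `nltk.download()` with no arguments opens an interactive downloader instead.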

Examples:

1. import nltk

Import nltk in order to use its functions.

import nltk

2. convert text to lower case:

Convert the text to lower case first, because string comparison is case sensitive: otherwise "Demo" and "demo" would be treated as two different words.

text = "This is a Demo Text for NLP using NLTK. Full form of NLTK is Natural Language Toolkit"
lower_text = text.lower()
print(lower_text)

[OUTPUT]: this is a demo text for nlp using nltk. full form of nltk is natural language toolkit
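To see why case matters, the short sketch below counts words before and after lowercasing. It uses only the standard library (`collections.Counter` and `str.split`) rather than NLTK, and the sample sentence is made up for illustration.

```python
from collections import Counter

# Made-up sample in which the same word appears in three different casings
sample = "NLTK is a toolkit and nltk is a package and Nltk is free"

# Without lowercasing, each casing is counted as a separate word
raw_counts = Counter(sample.split())

# After lowercasing, all three casings collapse into one entry
lower_counts = Counter(sample.lower().split())

print(raw_counts["NLTK"], raw_counts["nltk"], raw_counts["Nltk"])  # 1 1 1
print(lower_counts["nltk"])  # 3
```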

3. word tokenize

Tokenize the text to get its tokens, i.e. break the sentences into individual words and punctuation marks.

text = "This is a Demo Text for NLP using NLTK. Full form of NLTK is Natural Language Toolkit"
# word_tokenize needs the Punkt tokenizer data to be installed
word_tokens = nltk.word_tokenize(text)
print(word_tokens)

[OUTPUT]: ['This', 'is', 'a', 'Demo', 'Text', 'for', 'NLP', 'using', 'NLTK', '.', 'Full', 'form', 'of', 'NLTK', 'is', 'Natural', 'Language', 'Toolkit']
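Note that the full stop comes out as its own token, which a plain `text.split()` would not do. The sketch below shows the difference. It swaps in `wordpunct_tokenize`, a simpler regex-based NLTK tokenizer that needs no downloaded data, in place of `word_tokenize`, so results can differ on other inputs such as contractions.

```python
from nltk.tokenize import wordpunct_tokenize

text = "This is a Demo Text for NLP using NLTK. Full form of NLTK is Natural Language Toolkit"

# Splitting on whitespace leaves the full stop glued to the word before it
split_tokens = text.split()
print(split_tokens[8])  # NLTK.

# wordpunct_tokenize separates runs of word characters from punctuation
tokens = wordpunct_tokenize(text)
print(tokens[8], tokens[9])  # NLTK .

# Keep only alphabetic tokens and lowercase them for later counting
words = [tok.lower() for tok in tokens if tok.isalpha()]
print(len(tokens), len(words))  # 18 17
```

Filtering with `isalpha()` is one simple way to drop punctuation before later steps such as word frequency; it would also drop tokens that contain digits, which may or may not be what you want.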
