Not all data comes in a standardized form. Data is created when we talk, when we tweet, when we send messages on WhatsApp, and in many other activities. Most of this data is textual, and text is highly unstructured.

Although this data is high-dimensional, the information it contains is not directly accessible unless it is interpreted manually (understood by reading) or analyzed by an automated system. To extract meaningful insights from text data, it is important to familiarise ourselves with the techniques and principles of Natural Language Processing (NLP).

So in this article, we will see how to gain insights from text data, and get hands-on experience using those insights to train NLP models that perform some human-mimicking tasks. Let’s dive in and look at some of the basics of NLP.

Tokenization:

Tokenization is the process of representing words in a form that a computer can process, with a view to later training a neural network that can understand their meaning.

Let’s look at how we can tokenize the sentences using TensorFlow tools.
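Before reaching for the library, it helps to see what a tokenizer actually does. The sketch below builds a word-to-index vocabulary and encodes sentences as integer sequences in plain Python; it mimics the behaviour of TensorFlow's `tf.keras.preprocessing.text.Tokenizer` (`fit_on_texts` / `texts_to_sequences`), except that indices here are assigned by first occurrence rather than by word frequency. The sentences are made-up examples.

```python
# Minimal sketch of tokenization: map each distinct word to an integer
# index, then encode sentences as sequences of those indices.
# (TensorFlow's Tokenizer automates this; it also orders indices by
# word frequency, whereas this sketch uses order of first occurrence.)

def fit_tokenizer(sentences):
    """Build a word -> index vocabulary (indices start at 1, as in Keras)."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def texts_to_sequences(sentences, word_index):
    """Encode each sentence as a list of word indices; unknown words are dropped."""
    return [[word_index[w] for w in s.lower().split() if w in word_index]
            for s in sentences]

sentences = ["I love my dog", "I love my cat"]
word_index = fit_tokenizer(sentences)
print(word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(texts_to_sequences(sentences, word_index))  # [[1, 2, 3, 4], [1, 2, 3, 5]]
```

The same two sentences fed to TensorFlow's `Tokenizer` would produce an equivalent `word_index` dictionary and the same kind of integer sequences, which is what we will use as input when training models later.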

Learning and Practicing Natural Language Processing with TensorFlow