In this video I create a baseline NLP model for text classification using Embedding and LSTM layers from Keras, TensorFlow's high-level API.

00:00 NLP with TensorFlow
00:48 How to clean text data for machine learning
01:56 How to count the occurrences of each word in a corpus
03:40 Why we need to define the sequence length for NLP Projects with TensorFlow
04:00 How to split the dataset into a train and test set
04:42 How to use Tokenizer from Keras to index words and transform text to sequences
05:49 How to pad text sequences to have a specific length for NLP Projects with TensorFlow
08:15 LSTM Model for NLP Projects with TensorFlow
08:25 Understanding Embedding and why we need to use it for NLP Projects
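
Roughly, the preprocessing steps from the chapters above (indexing words with Keras' Tokenizer, transforming text into integer sequences, and padding them to a fixed length) look like this. This is a minimal sketch: the toy corpus and the values for num_words and maxlen are made up for illustration, not the exact ones from the video.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["the movie was great", "the movie was terrible"]  # toy corpus

# Index the most frequent words; map out-of-vocabulary words to a token
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)

# Transform each text into a sequence of word indices
sequences = tokenizer.texts_to_sequences(texts)

# Pad (or truncate) every sequence to the same fixed length
padded = pad_sequences(sequences, maxlen=50, padding="post")
```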

With an embedding, we map each word to a fixed-size vector of real-valued elements. In contrast to one-hot encoding, where the vector length grows with the vocabulary, these finite-sized dense vectors can represent arbitrarily large vocabularies because each element can take any real value rather than just 0 or 1.

Because the embedding weights are trained together with the rest of the network, this feature learning technique picks up the characteristics of each word that matter most for the task at hand.
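
As a minimal sketch of this idea (the vocabulary size of 10,000 and vector size of 64 are arbitrary choices, not necessarily the video's values), a Keras Embedding layer turns integer word indices into trainable dense vectors:

```python
import tensorflow as tf
from tensorflow.keras.layers import Embedding

# Map integer word indices (0..9999) to dense 64-dimensional vectors
embedding = Embedding(input_dim=10000, output_dim=64)

# One sequence of three word indices -> three 64-dim vectors
vectors = embedding(tf.constant([[3, 17, 42]]))
print(vectors.shape)  # (1, 3, 64): batch, sequence length, embedding dim
```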

LSTMs (Long Short-Term Memory networks) are Recurrent Neural Networks (RNNs) used for modeling sequences. The building block of an LSTM is the memory cell, which takes the place of the hidden layer in a standard RNN. Each LSTM cell contains three types of gates: the forget gate, the input gate, and the output gate.

Arguably the most important of these, the forget gate allows the LSTM memory cell to reset the cell state: it decides which information is allowed to pass through and which is held back.
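
Putting the pieces together, a baseline model of the kind described here could look like the sketch below. The layer sizes are illustrative assumptions, and `padded` refers to the padded sequences from the preprocessing sketch above.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Baseline: learn word vectors, model the sequence with an LSTM,
# and classify with a sigmoid output (binary text classification)
model = Sequential([
    Embedding(input_dim=10000, output_dim=64),
    LSTM(64),
    Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Example usage (labels would be a NumPy array of 0s and 1s):
# model.fit(padded, labels, epochs=5, validation_split=0.2)
```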

You can access the Jupyter notebook here (login required):

https://www.decisionforest.com/downlo…

Subscribe: https://www.youtube.com/c/DecisionForest/featured

#python #nlp #tensorflow
