This was my first Kaggle notebook, and I thought: why not write it up on Medium too?

The full code is on my GitHub.

In this post, I will elaborate on how to use fastText and GloVe as word embeddings in an LSTM model for text classification. I became interested in word embeddings while working on my paper on Natural Language Generation. There, using a pre-trained embedding matrix as the weights of the embedding layer improved the performance of the model. But since that was NLG, the evaluation was subjective, and I only used fastText. So in this article, I want to see how each method (with fastText, with GloVe, and without pre-trained embeddings) affects the prediction. In my GitHub code, I also compare the results with a CNN. The dataset I use here comes from a Kaggle competition; it consists of tweets labelled according to whether they use disaster words to report a real disaster or merely use them metaphorically. Honestly, on first seeing this dataset, I immediately thought of BERT and its ability to understand text far better than the approach I propose in this article (further reading on BERT).
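To make that idea concrete, here is a minimal Keras sketch of what "using an embedding matrix as the weights of the embedding layer" looks like. The vocabulary size, dimension, and the zero-filled matrix are placeholders of mine, not the actual setup used later in this article.

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Placeholder sizes; the real values come from the tokenizer's vocabulary
# and the pre-trained vectors (fastText/GloVe are typically 300-dimensional)
vocab_size, embedding_dim = 20000, 300

# In practice, each row is filled with the fastText or GloVe vector
# of the corresponding word in the vocabulary
embedding_matrix = np.zeros((vocab_size, embedding_dim))

# Pre-trained vectors used as (frozen) weights of the embedding layer
embedding_layer = Embedding(
    vocab_size,
    embedding_dim,
    weights=[embedding_matrix],
    trainable=False,
)
```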

But anyway, in this article I will focus on fastText and GloVe.

Let’s go?


Data + Pre-Processing

The data consists of 7,613 tweets (column text) with a label (column target) indicating whether they are talking about a real disaster or not: 3,271 rows report a real disaster and 4,342 rows do not. The data was shared for a Kaggle competition, and if you want to learn more about it you can read about it here.


Example of a disaster word used in a tweet about a real disaster:

“Forest fire near La Ronge Sask. Canada”

Example of a disaster word used in a tweet that is not about a real disaster:

“These boxes are ready to explodeExploding Kittens finally arrived! gameofkittens #explodingkittens”

The data will be split into a training set (6,090 rows) and a test set (1,523 rows) and then pre-processed. We will only be using the text and target columns.
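A minimal sketch of that split, assuming the competition's train.csv has been downloaded locally; the file name, test_size, and random_state here are my assumptions, not necessarily the exact settings used in the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Keep only the columns we use: the tweet text and the binary target
df = pd.read_csv("train.csv")[["text", "target"]]

# Hold out ~20% for testing: 7,613 rows -> roughly 6,090 train / 1,523 test
train_text, test_text, train_target, test_target = train_test_split(
    df["text"], df["target"], test_size=0.2, random_state=42
)

print(len(train_text), len(test_text))
```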
