Reading, comprehending, communicating, and ultimately producing new content is something we all do, regardless of who we are professionally.

Extracting useful features from a body of text is a fundamentally different process from extracting them from, say, a vector of continuous numerical values. The information in a sentence is encoded as a structured sequence, and the placement of words carries much of the meaning: "dog bites man" and "man bites dog" contain the same words but say very different things.

This dual requirement, representing the data appropriately while preserving the contextual meaning of the text, led me to learn about and implement two different NLP models for the task of text classification.

Word Embeddings are dense vector representations of the individual words in a text that take into account the context in which each word occurs, that is, the words that surround it.

The dimensionality of this real-valued vector is a free design choice, and the resulting vectors capture semantic relationships between words more effectively than a simple Bag-of-Words model does.
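To make this concrete, here is a minimal sketch of a trainable embedding layer wired into a small Keras classification model. The toy corpus, the labels, the vocabulary size of 1,000, and the 16-dimensional embedding are all hypothetical choices made for illustration, not values from this guide; the sketch assumes TensorFlow 2.x.

```python
import numpy as np
from tensorflow.keras.layers import TextVectorization, Embedding, GlobalAveragePooling1D, Dense
from tensorflow.keras.models import Sequential

# Toy corpus with hypothetical sentiment labels (illustrative only).
texts = np.array([
    "the movie was great",
    "the movie was terrible",
    "a wonderful little film",
    "an awful waste of time",
])
labels = np.array([1, 0, 1, 0])

# Map raw strings to fixed-length sequences of integer word indices.
vectorizer = TextVectorization(max_tokens=1000, output_sequence_length=8)
vectorizer.adapt(texts)
sequences = vectorizer(texts)

model = Sequential([
    # Each word index is mapped to a dense, trainable 16-dimensional
    # vector; both 1000 and 16 are free design choices.
    Embedding(input_dim=1000, output_dim=16),
    GlobalAveragePooling1D(),  # average the word vectors into one text vector
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(sequences, labels, epochs=5, verbose=0)

# The learned embedding matrix, one row per vocabulary entry.
embedding_matrix = model.layers[0].get_weights()[0]  # shape: (1000, 16)
```

The `output_dim` argument is exactly the choosable dimension mentioned above, and after training, each row of the embedding matrix is the dense vector learned for one vocabulary word.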

