Introduction:

On a lighter note, the embedding of a particular word **(in the Higher Dimension)** is nothing but a vector representation of that word **(in the Lower Dimension)**, where words with similar meanings, **_e.g. “Joyful” and “Cheerful”_**, and other closely related words, e.g. “Money” and “Bank”, get closer vector representations when projected into the Lower Dimension.

The transformation from words to vectors is called word embedding.
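For instance (the vectors below are made-up numbers, purely for illustration), cosine similarity is one common way to see that similar words end up with nearby vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 2-D embeddings, just to illustrate the idea of "closeness".
joyful   = np.array([0.91, 0.84])
cheerful = np.array([0.88, 0.90])
money    = np.array([-0.70, 0.15])

print(cosine_similarity(joyful, cheerful))  # close to 1.0 -> similar meaning
print(cosine_similarity(joyful, money))     # much lower   -> unrelated words
```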

So the underlying concept in creating a mini word embedding boils down to training a simple auto-encoder on some text data.
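The post doesn't spell out the architecture at this point, but as a rough sketch (assuming PyTorch, one-hot word inputs over a toy vocabulary, and a 2-unit bottleneck, all of which are my assumptions), such an auto-encoder might look like this:

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10  # assumed toy vocabulary size

class WordAutoEncoder(nn.Module):
    """Compress a one-hot word vector to 2 dimensions and reconstruct it."""
    def __init__(self, vocab_size=VOCAB_SIZE, embed_dim=2):
        super().__init__()
        self.encoder = nn.Linear(vocab_size, embed_dim)  # high -> low dimension
        self.decoder = nn.Linear(embed_dim, vocab_size)  # low  -> high dimension

    def forward(self, x):
        embedding = self.encoder(x)              # the 2-D word embedding
        reconstruction = self.decoder(embedding)  # logits over the vocabulary
        return reconstruction, embedding

model = WordAutoEncoder()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# One training step: reconstruct the one-hot vector for word index 3.
x = torch.eye(VOCAB_SIZE)[3].unsqueeze(0)  # shape (1, VOCAB_SIZE)
target = torch.tensor([3])                 # the word's index as the label

optimizer.zero_grad()
reconstruction, embedding = model(x)
loss = criterion(reconstruction, target)
loss.backward()
optimizer.step()
```

After training on many words, the 2-D `embedding` output of the encoder is the mini word embedding we are after.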


Some Basics :

Before we proceed to creating our mini word embedding, it’s good to brush up on the basic concepts of word embeddings developed by the deep learning community so far.

The popular and state-of-the-art word embedding models out there are as follows:

  1. Word2Vec (Google)
  2. GloVe (Stanford University)

They are trained on a huge text corpus, such as Wikipedia or a scrape of the entire web, covering up to 6 billion words (in the Higher Dimension), and project them into dense embeddings of as few as 100, 200, or 300 dimensions (in the Lower Dimension).

Here, in our model, we project them into just 2 dense dimensions.


Techniques used :

The above state-of-the-art models use one of two primary techniques to accomplish the task.

  1. Continuous Bag-of-Words (CBOW)
  2. Skip-Gram

1. CBOW :

CBOW attempts to guess the output (target word) from its neighboring words (context words). Window size is a hyper-parameter here.
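As a quick illustration (the sentence and window size below are my own example, not from the post), generating CBOW (context, target) pairs from a tokenized sentence could look like this:

```python
def cbow_pairs(tokens, window_size=2):
    """For each position, pair the surrounding context words with the target word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = [
            tokens[j]
            for j in range(max(0, i - window_size), min(len(tokens), i + window_size + 1))
            if j != i
        ]
        pairs.append((context, target))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
for context, target in cbow_pairs(sentence, window_size=2):
    print(context, "->", target)
# e.g. ['the', 'quick', 'fox', 'jumps'] -> brown
```

The model is then trained to predict the target word from its context words, and the window size controls how many neighbors count as context.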
