This is a quick, straight-to-the-point introduction to Euclidean distance and cosine similarity, with a focus on NLP.


Euclidean Distance

The Euclidean distance metric measures how far apart two points, or two vectors, are from each other.
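For reference, the standard formula can be written as follows. The symbols p, q, and n are notation introduced here for illustration (two vectors and their number of dimensions); they do not come from the original text.

```latex
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

In two dimensions this reduces to the familiar Pythagorean form, the square root of the squared horizontal difference plus the squared vertical difference.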

Now suppose you are a high school student taking three classes: math, philosophy, and psychology. You want to check how similar these classes are based on the words your professors use in class. For the sake of simplicity, let’s consider just two words: “theory” and “harmony”. You could then create a table like this to record how often each word occurs in each class:

Word       Math   Philosophy   Psychology
theory     60     20           25
harmony    10     40           70

In this table, the word “theory” occurs 60 times in math class, 20 times in philosophy class, and 25 times in psychology class, whereas the word “harmony” occurs 10, 40, and 70 times in math, philosophy, and psychology classes respectively. Let’s plot this data on a 2D plane, treating each class as a point whose coordinates are its two word counts.
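As a minimal sketch, the counts above can be turned into one 2D vector per class. The names word_counts, vocabulary, to_vector, and class_vectors are hypothetical and chosen only for this illustration; the axis order ("theory" first, "harmony" second) is also an assumption.

```python
# Word counts per class, taken from the figures quoted in the text.
word_counts = {
    "math": {"theory": 60, "harmony": 10},
    "philosophy": {"theory": 20, "harmony": 40},
    "psychology": {"theory": 25, "harmony": 70},
}

# Fixed axis order (an assumption): x = "theory", y = "harmony".
vocabulary = ["theory", "harmony"]


def to_vector(counts, vocabulary):
    """Return the counts as a tuple ordered by the vocabulary (0 if a word is missing)."""
    return tuple(counts.get(word, 0) for word in vocabulary)


# One 2D point per class.
class_vectors = {name: to_vector(counts, vocabulary) for name, counts in word_counts.items()}

for name, vector in class_vectors.items():
    print(name, vector)
```

Keeping the vocabulary in a fixed order matters because every class vector must use the same axes for distances between them to be meaningful.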

Word vectors in 2D plane

The Euclidean distance is simply the straight-line distance between two of these points.
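To make this concrete, here is a small sketch that computes the Euclidean distance between every pair of classes, using the word counts quoted earlier. The helper euclidean_distance and the variable names are hypothetical, written for this example only.

```python
import math
from itertools import combinations

# (count of "theory", count of "harmony") for each class, from the text.
class_vectors = {
    "math": (60, 10),
    "philosophy": (20, 40),
    "psychology": (25, 70),
}


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Distance for every unordered pair of classes.
distances = {
    (a, b): euclidean_distance(class_vectors[a], class_vectors[b])
    for a, b in combinations(class_vectors, 2)
}

for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d:.2f}")
```

With these counts, math and philosophy are exactly 50 apart, math and psychology about 69.46 apart, and philosophy and psychology about 30.41 apart, so by this measure philosophy and psychology are the most similar pair.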


Euclidean Distance and Cosine Similarity. Which One to Use and When?