This is a quick, straight-to-the-point introduction to Euclidean distance and cosine similarity, with a focus on NLP.
The Euclidean distance metric tells you how far apart two points, or two vectors, are from each other.
Now suppose you are a high school student taking three classes: math, philosophy, and psychology. You want to check how similar these classes are based on the words your teachers use in class. For the sake of simplicity, let’s consider just two words: “theory” and “harmony”. You could then create a table like this to record how often each word occurs in each class:
In this table, the word “theory” appears 60 times in math class, 20 times in philosophy class, and 25 times in psychology class, whereas the word “harmony” appears 10, 40, and 70 times in math, philosophy, and psychology classes respectively. Let’s translate this data into a 2D plane, with each class as one point.
The Euclidean distance is simply the straight-line distance between the points in the graph below.
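To make the numbers concrete, here is a minimal Python sketch of that calculation. It assumes each class is represented as a (theory, harmony) pair built from the counts given earlier; the names `classes` and `euclidean_distance` are illustrative choices, not part of any library.

```python
import math
from itertools import combinations

# Word counts per class as (theory, harmony), using the counts given earlier.
classes = {
    "math": (60, 10),
    "philosophy": (20, 40),
    "psychology": (25, 70),
}

def euclidean_distance(p, q):
    """Straight-line distance between two equal-length vectors."""
    # Square the difference along each axis, sum, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Compare every pair of classes once.
for first, second in combinations(classes, 2):
    d = euclidean_distance(classes[first], classes[second])
    print(f"{first} vs {second}: {d:.2f}")
```

With these counts, philosophy and psychology come out closest (about 30.41), math and philosophy are exactly 50 apart, and math and psychology are farthest apart (about 69.46).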