Exploring ways of calculating the distance in the hope of finding a high-performing solution for large data sets.

Euclidean distance is one of the most commonly used metrics, serving as a basis for many machine learning algorithms. However, when one is faced with very large data sets containing multiple features, the simple distance calculation becomes a source of headaches and memory errors.

Although I am aware that packages like SciPy provide robust solutions, I couldn't resist exploring other ways of calculating the distance, in the hope of finding a high-performing approach for large data sets.

We begin with a quick reminder of the formula, which is quite straightforward. Given two vectors **x** and **y**, we take the square root of the sum of the squared differences between their elements.
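
Written out, and assuming both vectors have the same number of elements (called n here, a symbol introduced only for illustration), the description above corresponds to:

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

As a minimal sketch, the same formula can be translated almost word for word into plain Python. The function name `euclidean_distance` is hypothetical, and this version is meant only to make the definition concrete, not to be a fast implementation for large data sets.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared element-wise differences.

    Hypothetical helper for illustration; x and y are equal-length
    sequences of numbers.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Pair up the elements, square each difference, sum, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
# Differences are 3, 4 and 0, so the squared sum is 9 + 16 + 0 = 25.
print(euclidean_distance(x, y))  # 5.0
```

A pure-Python loop like this handles one pair of vectors at a time, which is exactly why it becomes slow once the number of pairs grows.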

