Introduction

k-Nearest Neighbors (kNN) is a simple machine learning algorithm for classification and regression. Scikit-learn implements both variants behind a very simple API, which makes it popular in machine learning courses. It has one drawback: it’s quite slow! But don’t worry, we can make it work for bigger datasets with Facebook’s Faiss library.

To classify a sample, the kNN algorithm has to find its nearest neighbors in the training set. As the dimensionality (number of features) of the data increases, the time needed to find the nearest neighbors rises very quickly. To speed up prediction, kNN classifiers use the training phase (the .fit() method) to build data structures that keep the training dataset organized in a way that helps with nearest neighbor searches.
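To make the role of .fit() concrete, here is a minimal sketch using Scikit-learn’s KNeighborsClassifier and its `algorithm` parameter, which selects the search strategy prepared during fitting. The synthetic data, the sizes, and the variable names are assumptions made purely for illustration; they do not come from any benchmark.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration): 1,000 samples, 8 features, 2 classes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 8))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(20, 8))

# algorithm="kd_tree" asks fit() to build a KD-tree over the training set.
tree_knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
tree_knn.fit(X_train, y_train)

# algorithm="brute" only stores the data; every query is compared
# against every training sample at prediction time.
brute_knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute")
brute_knn.fit(X_train, y_train)

tree_pred = tree_knn.predict(X_test)
brute_pred = brute_knn.predict(X_test)

# Both strategies perform an exact search, so the labels should agree.
print((tree_pred == brute_pred).all())
```

Both strategies search exactly, so the choice affects speed rather than results. Tree-based structures tend to help most when the number of features is small, which is part of why high-dimensional data is a problem for kNN.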


Make kNN 300 times faster than Scikit-learn’s in 20 lines!