In this article, we will talk about the failure cases of KNN, its limitations, and how to pick the right value of “K”.

If you missed the first part, I strongly suggest you read it first, here.

Failure cases of KNN:


Consider the following example:

[Figure: data points of the different classes randomly mixed together, with the query point shown in yellow]

Here the points of the different classes are randomly mixed, so the data has no local structure that KNN could exploit. When we are given a query point (the yellow point), KNN will still find its k nearest neighbors, but because their labels are essentially random, the predicted class is little better than a guess.
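To make this failure case concrete, here is a small pure-Python sketch. The datasets, their sizes, and the helper names `knn_predict` and `leave_one_out_accuracy` are hypothetical choices for illustration. It compares the leave-one-out accuracy of a basic majority-vote KNN on points whose labels are assigned at random against points that form two well-separated clusters.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of ((x, y), label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Predict each point from all the others and return the hit rate."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)


rng = random.Random(0)

# Hypothetical dataset 1: labels assigned at random, unrelated to position.
jumbled = [((rng.uniform(0, 10), rng.uniform(0, 10)), rng.choice("AB"))
           for _ in range(200)]

# Hypothetical dataset 2: two well-separated clusters, one per class.
clustered = ([((rng.gauss(2, 1), rng.gauss(2, 1)), "A") for _ in range(100)]
             + [((rng.gauss(8, 1), rng.gauss(8, 1)), "B") for _ in range(100)])

jumbled_acc = leave_one_out_accuracy(jumbled, k=5)
clustered_acc = leave_one_out_accuracy(clustered, k=5)
print(f"jumbled:   {jumbled_acc:.2f}")   # expected to sit near chance level
print(f"clustered: {clustered_acc:.2f}")  # expected to be close to 1.0
```

The algorithm is identical in both runs; only the structure of the data differs, and that alone decides whether the neighbors' votes carry any information.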


Now consider a second example:

[Figure: data points grouped in distinct clusters, with the yellow query point lying far away from all of them]

Here the data is grouped into clear clusters, but the query point (the yellow point) lies far away from all of them. KNN will still return its k nearest neighbors and assign a class, but those “nearest” neighbors are not actually close, so we can’t be very confident in the classification.
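Plain KNN gives no signal that a query is far from the data. One possible safeguard, sketched below under assumptions of our own (the `predict_or_flag` helper, the toy data, and the `factor=3.0` threshold are hypothetical, not a standard rule), is to compare the query's mean distance to its k nearest neighbors with the typical neighbor distance inside the training data, and to decline to predict when it is much larger.

```python
import math
from collections import Counter


def knn_with_distance(train, query, k):
    """Return (majority label, mean distance) for the k nearest neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    label = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
    mean_dist = sum(math.dist(p, query) for p, _ in nearest) / k
    return label, mean_dist


def typical_neighbor_distance(train, k):
    """Average k-NN distance of each training point to the remaining points."""
    total = 0.0
    for i, (point, _) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        total += knn_with_distance(rest, point, k)[1]
    return total / len(train)


def predict_or_flag(train, query, k, factor=3.0):
    """Return a label, or None when the query is far from all training data.

    `factor` is a hypothetical tuning knob: the query is flagged when its
    neighbors are more than `factor` times farther away than is typical.
    """
    label, dist = knn_with_distance(train, query, k)
    if dist > factor * typical_neighbor_distance(train, k):
        return None  # plain KNN would still answer, but the answer is unreliable
    return label


# Hypothetical training data: two tight clusters.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"), ((9, 9), "B")]

print(predict_or_flag(train, (1.5, 1.5), k=3))  # A
print(predict_or_flag(train, (40, -30), k=3))   # None: far from every cluster
```

Both queries would receive a label from plain KNN; only the distance check reveals that the second one is an extrapolation.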

Limitations of KNN:

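KNN is a simple yet powerful algorithm. It is also called a “lazy learner” because it builds no model during training: it simply stores the training data and defers all computation until a prediction is requested. However, it has the following limitations: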
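The “lazy” behavior is easy to see in a minimal sketch. The hypothetical `LazyKNN` class below does nothing in `fit` except store the data; every distance is computed inside `predict`.

```python
import math
from collections import Counter


class LazyKNN:
    """Minimal KNN classifier illustrating lazy learning (hypothetical class)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # No model is built: "training" just memorizes the data.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the real work happens here, once per query.
        order = sorted(range(len(self.points)),
                       key=lambda i: math.dist(self.points[i], query))
        votes = Counter(self.labels[i] for i in order[:self.k])
        return votes.most_common(1)[0][0]


model = LazyKNN(k=3).fit([(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)],
                         ["A", "A", "B", "B", "B"])
print(model.predict((5.5, 5.5)))  # B
```

This is the mirror image of most other classifiers, which spend effort at training time so that prediction is cheap.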

1. Doesn’t work well with large datasets:

Since KNN is a distance-based algorithm, a naive prediction has to compute the distance between the new point and every one of the n stored points, so each query costs O(n·d) time for d features. On a large dataset this makes prediction slow, and the entire training set must also be kept in memory.
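To illustrate the cost, the sketch below counts the distance evaluations performed by a brute-force nearest-neighbor search. The function name, the 10-dimensional random data, and the dataset sizes are illustrative assumptions. Every stored point has to be visited, so the work per query grows linearly with the size of the dataset.

```python
import math
import random


def brute_force_nearest(points, query):
    """Return (index of the nearest point, number of distance evaluations)."""
    best_index, best_dist, evaluations = -1, math.inf, 0
    for i, p in enumerate(points):
        d = math.dist(p, query)  # each evaluation touches all d coordinates
        evaluations += 1
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, evaluations


rng = random.Random(42)
dims = 10
query = [0.5] * dims
evaluations_per_query = {}
for n in (1_000, 5_000, 20_000):
    data = [[rng.random() for _ in range(dims)] for _ in range(n)]
    _, evaluations = brute_force_nearest(data, query)
    evaluations_per_query[n] = evaluations
    print(f"n={n:>6}: {evaluations} distance evaluations, "
          f"~{evaluations * dims} coordinate operations")
```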

2. Doesn’t work well with a high number of dimensions:

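Partly for the same reason as above: every distance computation touches each of the d features, so prediction gets more expensive as the number of dimensions grows. There is also a more fundamental problem, often called the “curse of dimensionality”: in high-dimensional spaces the distances from a query to its nearest and farthest points become nearly equal, so the “nearest” neighbors are no longer meaningfully close.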
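The following sketch illustrates this distance-concentration effect under an assumed setup: 500 points drawn uniformly from the unit hypercube and one random query point. It measures the relative contrast, (farthest - nearest) / nearest distance, which shrinks toward zero as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points, dims, rng):
    """(farthest - nearest) / nearest distance from a random query to
    n_points uniform random points in the unit hypercube."""
    query = [rng.random() for _ in range(dims)]
    dists = [math.dist(query, [rng.random() for _ in range(dims)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)
contrast = {d: relative_contrast(500, d, rng) for d in (2, 10, 100, 1000)}
for d, c in contrast.items():
    print(f"d={d:>4}: relative contrast = {c:.2f}")
```

When the contrast is small, the k “nearest” neighbors are barely closer than any other point, so their votes say little about the query.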

3. Sensitive to outliers and missing values:

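Distances cannot be computed when a feature value is missing, and a single outlier can end up as the nearest neighbor of a query and sway its vote, especially for small k. Hence we first need to impute the missing values and get rid of the outliers before applying the KNN algorithm.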
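A minimal preprocessing sketch, assuming column-mean imputation and a simple z-score rule for outliers. Both are common choices but not the only ones, and the helper names and toy data are hypothetical.

```python
import statistics


def impute_column_means(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = [statistics.fmean(r[j] for r in rows if r[j] is not None)
             for j in range(n_cols)]
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def drop_outliers(rows, labels, z_max=3.0):
    """Drop rows with any feature more than z_max population standard
    deviations from its column mean (a simple z-score rule)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    keep = [i for i, r in enumerate(rows)
            if all(s == 0 or abs(v - m) / s <= z_max
                   for v, m, s in zip(r, means, stdevs))]
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: one missing value (None) and one obvious outlier (50.0).
rows = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [1.1, 1.9], [1.0, 2.2],
        [0.8, 2.0], [1.3, 1.8], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0],
        [50.0, 2.0]]
labels = ["A"] * 5 + ["B"] * 6

filled = impute_column_means(rows)
# Looser cutoff because the toy set is tiny: with n rows, a population
# z-score can never exceed sqrt(n - 1), which is about 3.16 here.
clean_rows, clean_labels = drop_outliers(filled, labels, z_max=2.5)
print(len(rows), "->", len(clean_rows), "rows")
```

The cleaned rows and labels can then be passed to any KNN implementation.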


KNN: Failure cases, Limitations and Strategy to pick right K