In this article, we will talk about the failure cases of KNN, its limitations, and how to pick the right value of “K”.
In case you missed the first part, I strongly suggest you go through it first, here.
Consider the example below:
In this case, the data points are randomly scattered, so there is no underlying pattern to learn. Given a query point (the yellow point), KNN will still dutifully find the k nearest neighbors, but because the labels of those neighbors are essentially arbitrary, the resulting prediction is no better than a guess.
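To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic, randomly labeled dataset of my own making) showing that KNN's accuracy on structureless data sits near chance level:

```python
# A sketch of why randomly labeled data defeats KNN: with no structure
# to exploit, accuracy hovers around chance (~0.5 for two balanced classes).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X = rng.random((500, 2))      # points scattered uniformly at random
y = rng.integers(0, 2, 500)   # labels assigned at random, no pattern

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"Accuracy on random data: {knn.score(X_test, y_test):.2f}")  # ~0.5
```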
KNN is a very powerful algorithm. It is also called a “lazy learner” because it does no work at training time and defers all computation to prediction time. However, it has the following limitations:
1. Doesn’t work well with a large dataset:
Since KNN is a distance-based algorithm, every prediction requires computing the distance from the query point to every training point. This cost grows with the size of the dataset and makes prediction slow on large datasets, as the timing sketch after this list illustrates.
2. Doesn’t work well with a high number of dimensions:
Distance computation also gets more expensive as dimensions are added, but the deeper problem is the curse of dimensionality: in high-dimensional space, points tend to be almost equally far from one another, so the notion of a “nearest” neighbor weakens and accuracy suffers.
3. Sensitive to outliers and missing values:
Because predictions come from a vote (or average) over the k nearest neighbors, a single extreme or mislabeled point can swing the result, and distances cannot be computed at all when values are missing. We therefore need to impute the missing values and handle the outliers before applying the KNN algorithm; a preprocessing sketch follows after the timing example below.
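To illustrate the first two limitations, here is a rough timing sketch (illustrative only, assuming scikit-learn and synthetic data; the exact numbers depend on your machine) showing how brute-force KNN prediction slows down as the number of points and dimensions grows:

```python
# Illustrative (not a rigorous benchmark): brute-force KNN prediction
# time grows with both the number of training points n and the number
# of dimensions d, since every query is compared against all n points.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
for n, d in [(1_000, 10), (50_000, 10), (50_000, 500)]:
    X = rng.random((n, d))
    y = rng.integers(0, 2, n)
    knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
    queries = rng.random((100, d))
    start = time.perf_counter()
    knn.predict(queries)
    print(f"n={n:>6}, d={d:>3}: {time.perf_counter() - start:.3f}s")
```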
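And for the third limitation, a minimal preprocessing sketch (again assuming scikit-learn, with a tiny made-up feature matrix for illustration) that imputes missing values and standardizes features before fitting KNN, so that no missing entry or large-scale feature distorts the distance computation:

```python
# A minimal preprocessing sketch, assuming a feature matrix X with NaNs:
# impute missing values and standardize features before fitting KNN.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the last row acts like an extreme/outlying point.
X = np.array([[1.0, 200.0], [2.0, np.nan], [1.5, 180.0], [8.0, 950.0]])
y = np.array([0, 0, 0, 1])

pipeline = make_pipeline(
    SimpleImputer(strategy="median"),   # fill NaNs with the column median
    StandardScaler(),                   # put features on a comparable scale
    KNeighborsClassifier(n_neighbors=3),
)
pipeline.fit(X, y)
print(pipeline.predict([[1.8, np.nan]]))  # NaN is imputed inside the pipeline
```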
#machine-learning #knn-algorithm #data-science #algorithms