K-nearest neighbors (KNN) is one of the simplest supervised learning techniques. To classify a new data point, we choose the k “closest” data points (its nearest neighbors) and use them for classification or regression. K can be any positive integer.
KNN is a non-parametric algorithm, i.e. it does not make any underlying assumptions about the distribution of the data. It is used for both classification and regression problems.
Example: We have two classes, red and black, and we have to decide whether a new data point X belongs to red or black.
We select K=5 and pick the five data points closest to X. As the figure above shows, 3 of these 5 points are black and 2 are red. Since black is in the majority, the new data point X is assigned to the black class.
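The vote in this example can be written out in a few lines. This is only a minimal sketch: the list `neighbor_labels` and the helper `majority_vote` are names made up for illustration, and the labels simply restate the 3 black / 2 red split from the example.

```python
from collections import Counter

# Labels of the five nearest neighbors of X, restating the example:
# three black points and two red points.
neighbor_labels = ["black", "black", "red", "black", "red"]


def majority_vote(labels):
    """Return the label that occurs most often among the neighbors."""
    return Counter(labels).most_common(1)[0][0]


predicted = majority_vote(neighbor_labels)
print(predicted)
```

Choosing an odd K for a two-class problem, as here, avoids ties in the vote.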
**KNN for Regression:** When KNN is used for regression problems, the prediction is the mean or the median of the target values of the K nearest neighbors.
**KNN for Classification:** In classification, the output is the class that occurs most often among the K nearest neighbors.
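For the regression case, a short sketch may help show how the mean and the median can differ. The function name `knn_regress` and the target values are assumptions chosen for illustration, not data from this article.

```python
from statistics import mean, median


def knn_regress(neighbor_targets, use_median=False):
    """Predict a value from the targets of the K nearest neighbors."""
    if use_median:
        return median(neighbor_targets)
    return mean(neighbor_targets)


# Hypothetical target values of the five nearest neighbors.
# One of them (30.0) is much larger than the rest.
neighbor_targets = [10.0, 12.0, 11.0, 30.0, 12.0]

mean_prediction = knn_regress(neighbor_targets)
median_prediction = knn_regress(neighbor_targets, use_median=True)
print(mean_prediction, median_prediction)
```

With these made-up values the mean is pulled upward by the single large neighbor, while the median is not, which is one reason the median is sometimes preferred.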
The KNN algorithm works through the steps below:
Suppose we have a new data point and we need to put it in the correct category. Consider the image below:
We can use any of the three distance functions to calculate the distance between data points and find the closest ones. In this example, we will use the Euclidean distance.
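To make the distance step concrete, here is a small sketch that computes the Euclidean distance and uses it to pick the K nearest neighbors before voting. The function names `euclidean` and `knn_classify`, and the toy points and labels, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the (point, label) pairs by their distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data set (made up for this sketch).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6)]
labels = ["red", "black", "black", "black", "red", "red", "red"]

prediction = knn_classify(points, labels, query=(3, 3), k=5)
print(prediction)
```

This brute-force version computes the distance to every training point, which is fine for small data sets; libraries typically use tree-based or approximate search to speed this up.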