Today, lets discuss about one of the simplest algorithms in machine learning: The K Nearest Neighbor Algorithm(KNN). In this article, I will explain the basic concept of KNN algorithm and how to implement a machine learning model using KNN in Python.
Machine learning algorithms can be broadly classified into two:
1. Supervised Learning
2. Unsupervised Learning
In supervised learning, we train our models on a labelled set of data and ask it to predict the label for an unlabeled point. For example, a cancer prediction model is trained on many clinical test results that are labelled as either positive or negative. The trained model can then predict whether an unlabeled test result is positive or negative.
On the other hand, unsupervised learning is done on unlabeled data set.
KNN belongs to supervised learning. Having said that, we will keep aside machine learning and KNN for the next 5 mins and lets go to Bob’s new fruit store in town. As a promotional offer, Bob has set up a puzzle and is giving free fruits to those who solve it. The puzzle is as explained below.
Bob’s puzzle
Bob has arranged 4 types of fruits: Apples, Oranges, Strawberries and Grapes as shown in above image. He has covered some of the fruits with a black cloth and numbered them. Whoever can correctly predict the name of these hidden fruits can take them for free! But making a wrong prediction will make you lose all you have got. If you are unsure, you can mark it as unsure. Now take a PAUSE and try if you can get it all right.
Once you have made the predictions, lets cross check it with that of below:
1,2,3 → Oranges
4 →Unsure if its Orange or Apple
5,6 →Apple
7 →Strawberry
8 →Unsure if its Grape or Strawberry
9 →Grape
10 →Unsure if its Apple or Grape
If your predictions matches with that of above, you already know what is KNN and have implemented it! Wonder how? To understand that, lets have a closer look at how you made the predictions.
In the image, we observe that similar fruits are arranged together. For 1, 2 and 3, we can easily classify them as Oranges since they are densely surrounded by Oranges alone and thus there is a high probability that the hidden ones could also be Oranges. To put it in other words, the hidden ones will mostly be the same type as that of majority of their neighbors. The same logic applies for 5,6,7 and 9.
For 10, we are unsure if its an Apple or a Grape. This is because, its surrounded by both Apples and Grapes. Or we can say that neighbors of 10 belongs to Apples and Grapes. So 10 can be either an Apple or a Grape. Similarly for 4 and 8.
In short, KNN algorithm predicts the label for a new point based on the label of its neighbors. KNN rely on the assumption that similar data points lie closer in spatial coordinates. In above example, based on the label(Apples, Oranges, Strawberries, Grapes) of the neighbors we can predict the label for a new data point(hidden fruit).
#machine-learning #python #developer