K-Nearest Neighbors Algorithm

If you’re familiar with machine learning and the basic algorithms that are used in the field, then you’ve probably heard of the K-Nearest Neighbors algorithm or KNN. This algorithm is one of the most simple techniques used in machine learning. It is a method preferred by many in the data industry because of its ease of use and low calculation time.

What is KNN? K-Nearest Neighbors (KNN) is a model that classifies data points based on the points that are most similar to it. It uses test data to make an “educated guess” on what an unclassified point should be classified as.

Pros:

  1. Easy to use.
  2. Quick calculation time.
  3. Does not make assumptions about the data.

Cons:

  1. Accuracy depends on the quality of the data.
  2. Must find an optimal k value (number of nearest neighbors).
  3. Poor at classifying data points in a boundary where they can be classified one way or another.

KNN is an algorithm that is considered both non-parametric and an example of lazy learning. What do these two terms mean exactly?

  • Non-parametric means that it makes no assumptions. The model is made up entirely of the data given to it rather than assuming its structure is normal.
  • Lazy learning means that the algorithm makes no generalizations. This means that there is little training involved when using this method. Because of this, all of the training data is also used in testing when using KNN.

Where to use KNN

KNN is often used in simple recommendation systems, image recognition technology, and decision-making models. It is the algorithm companies like Netflix or Amazon use in order to recommend different movies to watch or books to buy. Netflix even launched the Netflix Prize competition, awarding $1 million to the team that created the most accurate recommendation algorithm!

You might be wondering, ‘But how do these companies do this?’

Well, these companies will apply KNN on a data set gathered about the movies you’ve watched or the books you’ve bought on their website. These companies will then input your available customer data and compare that to other customers who have watched similar movies or bought similar books. This data point will then be classified as a certain profile based on their past using KNN. The movies and books recommended will then depend on how the algorithm classifies that data point.

#programming #data-science #python #algorithms #machine-learning

Machine Learning — K-Nearest Neighbors algorithm with Python
1.20 GEEK