Fruit Classification With K-Nearest Neighbors

We will build a simple object recognition system. Although the example we’ll use is very simple, it reflects many of the same key machine learning concepts that go into building real-world commercial systems.

About the Dataset

The dataset we will use is small and very simple. It is meant for training a classifier to distinguish between different types of fruit.

To create the original dataset, we went to a nearby store, bought a few dozen oranges, lemons, and apples of different varieties, and recorded their measurements in a table. We noted the height and the width of each fruit and estimated its mass.

We’ve formatted the data slightly and, for instructional purposes, added one or two extra simulated features such as a color score. The dataset is named “fruit_data.txt”. You can find it in my GitHub repository.

[Image: A peek at the fruits dataset]

To solve machine learning problems, you can think of the input data as a table. Each object is represented by a row, and the attributes of the object are represented by columns:

Name
Sub Type
Measurement
Color

The features of the fruit are represented by the values you see across the columns.

Import required Libraries

Import the following modules before running the rest of the code.

%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

Import the Dataset

For those who are using Google Colab, use the following code snippet to import the Dataset file.

from google.colab import files
files.upload()

The first thing we will do is load the fruit dataset file using the very handy read_table function in pandas.

fruits = pd.read_table('fruit_data.txt')

This reads the dataset from disk and stores it in a data frame variable that we’ll call fruits.
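For readers who do not have the file at hand, here is a minimal sketch of what read_table does: it parses tab-separated text (tab is its default separator) into a data frame. The two rows below are made up for illustration and are not taken from fruit_data.txt, and io.StringIO stands in for the real file. The column names are the ones used later in this post.

```python
import io
import pandas as pd

# Hypothetical stand-in for fruit_data.txt: tab-separated text with a header row.
# The values are invented for illustration and are not from the real dataset.
raw = (
    "fruit_label\tfruit_name\tmass\twidth\theight\tcolor_score\n"
    "1\tapple\t192\t8.4\t7.3\t0.55\n"
    "2\tlemon\t120\t6.0\t8.1\t0.72\n"
)

# read_table splits on tabs by default, so no sep argument is needed here.
demo = pd.read_table(io.StringIO(raw))

print(demo.shape)          # (number of rows, number of columns)
print(list(demo.columns))  # column names taken from the header row
```

If the real file used a different delimiter, the sep argument of read_table would have to be set to match it.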

Output:

[Image: the loaded fruits data frame]

Here we can see that each row of the dataset represents one piece of fruit, described by several features held in the table’s columns. The columns include the fruit label and the fruit name, along with measurements such as height, width, mass, and color score.

Exploratory Data Analysis

Next, we define a dictionary that takes a numeric fruit label as the key and returns a string with the name of the fruit. This dictionary makes it easier to convert the output of a classifier prediction into something a person can interpret more easily, the name of a fruit in this case.

lookup_fruit_name = dict(zip(fruits.fruit_label.unique(), fruits.fruit_name.unique()))

lookup_fruit_name

The result is a mapping from fruit label value to fruit name, which makes results easier to interpret.
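The one-liner above pairs the two unique() results by position, which works as long as every label corresponds to exactly one name. As a hedged alternative sketch, the pairing can be made explicit by dropping duplicate (label, name) rows first. The small data frame and the predicted labels below are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; labels and names are invented for illustration.
demo = pd.DataFrame(
    {
        "fruit_label": [1, 1, 2, 3, 3],
        "fruit_name": ["apple", "apple", "orange", "lemon", "lemon"],
    }
)

# Keep one row per (label, name) pair, then turn the pairs into a plain dict.
pairs = demo[["fruit_label", "fruit_name"]].drop_duplicates()
lookup = dict(zip(pairs["fruit_label"], pairs["fruit_name"]))

# Translate numeric predictions (made up here) back into readable names.
predicted_labels = [3, 1]
predicted_names = [lookup[label] for label in predicted_labels]
print(predicted_names)
```

Both approaches give the same dictionary when labels and names are in one-to-one correspondence.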

To estimate how well the classifier will do on future samples, we split the original dataset into two parts. First, we separate the feature columns (X) from the label column (y).

X = fruits[['height', 'width', 'mass', 'color_score']]
y = fruits['fruit_label']

We’ll have an array of labeled samples called the training set that will be used to train the classifier.

Then we’ll hold out the remaining labeled samples and put them into a second, separate array called the test set, which will then be used to evaluate the trained classifier.
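The split described here can be sketched with train_test_split, which was imported at the top of this post. This is a minimal illustration on eight made-up rows, not the real fruit data. The test_size and random_state values are assumptions chosen for the sketch, since the exact settings are not given in this section.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the fruits data frame: 8 invented rows.
demo = pd.DataFrame(
    {
        "height": [7.3, 6.8, 7.2, 4.7, 4.6, 8.5, 9.2, 8.7],
        "width": [8.4, 8.0, 7.4, 6.2, 6.0, 6.1, 5.8, 6.5],
        "mass": [192, 180, 176, 86, 84, 118, 116, 132],
        "color_score": [0.55, 0.59, 0.60, 0.80, 0.79, 0.71, 0.72, 0.73],
        "fruit_label": [1, 1, 1, 2, 2, 3, 3, 3],
    }
)

X = demo[["height", "width", "mass", "color_score"]]
y = demo["fruit_label"]

# test_size=0.25 holds out a quarter of the rows for the test set;
# random_state makes the shuffle repeatable. Both are choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

The rows are shuffled before splitting, so the training and test sets each get a random sample of the fruits, and no row appears in both.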
