In this post, we will walk step by step through preprocessing an image dataset and extracting quantifiable features from it that a machine learning algorithm can use.

Let’s begin.

As usual, we import libraries such as numpy, pandas, and matplotlib. We also import specific functions from the skimage and sklearn libraries.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from glob import glob
from skimage.io import imread, imshow
from skimage.color import rgb2gray
from skimage.measure import label, regionprops, regionprops_table
from skimage.filters import threshold_otsu
from skimage.morphology import area_closing, area_opening
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

Our objective is to extract information from each sample that can be used for our machine learning algorithm. Let’s tackle this step-by-step!
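Judging from the imports above, the per-image pipeline plausibly runs grayscale conversion, Otsu thresholding, morphological cleanup, connected-component labelling and region measurement. The sketch below chains those skimage calls under stated assumptions: the leaf is darker than its white background, and `min_area=64` is an illustrative cleanup size rather than a tuned value. `extract_leaf_features` and `make_synthetic_leaf` are hypothetical helpers, not the author's code, and the test image is synthetic rather than taken from the dataset.

```python
import numpy as np
import pandas as pd
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops_table
from skimage.morphology import area_closing, area_opening


def extract_leaf_features(rgb, min_area=64):
    """Segment dark regions on a light background and measure each one.

    `min_area` is an illustrative assumption, not a tuned parameter.
    """
    gray = rgb2gray(rgb)                     # (H, W) floats in [0, 1]
    thresh = threshold_otsu(gray)            # global Otsu threshold
    mask = (gray < thresh).astype(np.uint8)  # assume leaf pixels are darker than background
    mask = area_opening(mask, area_threshold=min_area)  # drop small bright specks in the mask
    mask = area_closing(mask, area_threshold=min_area)  # fill small dark holes in the mask
    labels = label(mask)                     # one integer id per connected region
    props = regionprops_table(
        labels, properties=("label", "area", "perimeter", "eccentricity", "solidity")
    )
    return pd.DataFrame(props)               # one row per region


def make_synthetic_leaf(h=120, w=90):
    """Hypothetical test image: a green ellipse plus two tiny specks on white."""
    img = np.full((h, w, 3), 255, dtype=np.uint8)
    rr, cc = np.ogrid[:h, :w]
    ellipse = ((rr - h / 2) / (h * 0.35)) ** 2 + ((cc - w / 2) / (w * 0.25)) ** 2 <= 1
    img[ellipse] = (60, 110, 40)
    img[5:7, 5:7] = (60, 110, 40)  # 4-pixel speck, removed by area_opening
    img[110, 80] = (60, 110, 40)   # 1-pixel speck, removed by area_opening
    return img


features = extract_leaf_features(make_synthetic_leaf())
print(features)
```

Each row of the resulting table describes one segmented region. Stacking such rows across all images, together with each image's class, yields the kind of tabular input that `train_test_split` and `RandomForestClassifier` expect.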

STEP 1: Perform Exploratory Data Analysis (EDA)

EDA is an essential part of building any machine learning model. First and foremost, familiarize yourself with the data: its structure, its format, and its nuances. This ensures that the methodology you design is appropriate for the dataset at hand.
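One concrete way to start is to tabulate basic properties of every image. The sketch below assumes the images have already been loaded into a dictionary keyed by file path; `summarize_images` is a hypothetical helper, and the two arrays in the demo are placeholders, not images from the dataset.

```python
import numpy as np
import pandas as pd


def summarize_images(images):
    """Return one row per image with its shape, dtype and intensity range.

    `images` maps a name (for example a file path) to an already-loaded array.
    """
    rows = []
    for name, img in images.items():
        rows.append({
            "name": name,
            "height": img.shape[0],
            "width": img.shape[1],
            "channels": img.shape[2] if img.ndim == 3 else 1,
            "dtype": str(img.dtype),
            "min": img.min(),
            "max": img.max(),
        })
    return pd.DataFrame(rows)


# Placeholder arrays standing in for imread(...) results
demo = {
    "leaf_a.jpg": np.zeros((876, 637, 3), dtype=np.uint8),
    "leaf_b.jpg": np.full((800, 600, 3), 255, dtype=np.uint8),
}
summary = summarize_images(demo)
print(summary)
```

On the real files this could be called as `summarize_images({p: imread(p) for p in filepaths})`; mixed shapes or dtypes are worth noting before choosing preprocessing steps.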

For this project, we will use a collection of images of dried plant leaf specimens on a white background (Image Use Permission Granted by Gino Borja, AIM). The dataset contains three classes of plant leaves: plantA, plantB, and plantC.
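With three classes, it also helps to check how many images belong to each one. The post does not say how the files are named, so this sketch assumes, hypothetically, that each filename starts with its class name followed by an underscore (for example `plantA_01.jpg`); `class_counts` and the example paths are illustrative only.

```python
from pathlib import PurePath

import pandas as pd


def class_counts(paths):
    # ASSUMPTION: the class is the filename prefix before the first underscore,
    # e.g. "plantA_01.jpg" -> "plantA". Adapt this to the real naming scheme.
    labels = [PurePath(p).stem.split("_")[0] for p in paths]
    return pd.Series(labels, name="class").value_counts().sort_index()


# Hypothetical paths, not the actual dataset contents
paths = [
    "dataset/plantA_01.jpg",
    "dataset/plantA_02.jpg",
    "dataset/plantB_01.jpg",
    "dataset/plantC_01.jpg",
]
counts = class_counts(paths)
print(counts)
```

A strongly unbalanced count would matter later, for example when deciding whether to stratify the train/test split.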

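filepaths = glob('dataset/*.jpg')
# One panel per image; squeeze=False keeps `axis` a 2-D array even for a single file
fig, axis = plt.subplots(1, len(filepaths), figsize=(16, 8), squeeze=False)
for x, ax in zip(filepaths, axis.flatten()):
    ax.imshow(imread(x))   # display each leaf image in its own panel
    ax.set_axis_off()      # hide pixel tick marks
plt.show()
print("The shape of the image is:", imread(filepaths[0]).shape)
>> The shape of the image is: (876, 637, 3)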
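The printed shape reads as (height, width, channels): 876 rows, 637 columns and three RGB color channels. Because the leaves differ from the white background mainly in brightness, collapsing the channels with `rgb2gray` is a natural next step. This sketch uses an all-white placeholder array of the same shape, not an actual image from the dataset.

```python
import numpy as np
from skimage.color import rgb2gray

# Placeholder with the reported shape: an all-white uint8 RGB image
rgb = np.full((876, 637, 3), 255, dtype=np.uint8)
height, width, channels = rgb.shape

gray = rgb2gray(rgb)            # weighted sum of R, G, B; channel axis removed
print(height, width, channels)  # 876 637 3
print(gray.shape, gray.dtype)   # (876, 637) float64
print(gray.min(), gray.max())   # both approximately 1.0 for pure white
```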


(Original Image by Gino Borja, AIM)


Image Processing with Python: Applications in Machine Learning
10.40 GEEK