KMeans clustering is one of the most widely used unsupervised machine learning algorithms. As the name suggests, it groups data points into K clusters, so that similar items end up together.

Let’s get started. Here I will take a simple example: separating the images in a folder that contains both cats and dogs into their own clusters. This will create two separate folders (clusters). We will also go through how to automatically determine the optimal value for K.

I have generated a dataset of images of cats and dogs.

Images of Cats and Dogs.

First off, we will start by importing the required libraries.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import cv2
import os, glob, shutil

Then we will read all the images from the images folder and prepare them for feature extraction. We resize the images to 224x224 to match the input layer of the model we use for feature extraction.

input_dir = 'pets'
glob_dir = input_dir + '/*.jpg'

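# List the files once so that paths and images stay in the same order
paths = glob.glob(glob_dir)
# Read each image and resize it to 224x224
images = [cv2.resize(cv2.imread(file), (224, 224)) for file in paths]
# Flatten each image to one row and scale pixel values to [0, 1]
images = np.float32(images).reshape(len(images), -1) / 255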
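
The flatten-and-scale step is easy to get wrong, so here is a minimal sketch of what it does to the array shapes. It uses random pixel data as a stand-in for the resized images, and the helper name `flatten_and_scale` is made up for this illustration.

```python
import numpy as np

def flatten_and_scale(batch):
    """Flatten each image to one row and scale pixel values from [0, 255] to [0, 1]."""
    return np.float32(batch).reshape(len(batch), -1) / 255

# Hypothetical stand-in for four resized images (random pixels, not real photos)
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 224, 224, 3), dtype=np.uint8)

flat = flatten_and_scale(batch)
# The flat rows can be reshaped back to the 4D layout the model expects
restored = flat.reshape(-1, 224, 224, 3)

print(flat.shape)      # one row of 224 * 224 * 3 values per image
print(restored.shape)  # back to (number of images, 224, 224, 3)
```

Each row holds 224 * 224 * 3 = 150528 values, and reshaping to (-1, 224, 224, 3) recovers the original layout without changing any pixel.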

Now we will do feature extraction with the help of MobileNetV2 (transfer learning). Why MobileNetV2, you may ask? We could use ResNet50, InceptionV3, etc., but MobileNetV2 is fast and not very resource-heavy, so that’s my choice here.

model = tf.keras.applications.MobileNetV2(include_top=False,
    weights='imagenet', input_shape=(224, 224, 3))

predictions = model.predict(images.reshape(-1, 224, 224, 3))
pred_images = predictions.reshape(images.shape[0], -1)

Now that we have extracted the features, we can cluster them with KMeans. Since we already know that we are separating images of cats and dogs, we know the value of K: 2.

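k = 2
# Recent scikit-learn releases no longer accept n_jobs in KMeans, so it is not passed here
kmodel = KMeans(n_clusters=k, random_state=728)
kmodel.fit(pred_images)
kpredictions = kmodel.predict(pred_images)
# Start from an empty output folder; ignore_errors covers the first run, when it does not exist yet
shutil.rmtree('output', ignore_errors=True)
# Create one folder per cluster
for i in range(k):
    os.makedirs(os.path.join('output', 'cluster' + str(i)))
# Copy every image into the folder of the cluster it was assigned to
for i in range(len(paths)):
    shutil.copy2(paths[i], os.path.join('output', 'cluster' + str(kpredictions[i])))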
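
When K is not known in advance, one common way to choose it is to fit KMeans for several candidate values and keep the one with the highest silhouette score, which is what the `silhouette_score` import at the top is for. The sketch below is only an illustration under assumptions: it runs on synthetic blobs from `make_blobs` rather than on `pred_images`, and the helper name `pick_k_by_silhouette` and the candidate range 2 to 6 are arbitrary choices, not something taken from the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def pick_k_by_silhouette(features, k_values, random_state=728):
    """Fit KMeans for each candidate k and return (best_k, scores).

    scores maps each k to its silhouette score; higher means tighter,
    better separated clusters.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in for the extracted features: three well separated groups
features, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

best_k, scores = pick_k_by_silhouette(features, range(2, 7))
for k, score in scores.items():
    print(k, round(float(score), 3))
print("best k:", best_k)
```

With the real data you would pass `pred_images` instead of the synthetic `features`. The silhouette score needs at least two clusters, which is why the candidate range starts at 2.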

Using K-Means Clustering for Image Segregation