CIFAR-10 Data Set Using Logistic Regression

In my previous posts we have gone through

  1. Deep Learning — Artificial Neural Network(ANN)
  2. Tensors — Basics of pytorch programming
  3. Linear Regression with PyTorch

Let us now try to classify the images of the CIFAR-10 data set with logistic regression.


Step 1: Import the necessary libraries and explore the data set

We import the necessary libraries: pandas, numpy, matplotlib, torch and torchvision. Basic EDA shows that the CIFAR-10 data set contains 10 classes of images, with a training set of 50,000 images and a test set of 10,000 images. Each image has shape [3 x 32 x 32], which represents 3 color channels (RGB) and a size of 32 x 32 pixels.

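#Imports used in this post
import torch
import matplotlib.pyplot as plt
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torch.utils.data import random_split
#Explore CIFAR Data set
dataset = CIFAR10(root='data/', download=True, transform=ToTensor())
test_dataset = CIFAR10(root='data/', train=False, transform=ToTensor())
#size of training data
dataset_size = len(dataset)
dataset_size
#size of test data
test_dataset_size = len(test_dataset)
test_dataset_size
#names of the classes in the data set
classes = dataset.classes
classes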

Visualizing a sample image and the size of the sample image.

#Let us understand the size of one image
img, label = dataset[0]
img_shape = img.shape
img_shape
#Let us look at the same sample image
plt.imshow(img.permute((1, 2, 0)))
print('Label (numeric):', label)
print('Label (textual):', classes[label])

As this is a 3-channel RGB image, PyTorch expects the channels as the first dimension, whereas matplotlib expects them as the last dimension. Here the .permute tensor method is used to move the channels to the last dimension.

Label (numeric): 6

Label (textual): frog
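
To see the channel reordering in isolation, here is a minimal sketch that uses a random tensor in place of a real CIFAR-10 image. The names chw and hwc are made up for this example and are not part of the original notebook.

```python
import torch

# Stand-in for one CIFAR-10 image: channels first, as PyTorch stores it.
chw = torch.rand(3, 32, 32)

# Move the channel axis to the end, which is the layout plt.imshow expects.
hwc = chw.permute(1, 2, 0)

print(chw.shape)  # channels, height, width
print(hwc.shape)  # height, width, channels

# permute only reorders the axes; the pixel values are unchanged.
same_pixel = torch.equal(chw[:, 5, 7], hwc[5, 7, :])
print(same_pixel)
```

Note that permute returns a view of the same data, so no pixel values are copied or modified.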

Step 2: Prepare the data for training

We use a training set, a validation set and a test set. Why do we need them?

Training set: used to train our model, that is, to compute the loss and adjust the weights.
Validation set: used to evaluate the model with different hyperparameters and pick the best model during training. We use 10% of the training data as the validation set.
Test set: used to compare different models and report the final accuracy.

We use random_split from PyTorch to create train_ds and val_ds. torch.manual_seed(43) is set so that the split can be reproduced.

#validation set size 5000, i.e. 10% of the training data
torch.manual_seed(43)
val_size = 5000
train_size = len(dataset) - val_size
#creating training & validation set using random_split
train_ds, val_ds = random_split(dataset, [train_size, val_size])
len(train_ds), len(val_ds)
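
The effect of the seed can be checked on a small scale. The sketch below is an illustration only: it replaces CIFAR-10 with a hypothetical stand-in data set of 100 integers, and the helper split_indices is made up for this example. It shows that calling random_split twice with the same seed gives the same split, and that the two parts never share an item.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_indices(seed, n_items=100, val_size=10):
    """Split a stand-in data set and return (train indices, validation indices)."""
    torch.manual_seed(seed)  # fixes the shuffle that random_split performs
    data = TensorDataset(torch.arange(n_items))  # hypothetical stand-in for CIFAR-10
    train_ds, val_ds = random_split(data, [n_items - val_size, val_size])
    return list(train_ds.indices), list(val_ds.indices)


train_a, val_a = split_indices(43)
train_b, val_b = split_indices(43)

print(len(train_a), len(val_a))        # sizes of the two parts
print(val_a == val_b)                  # same seed, same validation items
print(set(train_a).isdisjoint(val_a))  # no item lands in both parts
```

Without the call to torch.manual_seed, each run would produce a different validation set, which makes results harder to compare between runs.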

We use a data loader, as in our previous example, with a batch size of 128. To visualize a batch of our data we use the make_grid helper function from torchvision.
