How to Implement a YOLO (v3) Object Detector using PyTorch (Part 5)

Part 5 of the tutorial series on how to implement a YOLO v3 object detector from scratch using PyTorch.

This is Part 5 of the tutorial on implementing a YOLO v3 detector from scratch. In the last part, we implemented a function to transform the output of the network into detection predictions. With a working detector at hand, all that’s left is to create input and output pipelines.

The code for this tutorial is designed to run on Python 3.5, and PyTorch 0.4. It can be found in it’s entirety at this Github repo.

This tutorial is broken into 5 parts:

Part 1 : Understanding How YOLO works
Part 2 : Creating the layers of the network architecture
Part 3 : Implementing the the forward pass of the network
Part 4 : Confidence Thresholding and Non-maximum Suppression
Part 5 (This one): Designing the input and the output pipelines

Prerequisites

Part 1-4 of the tutorial.
Basic working knowledge of PyTorch, including how to create custom architectures with nn.Module, nn.Sequential and torch.nn.parameter classes.
Basic knowledge of OpenCV

EDIT: If you’ve visited this post earlier than 30/03/2018, the way we resized an arbitarily sized image to Darknet’s input size was by simply rescaling the dimensions. However, in the original implementation, an image is resized keeping the aspect ratio intact, and padding the left-out portions. For example, if we were to resize a 1900 x 1280 image to 416 x 415, the resized image would look like this.

DarknetResizedImage

This difference in preparing the input caused the earlier implementation to have marginally inferior performance to the original. However, the post has been updated to incorporate the resizing metho followed in the original implementation

In this part we will build the input and the output pipelines of our detector. This involves the reading images off the disk, making a prediction, using the prediction to draw bounding boxes on images, and then saving them to the disk. We will also cover how to have the detector work in real time on a camera feed or a video. We will introduce some command line flags to allow some experimentation with various hyperparamters of the network. So let’s begin.

Note: You will need to install OpenCV 3 for this part.

Create a file detector.py in tour detector file. Add neccasary imports at top of it.

from __future__ import division
import time
import torch 
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import cv2 
from util import *
import argparse
import os 
import os.path as osp
from darknet import Darknet
import pickle as pkl
import pandas as pd
import random

Creating Command Line Arguments

Since detector.py is the file that we will execute to run our detector, it’s nice to have command line arguments we can pass to it. I’ve used python’s ArgParse module to do that.

def arg_parse():
    """
    Parse arguements to the detect module

    """

    parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')

    parser.add_argument("--images", dest = 'images', help = 
                        "Image / Directory containing images to perform detection upon",
                        default = "imgs", type = str)
    parser.add_argument("--det", dest = 'det', help = 
                        "Image / Directory to store detections to",
                        default = "det", type = str)
    parser.add_argument("--bs", dest = "bs", help = "Batch size", default = 1)
    parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.5)
    parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4)
    parser.add_argument("--cfg", dest = 'cfgfile', help = 
                        "Config file",
                        default = "cfg/yolov3.cfg", type = str)
    parser.add_argument("--weights", dest = 'weightsfile', help = 
                        "weightsfile",
                        default = "yolov3.weights", type = str)
    parser.add_argument("--reso", dest = 'reso', help = 
                        "Input resolution of the network. Increase to increase accuracy. Decrease to increase speed",
                        default = "416", type = str)

    return parser.parse_args()

args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()

Amongst these, important flags are images (used to specify the input image or directory of images), det (Directory to save detections to), reso (Input image’s resolution, can be used for speed - accuracy tradeoff), cfg (alternative configuration file) and weightfile.

Loading the Network

Download the file coco.names from here, a file that contains the names of the objects in the COCO dataset. Create a folder data in your detector directory. Equivalently, if you’re on linux you can type.

mkdir data
cd data
wget https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names

Then, we load the class file in our program.

num_classes = 80    #For COCO
classes = load_classes("data/coco.names")

load_classes is a function defined in util.py that returns a dictionary which maps the index of every class to a string of it’s name.

def load_classes(namesfile):
    fp = open(namesfile, "r")
    names = fp.read().split("\n")[:-1]
    return names

Initialize the network and load weights.

#Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0 
assert inp_dim > 32

#If there's a GPU availible, put the model on GPU
if CUDA:
    model.cuda()

#Set the model in evaluation mode
model.eval()

Read the Input images

Read the image from the disk, or the images from a directory. The paths of the image/images are stored in a list called imlist.

read_dir = time.time()
#Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

read_dir is a checkpoint used to measure time. (We will encounter several of these)

If the directory to save the detections, defined by the det flag, doesn’t exist, create it.

if not os.path.exists(args.det):
    os.makedirs(args.det)

We will use OpenCV to load the images.

load_batch = time.time()
loaded_ims = [cv2.imread(x) for x in imlist]

load_batch is again a checkpoint.

OpenCV loads an image as an numpy array, with BGR as the order of the color channels. PyTorch’s image input format is (Batches x Channels x Height x Width), with the channel order being RGB. Therefore, we write the function prep_image in util.py to transform the numpy array into PyTorch’s input format.

Before we can write this function, we must write a function letterbox_image that resizes our image, keeping the aspect ratio consistent, and padding the left out areas with the color (128,128,128)

def letterbox_image(img, inp_dim):
    '''resize image with unchanged aspect ratio using padding'''
    img_w, img_h = img.shape[1], img.shape[0]
    w, h = inp_dim
    new_w = int(img_w * min(w/img_w, h/img_h))
    new_h = int(img_h * min(w/img_w, h/img_h))
    resized_image = cv2.resize(img, (new_w,new_h), interpolation = cv2.INTER_CUBIC)

    canvas = np.full((inp_dim[1], inp_dim[0], 3), 128)

    canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w,  :] = resized_image

    return canvas

Now, we write the function that takes a OpenCV images and converts it to the input of our network.

def prep_image(img, inp_dim):
    """
    Prepare image for inputting to the neural network. 

    Returns a Variable 
    """

    img = cv2.resize(img, (inp_dim, inp_dim))
    img = img[:,:,::-1].transpose((2,0,1)).copy()
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
    return img

In addition to the transformed image, we also maintain a list of original images, and im_dim_list, a list containing the dimensions of the original images.

#PyTorch Variables for images
im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))

#List containing dimensions of original images
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

if CUDA:
    im_dim_list = im_dim_list.cuda()

#pytorch #python #deep-learning #data-science #developer

Prerequisites

Creating Command Line Arguments

Loading the Network

Read the Input images

blog.paperspace.com

How to Implement a YOLO (v3) Object Detector using PyTorch (Part 5)