The ceaseless climb of the world’s skyscrapers is a story of ever-evolving challenges. We live our daily lives around skyscrapers, malls, and warehouses, yet most of the time we have little idea what it takes to construct these buildings. The construction industry is responsible for creating these masterpieces. With $1.3 trillion in annual expenditure (~6.3% of GDP) [1] and 7.6 million employees (~5% of the total workforce) [2] as of March 2020, construction is one of the largest sectors of the U.S. economy. However, it is also considered one of the most hazardous industries because of its high rate of construction site fatalities.

The leading causes of construction-related casualties are falls, workers getting caught in equipment, electrocution, and collisions. The majority of these injuries can be prevented if workers wear appropriate personal protective equipment such as hard hats, safety vests, gloves, safety goggles, and steel-toe boots. To ensure safety at the workplace, the U.S. Occupational Safety and Health Administration (OSHA) requires contractors to enforce and monitor appropriate usage of personal protective equipment (PPE).

Current PPE detection techniques can be classified into two types: sensor based and vision based. The sensor-based approach uses RFID (Radio Frequency Identification) tags and requires a significant investment in purchasing, installing, and maintaining complex sensor networks. In contrast, vision-based approaches use cameras to record images or videos of the construction site, which are then analyzed to verify PPE compliance. This approach provides richer information about the scene, which can be used to understand complex construction sites more promptly, precisely, and comprehensively.

This article explains how a deep learning model built on the YOLOv3 (You Only Look Once) architecture can be used to meet various safety requirements, such as PPE compliance of construction site workers and fire detection, using minimal hardware, i.e. a surveillance camera. The deep learning algorithm simultaneously detects individual workers and verifies PPE compliance with a single convolutional neural network (CNN) framework. Today, CNNs are the most widely used models for image classification and object detection, thanks to their ability to learn from large sets of annotated training data.
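As an illustration of the "minimal hardware" idea, the sketch below reads frames from a surveillance camera stream with OpenCV and hands each frame to a detection function. The stream URL and the detect() helper are placeholders, not the exact code used in this project.

```python
# Minimal sketch: read frames from a surveillance camera and run a detector
# on each one. The RTSP URL and the detect() function are placeholders.
import cv2

def detect(frame):
    # Placeholder for the PPE/fire detector described later in the article.
    return []

cap = cv2.VideoCapture("rtsp://camera.local/stream")  # or 0 for a local webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for detection in detect(frame):
        print(detection)
cap.release()
```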

Various real-time object detection techniques

Any object detection problem in computer vision can be defined as identifying an object in an image (a.k.a. classification) and then precisely estimating its location within the image (a.k.a. localization). Object detection is used almost everywhere these days. The use cases are endless, be it object tracking, video surveillance, pedestrian detection, anomaly detection, self-driving cars, or face detection.

There are three main object detection algorithms that are currently used in the industry:

R-CNN: Region-based Convolutional Neural Network

An R-CNN first identifies several regions of interest, and then uses a CNN to classify those regions and detect objects in them. Since the original R-CNN is slow, faster variants such as Fast R-CNN, Faster R-CNN, and Mask R-CNN have been proposed. In R-CNN, the image is first divided into approximately 2,000 region proposals, and then a CNN (ConvNet) is applied to each region separately.

Each proposed region is resized to a fixed size and fed into the neural network. The biggest problem with this method is time: because the CNN is applied to each region separately, training takes approximately 84 hours and inference takes approximately 47 seconds per image.
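For a sense of how a region-based detector is used in practice, the sketch below runs the pre-trained Faster R-CNN model that ships with torchvision. The image path and the 0.5 score threshold are placeholders, and this is an off-the-shelf COCO model rather than the PPE model described later.

```python
# A minimal sketch of region-based detection with torchvision's Faster R-CNN.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a Faster R-CNN pre-trained on COCO and switch to evaluation mode.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# "site.jpg" is a placeholder image path.
image = to_tensor(Image.open("site.jpg").convert("RGB"))

with torch.no_grad():
    # The model expects a list of 3xHxW tensors with values in [0, 1].
    predictions = model([image])[0]

# Keep only confident detections.
for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.5:
        print(label.item(), round(score.item(), 2), box.tolist())
```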

YOLO: You Only Look Once

Most object detection algorithms use regions to localize the object within the image: the network does not look at the complete image, but at parts of the image that have a high probability of containing an object. YOLO, or You Only Look Once, is an object detection algorithm quite different from the region-based algorithms described above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes.


A simplified illustration of the YOLO object detector pipeline (source)

The strategy used by YOLO is radically different from earlier detection systems. A single neural network is applied to the whole image. This network divides the image into a grid of regions and, for each region, predicts bounding boxes and class probabilities. Those bounding boxes are then weighted by the predicted probabilities.
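To make this concrete, here is a minimal sketch of running a YOLOv3 network with OpenCV's DNN module. The file names (yolov3.cfg, yolov3.weights), the image path, and the thresholds are assumptions; a custom-trained model such as the PPE detector described below would use its own cfg and weights files.

```python
# A minimal YOLOv3 inference sketch using OpenCV's DNN module.
# File names and thresholds are placeholders, not this project's exact files.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
output_layers = net.getUnconnectedOutLayersNames()

image = cv2.imread("site.jpg")
h, w = image.shape[:2]

# YOLOv3 expects a 416x416 blob with pixel values scaled to [0, 1].
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(output_layers)

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # Predictions are (cx, cy, w, h) relative to the image size.
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression removes overlapping duplicate boxes.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    print(class_ids[i], confidences[i], boxes[i])
```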

SSD: Single Shot MultiBox Detector

SSD is a popular object detection algorithm. It’s generally faster than Faster R-CNN. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. [4]


Screenshot from article explaining the SSD Architecture
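The default boxes mentioned above can be pictured as a fixed grid of anchors tiled over each feature map. The sketch below enumerates such boxes for a single feature map; the scale and aspect-ratio values are illustrative, not SSD's exact configuration.

```python
# Illustrative enumeration of SSD-style default boxes for one feature map.
# The scale and aspect ratios below are example values, not SSD's exact settings.
import math

def default_boxes(feature_map_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) boxes in relative [0, 1] coordinates."""
    boxes = []
    for i in range(feature_map_size):
        for j in range(feature_map_size):
            # Center of the current feature-map cell.
            cx = (j + 0.5) / feature_map_size
            cy = (i + 0.5) / feature_map_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# A coarse 3x3 feature map with three aspect ratios yields 27 default boxes.
print(len(default_boxes(3, scale=0.5, aspect_ratios=(1.0, 2.0, 0.5))))
```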

Next, we will look at how we collected the data for this project and how we implemented YOLOv3.

Data Set Collection

In this project, we first classified the objects that we want to detect into four categories: Helmet (person wearing a helmet), Person (person without a helmet), Fire, and Safety vest.
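In the Darknet/YOLO tool chain, these categories would typically live in a small names file, one class per line, and the detected class can then drive a simple compliance rule. The snippet below is a sketch of that idea; the obj.names file, the check_frame() helper, and the thresholds are our assumptions for illustration, not a prescribed format.

```python
# Hypothetical obj.names file for this project (one class per line):
#     Helmet
#     Person
#     Fire
#     Safety vest

# Minimal sketch of turning per-frame detections into compliance alerts.
# `detections` is assumed to be a list of (class_name, confidence) pairs
# produced by the trained detector.
def check_frame(detections, threshold=0.5):
    alerts = []
    for class_name, confidence in detections:
        if class_name == "Person" and confidence > threshold:
            alerts.append("Worker without a helmet detected")
        if class_name == "Fire" and confidence > threshold:
            alerts.append("Fire detected")
    return alerts

print(check_frame([("Helmet", 0.91), ("Person", 0.84), ("Fire", 0.66)]))
```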


Image Labeling with YOLO format

A training data set consists of images collected as samples and annotated for training deep neural networks. We collected images for our data set from multiple sources, such as the safety helmet data set that can be found on GitHub and images from Google and Flickr, and annotated them using the labelImg tool. To collect additional custom images containing these objects, we used the Flickr API and downloaded the images with a Python script.
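labelImg can save annotations directly in YOLO format: one text file per image, with each line holding the class index and the box center, width, and height normalized by the image size. The helper below shows that conversion from pixel coordinates; the function name and the example values are illustrative.

```python
# Convert a pixel-space bounding box (xmin, ymin, xmax, ymax) into a
# YOLO-format label line: "<class_id> <cx> <cy> <w> <h>", all normalized
# to [0, 1] by the image width and height. Example values are illustrative.
def to_yolo_label(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A helmet box (class 0) spanning (100, 50) to (300, 450) in a 1280x720 image.
print(to_yolo_label(0, 100, 50, 300, 450, 1280, 720))
```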

