I. Motivation

There has been a lot of interesting research in the field of Computer Vision. We have seen amazing advancements in Object Detection, Face Detection, Facial Recognition, Optical Character Recognition (OCR) and many other areas. While learning about Face Detection, I came across an area of research concerned with estimating head pose. In this article, I want to share what I have learnt. More specifically, I will start by discussing Head Pose Estimation in general. Then, I will discuss Hopenet, a deep learning approach to Head Pose Estimation introduced by Nataniel Ruiz, Eunji Chong, and James M. Rehg.

Disclaimer: This article assumes that you are already familiar with the concept of CNNs in deep learning.

II. Head Pose Estimation

What is Head Pose Estimation?


Figure 1: A diagram to illustrate the three Euler angles.

As the name suggests, Head Pose Estimation research in Computer Vision focuses on predicting the pose of a human head in an image. More specifically, it concerns predicting the Euler angles of the head. The Euler angles consist of three values: yaw (turning the head left or right), pitch (nodding it up or down) and roll (tilting it toward a shoulder). Together, these three values describe the rotation of an object in 3D space. By accurately predicting them, we can figure out which direction a person's head is facing. A computer that can determine where a head is facing enables many useful applications. For instance, it can be used to rotate a 3D object so that it matches the orientation of the head, similar to the effects seen in TikTok, Snapchat and Instagram filters. It can also be used in self-driving cars to track whether or not a driver is paying attention to the road.
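To make the link between the three angles and "which direction the head is facing" concrete, here is a minimal Python sketch that turns a (yaw, pitch, roll) triple into a rotation matrix and a facing-direction vector. It assumes one particular convention, which is not necessarily the one Hopenet or any given dataset uses: a frontal face looks along +z toward the camera, y points up, yaw rotates about y, pitch about x, roll about z, and the rotations are composed as R = Ry(yaw) · Rx(pitch) · Rz(roll). The function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = Tuple[float, float, float]


def rot_x(a: float) -> Matrix:
    """Rotation about the x-axis (pitch in this sketch); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Matrix:
    """Rotation about the y-axis (yaw in this sketch); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float) -> Matrix:
    """Rotation about the z-axis (roll in this sketch); angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def head_rotation(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Matrix:
    """Compose R = Ry(yaw) @ Rx(pitch) @ Rz(roll); this order is an assumption."""
    yaw, pitch, roll = (math.radians(v) for v in (yaw_deg, pitch_deg, roll_deg))
    return matmul(rot_y(yaw), matmul(rot_x(pitch), rot_z(roll)))


def facing_direction(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Vector:
    """Unit vector the face points along, starting from a frontal +z direction."""
    r = head_rotation(yaw_deg, pitch_deg, roll_deg)
    # R @ (0, 0, 1) is simply the third column of R.
    return (r[0][2], r[1][2], r[2][2])


if __name__ == "__main__":
    print(facing_direction(0, 0, 0))   # frontal face, looking straight at the camera
    print(facing_direction(90, 0, 0))  # head turned fully to one side
    print(facing_direction(0, 0, 45))  # roll only: facing direction is unchanged
```

In this convention, roll never changes the facing direction: it spins the head around the very axis it is looking along. That is why yaw and pitch alone tell you where a driver is looking, while roll describes how the head is tilted.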

Note: You can refer to a YouTube video by Udacity for an interactive explanation of Euler angles.

What are the different techniques used to estimate head pose?

Note: Many approaches to head pose estimation assume face detection as a preliminary step: a face is detected first, and only then can its head pose be estimated.

There are two major approaches used to estimate head pose. One approach involves an intermediate step of estimating 2D facial landmarks. These landmarks are then matched to corresponding points on a 3D model of a human head. By combining these 2D-to-3D correspondences with the camera's parameters, such as its focal length, optical center and lens distortion, the yaw, pitch and roll of the head can be computed by solving what is known as the Perspective-n-Point (PnP) problem (Mallick, 2016). However, this approach has some drawbacks. According to Ruiz, Chong, & Rehg (2018), its performance relies heavily on the quality of the facial landmark predictions, how representative the 3D head model is, and the mathematical model used to complete the estimation.
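The landmark-based pipeline can be sketched in Python. This is a deliberately simplified stand-in for a real PnP solver (in practice you would call a library routine such as OpenCV's `cv2.solvePnP`): it assumes the head's translation is already known, ignores lens distortion, and simply grid-searches yaw, pitch and roll for the pose whose projected 3D model points best match the 2D landmarks. The 3D model coordinates, the "focal length is roughly the image width" approximation, the angle convention and all function names are illustrative assumptions, and the "detected" landmarks are synthesized from a known pose rather than produced by a real landmark detector.

```python
import itertools
import math
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Matrix = List[List[float]]

# Rough generic 3D face model (arbitrary units), assumed for illustration only:
# nose tip, chin, left/right outer eye corners, left/right mouth corners.
# The face looks along +z, so the nose tip sits in front of the other points.
MODEL_POINTS: List[Point3] = [
    (0.0, 0.0, 0.0),
    (0.0, -330.0, -65.0),
    (-225.0, 170.0, -135.0),
    (225.0, 170.0, -135.0),
    (-150.0, -150.0, -125.0),
    (150.0, -150.0, -125.0),
]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation(yaw: float, pitch: float, roll: float) -> Matrix:
    """R = Ry(yaw) @ Rx(pitch) @ Rz(roll), angles in degrees (assumed convention)."""
    y, p, r = (math.radians(a) for a in (yaw, pitch, roll))
    ry = [[math.cos(y), 0.0, math.sin(y)], [0.0, 1.0, 0.0], [-math.sin(y), 0.0, math.cos(y)]]
    rx = [[1.0, 0.0, 0.0], [0.0, math.cos(p), -math.sin(p)], [0.0, math.sin(p), math.cos(p)]]
    rz = [[math.cos(r), -math.sin(r), 0.0], [math.sin(r), math.cos(r), 0.0], [0.0, 0.0, 1.0]]
    return _matmul(ry, _matmul(rx, rz))


def project(points: List[Point3], rot: Matrix, t: Point3,
            focal: float, center: Point2) -> List[Point2]:
    """Pinhole projection with no lens distortion.

    Camera frame: x right, y up, camera looking down -z; image v grows downward.
    """
    cx, cy = center
    out = []
    for p in points:
        # Rotate the model point, then translate it into the camera frame.
        cam = [sum(rot[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
        depth = -cam[2]  # distance in front of the camera (must be > 0)
        out.append((cx + focal * cam[0] / depth, cy - focal * cam[1] / depth))
    return out


def reprojection_error(a: List[Point2], b: List[Point2]) -> float:
    """Mean pixel distance between two sets of corresponding 2D points."""
    return sum(math.hypot(u1 - u2, v1 - v2) for (u1, v1), (u2, v2) in zip(a, b)) / len(a)


def estimate_pose(landmarks: List[Point2], t: Point3, focal: float, center: Point2,
                  step: int = 5, limit: int = 45) -> Tuple[int, int, int]:
    """Toy stand-in for a PnP solver: grid-search the three angles, t assumed known."""
    grid = range(-limit, limit + 1, step)
    best_err, best_pose = math.inf, (0, 0, 0)
    for yaw, pitch, roll in itertools.product(grid, repeat=3):
        predicted = project(MODEL_POINTS, rotation(yaw, pitch, roll), t, focal, center)
        err = reprojection_error(predicted, landmarks)
        if err < best_err:
            best_err, best_pose = err, (yaw, pitch, roll)
    return best_pose


if __name__ == "__main__":
    width, height = 640, 480
    focal = float(width)              # common approximation: focal length ~ image width
    center = (width / 2, height / 2)  # optical center ~ image center
    t = (0.0, 0.0, -1000.0)           # assumed known head position, in model units
    # Synthesize "detected" landmarks from a known pose, then try to recover it.
    detected = project(MODEL_POINTS, rotation(20, -10, 5), t, focal, center)
    print(estimate_pose(detected, t, focal, center))
```

Because the synthetic landmarks come from a pose that lies on the search grid, the search recovers (20, -10, 5) exactly. With real detections, errors in the landmarks, a poorly fitting 3D model or wrong camera parameters all shift the minimum, which is the sensitivity Ruiz, Chong, & Rehg point out.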


Head Pose Estimation with Hopenet