Improve person re-identification with face detection (FaceBoxes)

Person re-identification is an interesting task that is not yet completely solved. It involves finding (localizing) a person in an image and creating a digital description (a vector, or embedding) of that person's photo, such that the distance between embeddings of different photos of the same person is smaller than the distance to embeddings of photos of other people.
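To make the distance idea concrete, here is a minimal sketch in Python. The embeddings are made-up three-dimensional toy vectors (real re-id models produce hundreds of dimensions), and both the choice of cosine distance and the threshold of 0.3 are assumptions for illustration, not values taken from any particular model.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(a, b, threshold=0.3):
    """Decide whether two embeddings belong to one person.

    The threshold is an arbitrary illustrative value; in practice it would
    be tuned on a validation set for the specific embedding model.
    """
    return cosine_distance(a, b) < threshold

# Toy embeddings (hypothetical values): two photos of one person, one of another.
anna_front = [0.9, 0.1, 0.2]
anna_side = [0.8, 0.2, 0.3]
bob_front = [0.1, 0.9, 0.4]

print(same_person(anna_front, anna_side))  # photos of the same person
print(same_person(anna_front, bob_front))  # photos of different people
```

A good re-id model is one for which the first comparison stays below the threshold and the second stays above it, regardless of how the photos were taken.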

Person re-identification is used in many tasks, including visitor flow analysis in a shopping center, tracking people across cameras, and finding a particular person in a large collection of photos.

Many effective models and approaches have been created recently to address re-identification tasks. A full list of those models can be found here. But even the best models still face many problems, such as variations in the pose and viewpoint of people. Because of these variations, the embeddings for photos of one person taken from different angles can end up too far apart, and the system may decide that they show different people.

The latest state-of-the-art models, such as Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification, are designed to deal with the problems mentioned above, but we at ai-labs.org came up with a lightweight approach that greatly simplifies re-identification in some situations. I will describe this approach in more detail.

Let’s start by explaining how most re-id frameworks find a person in an image. Commonly used object detection models, such as Faster R-CNN or EfficientDet, create a bounding box around the entire human body. Once the crop of the entire body is extracted, an embedding is created for it.
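The detect, crop, and embed flow described above can be sketched as follows. Both `detect_people` and `embed` are hypothetical placeholders: the first stands in for a real detector such as Faster R-CNN or EfficientDet and simply returns one fixed box, and the second stands in for a re-id network and only averages pixel colors. Only the overall structure of the pipeline is the point here.

```python
import numpy as np

def detect_people(image):
    """Hypothetical stand-in for a body detector.

    Returns a list of (x1, y1, x2, y2) boxes. A real detector would predict
    these; here one fixed box is returned purely for illustration.
    """
    h, w = image.shape[:2]
    return [(w // 4, h // 4, 3 * w // 4, h)]

def embed(crop):
    """Hypothetical stand-in for a re-id network.

    Uses the L2-normalized mean color per channel as a toy 'embedding'.
    """
    v = crop.reshape(-1, crop.shape[-1]).mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)

def describe_people(image):
    """Detect full-body boxes, crop each one, and embed the crop."""
    results = []
    for x1, y1, x2, y2 in detect_people(image):
        crop = image[y1:y2, x1:x2]  # rows are y, columns are x
        results.append(((x1, y1, x2, y2), embed(crop)))
    return results

# A random array plays the role of a 320x240 RGB camera frame.
frame = np.random.default_rng(0).random((240, 320, 3))
people = describe_people(frame)
for box, vector in people:
    print(box, vector.shape)
```

Note that the embedding is computed from whatever the detector returns, so the quality and viewpoint of each crop directly affect the resulting vector.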

The problem is that object detection models often work almost too well: they find people from a wide variety of viewpoints, and the resulting crops are not always of the best quality. Embeddings based on these crops often do not allow correct re-identification. Alternatively, they are generated in such a way that the embedding for a photo of a person from one viewpoint is close only to embeddings of photos from the same viewpoint, and not to embeddings of photos of the same person taken from a different viewpoint or distance.

towardsdatascience.com