As a data scientist at VATBox, I’ve mainly worked on projects that, at their core, involve building machine learning models. What’s nice about this project is that it consists purely of building an algorithm to solve the given task: a real-world problem with a custom-made solution.

Problem Definition

The problem at hand is essentially matching two images that are nearly identical (they may differ in image size, for example). The goal is to do this in real time, so the algorithm needs to be relatively fast.

At VATBox our world is a world of invoices. Users upload reports, which contain images of invoices, to our platform. A single report contains two groups of images: one is a collection of separate invoices, and the other is the whole collection combined into a single file. For reasons we will not go into, some images in one group may not appear in the other. The goal is to detect those images (if there are any). If we match every pair of images that are essentially the same image, whatever is left unmatched is a spare.
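The "match everything, keep what is left over" idea can be sketched as a simple greedy pairing. This is only an illustration: the function `find_spares`, its `score` callback, and the `threshold` value are hypothetical names and choices, and the toy example uses strings in place of real images.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def find_spares(
    group_a: Sequence[Image],
    group_b: Sequence[Image],
    score: Callable[[Image, Image], float],
    threshold: float = 0.9,
) -> Tuple[List[int], List[int]]:
    """Return the indices of unmatched images in group A and in group B."""
    unmatched_b = set(range(len(group_b)))
    spare_a: List[int] = []
    for i, image_a in enumerate(group_a):
        best_j = None
        best_score = threshold
        # Look for the best still-unmatched partner in the other group.
        for j in sorted(unmatched_b):
            s = score(image_a, group_b[j])
            if s >= threshold and (best_j is None or s > best_score):
                best_j, best_score = j, s
        if best_j is None:
            spare_a.append(i)  # nothing in group B is similar enough
        else:
            unmatched_b.remove(best_j)  # each image is matched at most once
    return spare_a, sorted(unmatched_b)


# Toy example: strings stand in for images, equality ignoring case for similarity.
def toy_score(x: str, y: str) -> float:
    return 1.0 if x.lower() == y.lower() else 0.0


separate = ["inv1", "inv2", "inv3"]
combined = ["INV3", "inv1", "inv4"]
spare_separate, spare_combined = find_spares(separate, combined, toy_score)
print(spare_separate, spare_combined)  # [1] [2]
```

The pairing is greedy and compares every image in one group with every unmatched image in the other, so its cost grows with the product of the group sizes. That is fine for small reports, but the per-pair score has to be cheap if the whole thing is meant to run in real time.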

We want to detect the spare images (marked in red on the left and blue on the right)

Image Matching with OpenCV’s Template Matching