This post gives you a basic idea of how to detect one of the most common image forgery techniques, copy-move forgery, using clustering.
For the complete code of the approach described below:
GitHub: https://github.com/Himj266/DBSCAN-Copy-Move-Foregry-Detection
Kaggle: https://www.kaggle.com/himj26/copy-move-forgery-detection-dbscan-clustering
Let’s start with a brief overview of copy-move forgery. Copy-move forgery means copying a region of an image and pasting it at another location in the same image, either to hide some details or to create false information.
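To make the idea concrete, here is a minimal NumPy sketch of what a copy-move edit does to the pixel array. The helper `copy_move` and the block coordinates are hypothetical and chosen only for illustration; real forgeries usually also blend, scale, or rotate the pasted region, which this sketch ignores.

```python
import numpy as np


def copy_move(image, src, dst, size):
    """Return a copy of `image` in which an h x w block is copied from src to dst.

    src and dst are (row, col) top-left corners and size is (h, w);
    both blocks are assumed to fit inside the image.
    """
    h, w = size
    (sr, sc), (dr, dc) = src, dst
    forged = image.copy()  # leave the original untouched
    forged[dr:dr + h, dc:dc + w] = image[sr:sr + h, sc:sc + w]
    return forged


# Toy 64x64 grayscale "image" filled with random pixel values
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
forged = copy_move(original, src=(5, 5), dst=(40, 30), size=(16, 16))
```

After the edit, the block at rows 40 to 55, columns 30 to 45 is an exact duplicate of the block at rows 5 to 20, columns 5 to 20. That duplication is exactly the trace a copy-move detector looks for.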
Images are tampered with to fabricate evidence and manipulate public perception, and people are not good at recognizing when an image has been manipulated, even when the change is fairly substantial. This type of forgery is very common today, and forged images can easily be found on Facebook, Instagram 📸, and other platforms.
So, why are we talking about SIFT, and what the heck is SIFT (Scale-Invariant Feature Transform)?
To detect objects in an image, we need features that capture the meaningful details of those objects. But images may be scaled, rotated, or differently illuminated, or the viewpoint may change.
That means we need an algorithm that extracts features in such a way that they stay the same, or approximately the same, even if the object is scaled, rotated, or placed somewhere else.
This is where the SIFT algorithm comes into play. It detects features that are invariant to scale and rotation and robust to noise, so the features it extracts can still identify objects in the image after such changes.
So, how does SIFT work?
SIFT finds “interesting” keypoints (for example, around a nose or eyes) in the image by searching for extrema of a Difference-of-Gaussians across several scales and then thresholding away low-contrast points and edge responses. Then, for each keypoint, it computes a descriptor: a feature vector in a 128-dimensional space, i.e., a vector of 128 values.
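To see where the 128 values come from, the sketch below builds a simplified SIFT-style descriptor for a 16×16 patch: a 4×4 grid of cells, each with an 8-bin histogram of gradient orientations weighted by gradient magnitude (4 × 4 × 8 = 128), followed by SIFT's normalize, clip at 0.2, renormalize step. The function name `sift_like_descriptor` is hypothetical, and it deliberately skips keypoint detection, Gaussian weighting, interpolation between bins, and rotation to the dominant orientation, so it is an illustration only, not a replacement for OpenCV's SIFT.

```python
import numpy as np


def sift_like_descriptor(patch):
    """Simplified SIFT-style descriptor for a 16x16 grayscale patch.

    Splits the patch into a 4x4 grid of 4x4-pixel cells and builds an
    8-bin gradient-orientation histogram per cell: 4 * 4 * 8 = 128 values.
    """
    patch = np.asarray(patch, dtype=np.float64)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch)                 # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)    # orientation in [0, 2*pi)
    bins = (angle / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins

    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # each pixel votes into its cell's histogram, weighted by magnitude
            desc[r // 4, c // 4, bins[r, c]] += magnitude[r, c]
    desc = desc.ravel()

    # SIFT-style normalization: unit length, clip large values at 0.2, renormalize
    norm = np.linalg.norm(desc)
    if norm > 0:
        desc /= norm
        desc = np.minimum(desc, 0.2)
        desc /= np.linalg.norm(desc)
    return desc


rng = np.random.default_rng(0)
patch = rng.random((16, 16))
d = sift_like_descriptor(patch)
print(d.shape)  # (128,)
# Same patch with doubled contrast and added brightness
d_bright = sift_like_descriptor(2 * patch + 10)
print(np.allclose(d, d_bright))
```

Because only gradient directions and normalized magnitudes are kept, doubling the contrast and adding a constant brightness leaves this descriptor unchanged, a small taste of why SIFT-style descriptors cope with illumination changes.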
How these keypoints are found and how their descriptors are computed is the beauty of the SIFT algorithm. You can read about the complete mechanism of SIFT on this wonderful site.
The circles marked in the image above are the “interesting” keypoints detected by the SIFT algorithm (well, we might not find them interesting, but SIFT does, and it’s SIFT’s choice😝), and each circle has its own descriptor.
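```python
# Code for SIFT in Python using OpenCV
import cv2

def siftDetector(image):
    # cv2.SIFT_create() is in the main module since OpenCV 4.4; older builds
    # need opencv-contrib-python and cv2.xfeatures2d.SIFT_create() instead
    sift = cv2.SIFT_create()
    # SIFT works on single-channel images, so convert BGR to grayscale first
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Detect keypoints and compute their 128-D descriptors (no mask)
    key_points, descriptors = sift.detectAndCompute(gray, None)
    return key_points, descriptors
```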
OK, now we have keypoints and descriptors, but what do we do with them?
Since a part of the image is copied to another position, the descriptors of the copied region and of the original region should be equal or approximately equal, and this is what we will use to detect the forgery. This is the basic idea behind many keypoint-based copy-move forgery detection (CMFD) techniques.
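A minimal NumPy sketch of that idea, assuming keypoint locations as (x, y) pairs (for example from OpenCV's `KeyPoint.pt`) and descriptors as rows of a matrix: flag every pair of keypoints whose descriptors are closer than a threshold but which sit far apart in the image. This brute-force pairwise comparison is not the DBSCAN approach of this post; the function `match_similar_keypoints` and both thresholds are hypothetical, and the values used here only suit the toy data.

```python
import numpy as np


def match_similar_keypoints(points, descriptors, max_desc_dist, min_spatial_dist):
    """Return index pairs (i, j), i < j, whose descriptors are nearly equal
    but whose keypoints lie far apart in the image. Uses O(n^2) memory."""
    points = np.asarray(points, dtype=float)
    desc = np.asarray(descriptors, dtype=float)
    # Pairwise descriptor distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (desc ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * desc @ desc.T, 0)
    desc_dist = np.sqrt(d2)
    # Pairwise distances between keypoint locations
    spatial_dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    similar = (desc_dist < max_desc_dist) & (spatial_dist > min_spatial_dist)
    # Upper triangle only: report each pair once and never match a point with itself
    i, j = np.nonzero(np.triu(similar, k=1))
    return list(zip(i.tolist(), j.tolist()))


# Toy data: keypoints 2 and 3 carry slightly noisy copies of descriptors 0 and 1
rng = np.random.default_rng(1)
base = rng.random((2, 128))
descriptors = np.vstack([base[0], base[1],
                         base[0] + rng.normal(0, 0.01, 128),
                         base[1] + rng.normal(0, 0.01, 128)])
points = [(10, 10), (12, 40), (80, 70), (82, 100)]
pairs = match_similar_keypoints(points, descriptors, max_desc_dist=1.0, min_spatial_dist=20)
print(pairs)  # the two copied pairs: [(0, 2), (1, 3)]
```

The spatial check matters because SIFT can place several keypoints at nearly the same spot, and those would otherwise match each other trivially.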
Many papers propose different ways to compare descriptors and identify similar ones. Here, a clustering approach (DBSCAN) is presented.
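The post's own implementation is in the repository linked above. As a rough, self-contained sketch of the clustering idea, assuming the descriptors are clustered directly so that near-identical descriptors end up with the same label, here is a minimal DBSCAN written in NumPy. In practice you would more likely call scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)`; the helpers `dbscan` and `candidate_groups`, and `eps=1.0`, are illustrative assumptions that fit only the toy data.

```python
import numpy as np


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN: one label per row of X, -1 for noise.

    A point is a core point if at least min_samples points (itself included)
    lie within distance eps; clusters grow outward from core points.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue  # already clustered, or cannot seed a new cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def candidate_groups(labels):
    """Group keypoint indices by cluster label, ignoring noise (-1)."""
    groups = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            groups.setdefault(int(lab), []).append(idx)
    return [g for g in groups.values() if len(g) >= 2]


# Toy descriptors: rows 2 and 3 are slightly noisy copies of rows 0 and 1
rng = np.random.default_rng(1)
base = rng.random((2, 128))
descriptors = np.vstack([base[0], base[1],
                         base[0] + rng.normal(0, 0.01, 128),
                         base[1] + rng.normal(0, 0.01, 128)])
labels = dbscan(descriptors, eps=1.0, min_samples=2)
print(labels.tolist())            # [0, 1, 0, 1]
print(candidate_groups(labels))   # [[0, 2], [1, 3]]
```

With `min_samples=2`, any two keypoints with nearly identical descriptors form a cluster, while keypoints with unique descriptors are labeled noise (-1). Each remaining group is a candidate copied region; you would typically still check that its keypoints are not all at the same spot in the image.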