Two popular techniques for counting fingers in an image are training a CNN and using contours with a convex hull. I have tried both, and in this section I would like to mention the challenges I faced with each (I am linking my work so you can check it).

Experiments and challenges

  1. _CNN Approach:_ The model achieves good training and validation accuracy, and the final plot also looks good. But when it comes to detection on real-life images, the model fails badly. I tried tuning the hyperparameters, data augmentation, transfer learning, learning rate decay, and changes to the model architecture, but none of them helped. The main reason the model fails on real-life images is that the training and testing images are very similar and over-simplified, so the model learns them quickly and overfits. Check my attempts 1 and 2 (feel free to suggest any changes that can help).
  2. _Contours and Convex Hull:_ This approach performs considerably better than the first. Detections are very quick and confident. The only challenge is the background: it does not work when the background is crowded.
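
For readers who have not seen the second approach, here is a minimal sketch of how it is commonly done. It is not the exact code from my attempts. The idea is to take the largest contour of a binary hand mask, compute its convex hull, and count the narrow "valleys" (convexity defects) between fingers. The function names, the 90-degree angle threshold, and the `min_depth` value are illustrative assumptions. The sketch also assumes you already have a clean binary mask, which is exactly the part that breaks with a crowded background.

```python
import math


def defect_angle(start, end, far):
    """Angle in degrees at `far` in the triangle (start, far, end)."""
    a = math.dist(start, end)
    b = math.dist(start, far)
    c = math.dist(end, far)
    if b == 0 or c == 0:
        return 180.0  # degenerate triangle, treat it as "not a finger gap"
    cos_angle = (b * b + c * c - a * a) / (2 * b * c)  # law of cosines
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def count_fingers(defect_triples, max_angle=90.0):
    """Count fingers from (start, end, far) convexity-defect points.

    Each narrow gap sits between two fingers, so n gaps suggest n + 1 fingers.
    With zero gaps this returns 0, although one raised finger looks the same.
    """
    gaps = sum(1 for s, e, f in defect_triples
               if defect_angle(s, e, f) < max_angle)
    return gaps + 1 if gaps else 0


def hand_defects(mask, min_depth=20.0):
    """Extract (start, end, far) triples from a binary hand mask (uint8)."""
    import cv2  # imported here so the geometry helpers work without OpenCV

    # [-2:] keeps this working on both OpenCV 3.x and 4.x return signatures.
    contours, _ = cv2.findContours(
        mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)  # assume the hand is the largest blob
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return []
    triples = []
    for s, e, f, depth in defects[:, 0]:
        if depth / 256.0 >= min_depth:  # OpenCV stores depth as fixed point (x256)
            triples.append(tuple(
                (int(hand[i][0][0]), int(hand[i][0][1])) for i in (s, e, f)))
    return triples
```

With a mask in hand, `count_fingers(hand_defects(mask))` gives the estimate. Everything depends on how well the hand was segmented in the first place.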

After all this investigation, I came across the Hands module of the MediaPipe library, which performed surprisingly well and had none of the challenges I faced above. On top of that, it is very easy to implement and needs no GPU. This article is a continuation of my previously written article about MediaPipe. _I strongly recommend going through it before starting this one._
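
To give a feel for how little code is involved, here is a rough sketch of counting raised fingers from the 21 hand landmarks that the Hands module returns. Treat it as an illustration under assumptions, not a definitive implementation. It uses the legacy `mp.solutions.hands` API, assumes an upright hand facing the camera, and the helper names and the `thumb_points_left` flag are hypothetical names introduced only for this example.

```python
# Landmark indices from the MediaPipe Hands 21-point hand model.
FINGER_TIPS = (8, 12, 16, 20)  # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)  # PIP joint of each of those fingers
THUMB_TIP, THUMB_IP = 4, 3


def count_raised_fingers(landmarks, thumb_points_left=True):
    """Count raised fingers from 21 (x, y) landmarks.

    Coordinates are normalized image coordinates, so y grows downward and a
    raised fingertip has a smaller y than its PIP joint. This assumes an
    upright hand. `thumb_points_left` is a hypothetical flag saying which way
    an extended thumb points in the image.
    """
    count = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if landmarks[tip][1] < landmarks[pip][1]:
            count += 1
    tip_x, ip_x = landmarks[THUMB_TIP][0], landmarks[THUMB_IP][0]
    thumb_out = tip_x < ip_x if thumb_points_left else tip_x > ip_x
    return count + int(thumb_out)


def detect_landmarks(bgr_image):
    """Run MediaPipe Hands on one BGR image; return 21 (x, y) pairs or None."""
    import cv2  # imported here so the counting helper works on its own
    import mediapipe as mp

    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=1,
                                  min_detection_confidence=0.5) as hands:
        # MediaPipe expects RGB, while OpenCV loads images as BGR.
        results = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    return [(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]
```

A typical call would be `count_raised_fingers(detect_landmarks(cv2.imread(path)))`, after checking that `detect_landmarks` did not return `None`. The thumb check is the fragile part, since the direction depends on which hand is shown and whether the image is mirrored.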


