Repeating actions are everywhere in daily life, from organic cycles such as heartbeats and breathing, through loops in programming and manufacturing, to planetary cycles like day-night rotation and the seasons.

Recognising these repetitions when they appear in video calls for a system that can both identify and count them. Think of exercising: how many repetitions did you just do?

The Unsolved Problem

Isolating repeating actions is a difficult task. I know, it seems pretty straightforward when you see someone in front of you jumping up and down, but translating that into a machine learning problem makes it much harder. How do you teach a computer what a jumping jack looks like from all 360 degrees? How do you generalise an inference from one video to any video?

Previous work in this space analysed videos at a fine-grained level using a cycle-consistency constraint across different videos of the same action. Reading the earlier paper, you can see that it essentially builds a model that matches corresponding frames across a collection of videos:

Temporal Cycle-Consistency Learning: [source]
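To make the idea concrete, here is a rough NumPy sketch of a cycle-consistency check between two videos. This is the hard (arg-min) version of the idea, not the paper's exact training objective, which uses a differentiable soft nearest neighbour; the random embeddings are stand-ins for real per-frame features:

```python
import numpy as np

def nearest_neighbour(query, keys):
    """Index of the key embedding closest to the query (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(keys - query, axis=1)))

def is_cycle_consistent(u_embs, v_embs, i):
    """Map frame i of video U to its nearest frame in video V,
    then map that frame back into U. The cycle is consistent
    if we land back on frame i."""
    j = nearest_neighbour(u_embs[i], v_embs)  # U -> V
    k = nearest_neighbour(v_embs[j], u_embs)  # V -> U
    return k == i

# Toy example: two "videos" as random per-frame embeddings.
rng = np.random.default_rng(0)
u = rng.normal(size=(20, 128))  # 20 frames, 128-dim embedding each
v = rng.normal(size=(24, 128))
print(is_cycle_consistent(u, v, 5))
```

Training pushes the embeddings so that as many frames as possible survive this round trip, which forces frames of the same phase of an action to land near each other, but it only works if you have multiple videos of the same action to cycle between.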

However, real-world footage brings problems such as camera motion, occluding objects in the field of view, and changes in the form of the repeating action, so the model has to compute features that are invariant to this noise. The existing approach also required a lot of effort to densely label data, and it would be far better if an algorithm could learn the repeating structure from a single video.
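That single-video idea is what a temporal self-similarity matrix captures (the construction RepNet builds on): compare every frame of one video with every other frame of the same video, and repetitions show up as periodic structure. Here is a minimal NumPy sketch; the negative-squared-distance similarity and the toy sinusoidal "embeddings" are my own assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

def self_similarity(frame_embs):
    """Temporal self-similarity matrix of one video: entry (i, j)
    measures how alike frames i and j are. Repetitions appear as
    periodic stripes parallel to the diagonal, which a downstream
    model can read off to estimate the period and count."""
    # Negative squared Euclidean distance as the similarity measure;
    # this particular choice is an assumption for illustration.
    diffs = frame_embs[:, None, :] - frame_embs[None, :, :]
    return -np.sum(diffs ** 2, axis=-1)

# Toy single video: embeddings that repeat with a period of 8 frames.
t = np.arange(64)
embs = np.stack([np.sin(2 * np.pi * t / 8), np.cos(2 * np.pi * t / 8)], axis=1)
ssm = self_similarity(embs)
print(ssm.shape)  # (64, 64), with periodic structure every 8 frames
```

The appeal of this design is that the matrix depends only on how frames of the same video relate to each other, so much of the nuisance variation (camera identity, background, viewpoint) cancels out, and no second video or dense labelling is needed.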

