Repeating actions are everywhere in daily life, from organic cycles such as heartbeats and breathing, through loops in programming and manufacturing, to planetary cycles like day-night rotation and the seasons.

Recognising these repetitions when they appear in video calls for a system that can both identify and count them. Think of exercising: how many repetitions did you just do?

The Unsolved Problem

Isolating repeating actions is a difficult task. I know, it seems pretty straightforward when you see someone in front of you jumping up and down, but translating that into a machine learning problem makes it much harder. How do you teach a computer what a jumping jack looks like from all 360 degrees? How do you generalise an inference from one video to any video?

Previous work in this space analysed videos at a fine-grained level using a cycle-consistency constraint across different videos of the same action. Reading the earlier paper, you can see that it essentially builds a model that matches corresponding frames across a collection of videos:

Temporal Cycle-Consistency Learning: [source]
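To make the idea concrete, here is a rough NumPy sketch of a cycle-consistency check between two videos. This is the hard (arg-min) version of the idea, not the paper's exact training objective, which uses a differentiable soft nearest neighbour; the random embeddings are stand-ins for real per-frame features:

```python
import numpy as np

def nearest_neighbour(query, keys):
    """Index of the key embedding closest to the query (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(keys - query, axis=1)))

def is_cycle_consistent(u_embs, v_embs, i):
    """Map frame i of video U to its nearest frame in video V,
    then map that frame back into U. The cycle is consistent
    if we land back on frame i."""
    j = nearest_neighbour(u_embs[i], v_embs)  # U -> V
    k = nearest_neighbour(v_embs[j], u_embs)  # V -> U
    return k == i

# Toy example: two "videos" as random per-frame embeddings.
rng = np.random.default_rng(0)
u = rng.normal(size=(20, 128))  # 20 frames, 128-dim embedding each
v = rng.normal(size=(24, 128))
print(is_cycle_consistent(u, v, 5))
```

Training pushes the embeddings so that as many frames as possible survive this round trip, which forces frames of the same phase of an action to land near each other, but it only works if you have multiple videos of the same action to cycle between.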

However, real-world footage brings problems such as camera motion, occluding objects in the field of view, and changes in the form of the repeating action, so the model has to compute features that are invariant to this noise. The existing approach also required a lot of effort to densely label data, and it would be far better if an algorithm could learn the repeating structure from a single video.
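That single-video idea is what a temporal self-similarity matrix captures (the construction RepNet builds on): compare every frame of one video with every other frame of the same video, and repetitions show up as periodic structure. Here is a minimal NumPy sketch; the negative-squared-distance similarity and the toy sinusoidal "embeddings" are my own assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

def self_similarity(frame_embs):
    """Temporal self-similarity matrix of one video: entry (i, j)
    measures how alike frames i and j are. Repetitions appear as
    periodic stripes parallel to the diagonal, which a downstream
    model can read off to estimate the period and count."""
    # Negative squared Euclidean distance as the similarity measure;
    # this particular choice is an assumption for illustration.
    diffs = frame_embs[:, None, :] - frame_embs[None, :, :]
    return -np.sum(diffs ** 2, axis=-1)

# Toy single video: embeddings that repeat with a period of 8 frames.
t = np.arange(64)
embs = np.stack([np.sin(2 * np.pi * t / 8), np.cos(2 * np.pi * t / 8)], axis=1)
ssm = self_similarity(embs)
print(ssm.shape)  # (64, 64), with periodic structure every 8 frames
```

The appeal of this design is that the matrix depends only on how frames of the same video relate to each other, so much of the nuisance variation (camera identity, background, viewpoint) cancels out, and no second video or dense labelling is needed.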

