We are trying to render a 3D animation of a person by tracking their motion from 2D video.
Animating a person in 3D graphics normally requires an elaborate motion-capture setup to track the person's movements, and animating each limb by hand takes a lot of time. We aim to provide a time-saving method that achieves the same result from ordinary 2D video.
Our solution to this problem involves the following steps:

1. Estimate the 2D pose of every person in each frame of the video.
2. Track each person across frames so that their detections can be linked over time.
3. Segregate the pose sequences belonging to different people.
4. Use the per-person pose sequences to drive the 3D animation.
Let’s talk about these steps in detail in the rest of the article.
A human pose can be fully described by specifying just 17 key points (known as landmark points in the deep learning community). As you may have guessed, we estimate these poses (i.e. track a human's pose across the frames of a video) using deep learning. There are quite a few state-of-the-art frameworks (such as PoseFlow and AlphaPose) that can be found online (and by online, I mean on GitHub) that have already implemented pose estimation with a decent level of accuracy.
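To make the keypoint format concrete, here is a minimal sketch of how the 17 COCO-style keypoints can be represented and grouped per frame from an AlphaPose-style JSON result. The file name and exact output schema are assumptions based on AlphaPose's default output (a flat list of (x, y, score) triples per detection); check the repository's docs for the version you use.

```python
import json
from collections import defaultdict

# The 17 COCO keypoints that AlphaPose predicts by default.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def load_poses(results_path="alphapose-results.json"):
    """Group AlphaPose detections by frame.

    Each detection's "keypoints" field is assumed to be a flat list
    of (x, y, confidence) triples, one triple per keypoint.
    """
    with open(results_path) as f:
        results = json.load(f)

    poses_per_frame = defaultdict(list)
    for det in results:
        kps = det["keypoints"]
        pose = {
            name: (kps[3 * i], kps[3 * i + 1], kps[3 * i + 2])
            for i, name in enumerate(COCO_KEYPOINTS)
        }
        poses_per_frame[det["image_id"]].append(pose)
    return poses_per_frame
```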
PoseFlow on the left. AlphaPose on the right. GIFs Source: https://github.com/MVIG-SJTU/AlphaPose
The code for the AlphaPose framework can be found at https://github.com/MVIG-SJTU/AlphaPose.
We've used AlphaPose to detect the poses of the humans present in a video. The next step is to track each person's movements across frames so that we can build a smooth, continuous animation; for this we use the DeepSORT tracking framework. The research paper for DeepSORT ("Simple Online and Realtime Tracking with a Deep Association Metric") can be found at https://arxiv.org/abs/1703.07402.
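DeepSORT links detections across frames using a Kalman-filter motion model combined with an appearance embedding. To illustrate just the core association idea, here is a toy sketch of greedy IoU-based matching between the previous frame's tracked boxes and the current frame's detections. This is a simplified stand-in for DeepSORT's matcher (which additionally uses Mahalanobis distance on the motion model and cosine distance on appearance features), not the actual library code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_tracks(tracks, detections, iou_threshold=0.3):
    """Greedily assign each track to its best-overlapping detection.

    `tracks` maps track_id -> last known box; `detections` is a list
    of boxes in the current frame. Returns {track_id: detection_index}
    for every match whose IoU exceeds the threshold.
    """
    assignments = {}
    used = set()
    for track_id, box in tracks.items():
        best_j, best_iou = None, iou_threshold
        for j, det in enumerate(detections):
            if j in used:
                continue
            overlap = iou(box, det)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            assignments[track_id] = best_j
            used.add(best_j)
    return assignments
```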
Using the tracked bounding boxes output by DeepSORT and FaceReid, we segregate the poses of the different people in the video in the following manner.
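The segregation step can be sketched roughly as follows: assign each estimated pose to the tracked bounding box that contains the largest fraction of its keypoints, so that every track ID accumulates its own pose sequence over the video. The function and variable names below are illustrative, not taken from the original codebase.

```python
def assign_poses_to_tracks(poses, track_boxes, min_fraction=0.5):
    """Assign each pose to the track whose box contains most of its keypoints.

    `poses` is a list of {keypoint_name: (x, y, score)} dicts for one frame;
    `track_boxes` maps track_id -> (x1, y1, x2, y2) for the same frame.
    Returns {track_id: pose} for poses matched with enough keypoints inside.
    """
    def fraction_inside(pose, box):
        x1, y1, x2, y2 = box
        inside = sum(
            1 for (x, y, _score) in pose.values()
            if x1 <= x <= x2 and y1 <= y <= y2
        )
        return inside / len(pose)

    assigned = {}
    for pose in poses:
        best_id, best_frac = None, min_fraction
        for track_id, box in track_boxes.items():
            frac = fraction_inside(pose, box)
            if frac > best_frac:
                best_id, best_frac = track_id, frac
        if best_id is not None:
            assigned[best_id] = pose
    return assigned
```

Running this per frame and appending each matched pose to a per-track list yields one continuous pose sequence per person, which is what the animation step consumes.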