Following on from the previous lecture, DeepMind Research Scientist Viorica Patraucean introduces classic computer vision tasks beyond image classification (object detection, semantic segmentation, optical flow estimation) and describes state-of-the-art models for each, together with standard benchmarks. She discusses similar models for video processing, covering tasks such as action recognition and tracking, and the associated challenges. In particular, she refers to recent work on making video processing more efficient, including approaches that use elements of reinforcement learning. Next, she describes self-supervised learning in uni-modal and multi-modal (vision+audio, vision+language) settings, where large scale is beneficial. Viorica ends with a discussion of open questions in vision and the role of computer vision research within the broader goal of building intelligent agents.

#deep learning

 Advanced Models for Computer Vision