Deep learning approaches have empirically demonstrated remarkable success in learning image representations for tasks like object recognition, image captioning, and semantic segmentation. Convolutional neural networks have enabled us to efficiently capture the hypothesis of spatial locality of data structure in images through parameter sharing convolutions, and local invariance-building max-pooling neurons. In this literature review, I would like to explore the impact of deep learning techniques on video tasks, specifically action recognition.

I would like to explore how spatiotemporal features are aggregated through various deep architectures, the role of optical flow as an input, the impacts on real-time capabilities, and the compactness & interpretability of the learned features.

I will then propose areas of future research that I believe could help bias our deep learning architectures in a way that better captures temporal hypotheses of the real-world; the natural manifold.

#deep-learning #data-science #machine-learning #action-recognition #shivam-sharma

Deep Learning Architectures for Action Recognition
2.90 GEEK