The VGG-Sound dataset was developed by the Visual Geometry Group (VGG), Department of Engineering Science, University of Oxford, UK. It has set a benchmark for audio recognition with visuals. It contains more than 210k videos, each with both a visual and an audio track, spanning over 310 categories and 550 hours of video. It is available to download for commercial and research purposes. Each video and audio segment in the dataset is 10 seconds long.
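To make the clip structure concrete, here is a minimal pandas sketch of how an annotation file for such a dataset might be read. It assumes a headerless CSV with four columns (YouTube ID, start second, label, split); the column names, the `load_annotations` helper, and the sample rows are hypothetical and chosen only for illustration, so check the layout of the file you actually download before relying on it.

```python
import io

import pandas as pd

CLIP_SECONDS = 10  # clip length stated in the article

# Hypothetical rows for illustration only; these are not real dataset entries.
SAMPLE_CSV = """\
video_id_0001,30,playing piano,train
video_id_0002,12,dog barking,train
video_id_0003,100,playing piano,test
"""

# Assumed column layout; the names are our own, not defined by the dataset.
COLUMNS = ["youtube_id", "start_seconds", "label", "split"]


def load_annotations(source):
    """Read a headerless annotation CSV and add the end time of each clip."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # Every clip spans a fixed 10-second window from its start offset.
    df["end_seconds"] = df["start_seconds"] + CLIP_SECONDS
    return df


annotations = load_annotations(io.StringIO(SAMPLE_CSV))

# Number of clips per category, and the training subset.
clips_per_label = annotations.groupby("label").size()
train = annotations[annotations["split"] == "train"]
```

Passing a file path instead of the `io.StringIO` object should work the same way, provided the real file follows the assumed four-column layout.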

Andrew Zisserman and Andrea Vedaldi are principal researchers of VGG (Visual Geometry Group). The VGG-Sound researchers, Honglie Chen, Weidi Xie, Andrea Vedaldi, and Andrew Zisserman, are core members of VGG, Department of Engineering Science, University of Oxford, UK. The dataset was published at ICASSP 2020.

#developers corner #audio #dataset #numpy #pandas #pytorch

Guide To VGG-SOUND Datasets For Visual-Audio Recognition