Urban Sound Classification with Librosa: Tricky Cross-Validation

Outline

The goal of this post is two-fold:

  1. I’ll show an example of implementing the results of an interesting research paper on classifying audio clips based on their sonic content. This will include applications of the librosa library, a Python package for music and audio analysis. The clips are short excerpts of urban sounds, and the classification task is predicting the appropriate category label for each one.
  2. I’ll show the importance of a valid cross-validation scheme. Each excerpt in the dataset I’ll be using is sliced from a longer source recording, so a naive random split can place near-duplicate slices in both train and test, leaking information that overfits your model and prevents it from generalizing. The solution is somewhat subtle, so it seemed like a nice opportunity for a blog post (see the sketch after this list).
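
To make point 2 concrete up front: because multiple excerpts come from the same source recording, any cross-validation split has to keep all slices from one recording on the same side of the train/test boundary. Here is a minimal sketch of that idea using scikit-learn’s GroupKFold; the feature matrix, labels, and group IDs are random placeholders, not the real pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Placeholder data: in the real pipeline, X holds the extracted audio
# features, y the class labels, and groups the ID of the source
# recording each excerpt was sliced from.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 10, size=100)
groups = rng.integers(0, 25, size=100)  # source-recording IDs

# GroupKFold guarantees that all slices from one recording land in the
# same fold, so no near-duplicate audio leaks into the test split.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(RandomForestClassifier(), X, y, groups=groups, cv=cv)
print(scores.mean())
```

Contrast this with a plain shuffled KFold, which would happily scatter slices of the same siren or jackhammer recording across folds and report an optimistic score.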

Original research paper

http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_urbansound_acmmm14.pdf

Source dataset, by paper authors

https://urbansounddataset.weebly.com/urbansound8k.html

Summary of their dataset

“This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy.”
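
The dataset ships with a metadata CSV describing every excerpt, and a quick way to orient yourself is to inspect it. The sketch below assumes the default UrbanSound8K layout and its column names as I understand them (slice_file_name, fsID for the source-recording ID, fold for the authors’ predefined fold, and class), so verify them against your download.

```python
import pandas as pd

# Path assumes the dataset's default directory layout after download.
meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

# 'class' is the label, 'fold' the authors' predefined fold assignment,
# and 'fsID' identifies the source recording each slice came from.
print(meta[["slice_file_name", "fsID", "fold", "class"]].head())
print(meta["class"].value_counts())
```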

I’ll extract features from these sound excerpts and fit a classifier to predict one of the 10 classes. Let’s get started!
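
As a preview of the feature-extraction step, here is a minimal sketch of turning one clip into a fixed-length feature vector with librosa. I use MFCC summary statistics purely for illustration, and the file path is a placeholder; the full feature set actually used in my pipeline lives in audio.py, linked below.

```python
import librosa
import numpy as np

def extract_features(path, n_mfcc=40):
    """Load one excerpt and summarize its MFCCs into a fixed-length vector."""
    # sr=None preserves the file's native sample rate
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Collapse the time axis so clips of different lengths all yield
    # vectors of the same size.
    return np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder filename, for illustration only
features = extract_features("UrbanSound8K/audio/fold1/example.wav")
print(features.shape)  # (80,)
```

Summarizing each coefficient with its mean and standard deviation is a simple way to get one row per clip, which is what a standard scikit-learn classifier expects.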


Note on my Code

I’ve created a repo that allows you to re-create my example in full:

  1. Script runner: https://github.com/marcmuon/urban_sound_classification/blob/master/main.py
  2. Feature extraction module: https://github.com/marcmuon/urban_sound_classification/blob/master/audio.py
  3. Model module: https://github.com/marcmuon/urban_sound_classification/blob/master/model.py
