A primer in deep learning for audio classification using TensorFlow

Convolutional Neural Nets

CNNs, or convolutional neural nets, are a type of deep learning algorithm that does really well at learning from images.

That’s because they can learn patterns that are translation invariant and have _spatial hierarchies_ (F. Chollet, 2018).

Image by Author

That means if the CNN learns the dog in the left corner of the image above, it can identify the dog in the other two pictures, where she has been moved around (translation invariance).

If the CNN learns the dog from the left corner of the image above, it will recognize pieces of the original image in the other two pictures because it has learned what the edges of her eye with heterochromia look like, as well as her wolf-like snout and the shape of her stylish headphones (spatial hierarchies).

These properties make CNNs formidable learners for images because the real world doesn’t always look exactly like the training data.
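To make translation invariance concrete, here is a minimal NumPy sketch, not the article's own code. It assumes a hypothetical 3x3 `pattern` standing in for a filter the network has already learned, and a hand-written `cross_correlate_valid` helper that slides that filter over an image the way a convolutional layer does. The same filter gives the same peak response wherever the pattern sits; only the location of the peak moves.

```python
import numpy as np


def cross_correlate_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def place(pattern, top, left, size=8):
    """Paste `pattern` onto an empty size x size canvas at (top, left)."""
    canvas = np.zeros((size, size))
    h, w = pattern.shape
    canvas[top:top + h, left:left + w] = pattern
    return canvas


# Hypothetical "learned" filter: a small X shape (an assumption for illustration).
pattern = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0]])

image_a = place(pattern, top=0, left=0)   # pattern in the top-left corner
image_b = place(pattern, top=4, left=3)   # same pattern, moved elsewhere

response_a = cross_correlate_valid(image_a, pattern)
response_b = cross_correlate_valid(image_b, pattern)

# Location of the strongest response in each map.
peak_a = tuple(int(i) for i in np.unravel_index(np.argmax(response_a), response_a.shape))
peak_b = tuple(int(i) for i in np.unravel_index(np.argmax(response_b), response_b.shape))

print("peak strength:", response_a.max(), response_b.max())
print("peak location:", peak_a, peak_b)
```

Taking the maximum over the whole response map, as a global max-pooling layer would, discards the location and keeps only the fact that the pattern was found.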

Can I use this for audio?

**Yes.** You can extract features that look like images, such as spectrograms, and shape them so they can be fed into a CNN.
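As a rough illustration of that idea, the sketch below turns a waveform into a log-mel spectrogram with `tf.signal` and reshapes it into the `(batch, height, width, channels)` layout a 2D convolution expects. It is an assumption-laden example, not the pipeline used in this article: the synthetic 440 Hz tone stands in for a real recording, and the sample rate, frame sizes, and number of mel bins are arbitrary choices.

```python
import math

import tensorflow as tf

# Assumed values for illustration only.
SAMPLE_RATE = 16000   # samples per second
FRAME_LENGTH = 1024   # samples per STFT window
FRAME_STEP = 512      # hop between windows
NUM_MEL_BINS = 64

# One second of a synthetic 440 Hz tone, standing in for a real recording.
t = tf.range(SAMPLE_RATE, dtype=tf.float32) / SAMPLE_RATE
waveform = tf.sin(2.0 * math.pi * 440.0 * t)

# Short-time Fourier transform: (frames, fft_bins), complex-valued.
stft = tf.signal.stft(
    waveform,
    frame_length=FRAME_LENGTH,
    frame_step=FRAME_STEP,
    fft_length=FRAME_LENGTH,
)
magnitude = tf.abs(stft)

# Project the linear frequency bins onto the mel scale.
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=NUM_MEL_BINS,
    num_spectrogram_bins=magnitude.shape[-1],
    sample_rate=SAMPLE_RATE,
    lower_edge_hertz=80.0,
    upper_edge_hertz=7600.0,
)
mel = tf.matmul(magnitude, mel_matrix)

# Log compression; the small offset avoids log(0).
log_mel = tf.math.log(mel + 1e-6)

# Add channel and batch axes so the result looks like a grayscale image batch.
image_like = log_mel[..., tf.newaxis]   # (frames, mel_bins, 1)
batch = image_like[tf.newaxis, ...]     # (1, frames, mel_bins, 1)

# A single convolutional layer accepts it like any other image.
feature_maps = tf.keras.layers.Conv2D(filters=8, kernel_size=3)(batch)

print("log-mel shape:", log_mel.shape)
print("CNN input shape:", batch.shape)
print("feature map shape:", feature_maps.shape)
```

The time axis plays the role of image height and the mel-frequency axis the role of width, so the single channel is analogous to a grayscale image.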

This article explains how to train a CNN to classify species based on audio information.

