At AI Music, where our back catalogue of content grows every day, it is becoming increasingly necessary to build more intelligent systems for searching and querying the music. One such system relies on the ability to define and quantify the degree of similarity between songs. The core methodology described here tackles the concept of acoustic similarity.
Searching for a song using descriptive tags often introduces semantic inconsistencies. Tags can be highly subjective, varying with the age group, culture and personal preference of the listener. For example, descriptors such as ‘bright’ or ‘cold’ could mean entirely different things to different people. Music can also sit in blurry areas when it comes to genre. A song such as Sabotage by the Beastie Boys is primarily known as a Hip-Hop/Rap song, yet it contains many of the sonic qualities we would traditionally attribute to a Rock song. Using an example reference track to retrieve a similar song, or a ranked list of similar songs, from a large catalogue avoids such issues.
Nevertheless, when we perceive two or more songs to be similar to one another, what does this actually mean? Perceived similarity is often very difficult to define, as it comprises a number of different aspects such as genre, instrumentation, mood, tempo and many more. To complicate the problem further, similarity tends to arise from an unrestricted combination of such characteristics. With song similarity being such a subjective concept, how do we tackle the issue of defining a ground truth?
Traditional methods for determining the similarity between songs require you to select and extract musical features from the audio. The distance between these features in a feature space is then presumed to reflect the perceptual similarity of the respective tracks. One problem with this approach is determining which features best map to perceived similarity. At AI Music, we tackle this problem by employing an approach based on Siamese Neural Networks (SNN).
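To make the traditional, feature-based approach concrete, here is a minimal sketch in Python. It assumes each track has already been reduced to a fixed-length vector of normalised features; the function name `rank_by_similarity`, the track names and the feature values are all hypothetical and chosen purely for illustration. This is not the SNN approach we use, only the baseline it is compared against.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_similarity(
    reference: Sequence[float],
    catalogue: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank catalogue tracks by Euclidean distance to a reference track.

    A smaller distance in feature space is presumed to mean the tracks
    sound more alike. All vectors must have the same length.
    """
    scored = [
        (title, math.dist(reference, features))
        for title, features in catalogue.items()
    ]
    # Closest (most similar) tracks come first.
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical, hand-written feature vectors (illustration only).
example_catalogue = {
    "track_a": [0.90, 0.10, 0.40],
    "track_b": [0.20, 0.80, 0.50],
    "track_c": [0.85, 0.15, 0.35],
}
example_reference = [0.88, 0.12, 0.40]

for title, distance in rank_by_similarity(example_reference, example_catalogue):
    print(f"{title}: {distance:.3f}")
```

The quality of the ranking depends entirely on which features are chosen and how they are scaled, which is exactly the weakness described above.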
The SNN architecture is based on a Convolutional Neural Network architecture, which means we need to transform the audio into an image. The most common image representation of audio is a waveform, where the signal amplitude is plotted against time. For our application we use a visual representation of the audio known as a spectrogram, specifically a mel spectrogram.
We have chosen mel spectrograms because they have been found to represent the timbre of a sound well, and are therefore better representations of the acoustic characteristics of a song.
Figure 1: Comparison of waveform, spectrogram and mel spectrogram
As Figure 1 shows, relevant musical information is revealed more clearly in the mel spectrogram.
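A mel spectrogram differs from an ordinary spectrogram in that its frequency axis is warped onto the mel scale, which spaces frequencies closer to the way listeners perceive pitch. The sketch below shows that warping using one common form of the Hz-to-mel conversion (the HTK-style formula); it is an assumption that this particular variant matches the one used in our pipeline, and the helper name `mel_band_centres` is hypothetical.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(n_bands: int, fmin_hz: float, fmax_hz: float) -> List[float]:
    """Centre frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    mel_min = hz_to_mel(fmin_hz)
    mel_max = hz_to_mel(fmax_hz)
    step = (mel_max - mel_min) / (n_bands + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_bands)]


# Bands are packed densely at low frequencies and spread out at high ones.
for centre in mel_band_centres(8, 0.0, 8000.0):
    print(f"{centre:8.1f} Hz")
```

Because the bands are evenly spaced in mels rather than in Hz, the low-frequency region gets finer resolution than the high-frequency region.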