Elkamel Hassen

Music Genre Classification With TensorFlow

The rise of music streaming services has made music ubiquitous. We listen to music during our commute, while we exercise or work, or simply to relax. The ongoing disruption to our daily lives has not dampened music's role in eliciting emotion and helping us process our thoughts, as exemplified by the emergence of “Zoom concerts”.

One key feature of these services is their playlists, often grouped by genre. This data could come from manual labeling by the people publishing the songs, but that does not scale well and can be gamed by artists who want to capitalize on the popularity of a specific genre. A better option is automated music genre classification. With my two collaborators, Wilson Cheung and Joy Gu, we sought to compare different methods of classifying music samples into genres. In particular, we evaluated the performance of standard machine learning versus deep learning approaches. We found that feature engineering is crucial, and that domain knowledge can really boost performance.

After describing the data source used, I will give a brief overview of the methods we used and their results. In the last part of this article, I will spend more time explaining how the TensorFlow framework in Google Colab can perform these tasks efficiently on GPU or TPU runtimes thanks to the TFRecord format. All the code is available here, and we are happy to share our more detailed report with anyone interested.
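As a quick preview of that last part, here is a minimal sketch of how feature vectors and genre labels can be written to and read back from a TFRecord file as a tf.data pipeline; the feature size and file name are illustrative placeholders rather than our exact setup.

import tensorflow as tf
import numpy as np

# Serialize one (features, label) pair into a tf.train.Example.
# The 128-dimensional feature vector and genre index are placeholders.
def serialize_example(features: np.ndarray, label: int) -> bytes:
    feature = {
        "features": tf.train.Feature(float_list=tf.train.FloatList(value=features.tolist())),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Write a few dummy records to disk...
with tf.io.TFRecordWriter("genres.tfrecord") as writer:
    for _ in range(10):
        writer.write(serialize_example(np.random.randn(128).astype(np.float32), label=3))

# ...and read them back as a tf.data pipeline that a GPU or TPU can consume efficiently.
def parse_example(raw):
    spec = {
        "features": tf.io.FixedLenFeature([128], tf.float32),
        "label": tf.io.FixedLenFeature([1], tf.int64),
    }
    parsed = tf.io.parse_single_example(raw, spec)
    return parsed["features"], parsed["label"]

dataset = (tf.data.TFRecordDataset("genres.tfrecord")
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))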

Data Source

Predicting the genre of an audio sample is a supervised learning problem (for a good primer on supervised vs. unsupervised learning, I recommend Devin’s article on the topic). In other words, we needed data that contains labeled examples. The FreeMusicArchive is a repository of audio segments with relevant labels and metadata, which was originally collected for a paper at the International Society for Music Information Retrieval Conference (ISMIR) in 2017.

We focused our analysis on the small subset of the data provided. It contains 8,000 audio segments, each 30 seconds in length and classified as one of eight distinct genres:

  • Hip-Hop
  • Pop
  • Folk
  • Experimental
  • Rock
  • International
  • Electronic
  • Instrumental

Each genre comes with 1,000 representative audio segments. At a sample rate of 44,100 Hz, each 30-second clip contains more than 1 million data points, or roughly 10¹⁰ data points across the full subset. Using all of this data in a classifier is a challenge, which we will discuss more in upcoming sections.

For instructions on how to download the data, please refer to the README included in the repository. We are very grateful to Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson for putting this data together and making it freely available, but we can only imagine the insights that become available at the scale of the data owned by Spotify or Pandora Radio. With this data, we can describe various models to perform the task at hand.

Model Description

I will keep the theoretical details to a minimum, but will link to relevant resources whenever possible. In addition, our report contains a lot more information than what I can include here, in particular around feature engineering, so do let me know in the comments if you would like me to share it with you.

Standard Machine Learning

We used logistic regression, k-nearest neighbors (kNN), Gaussian Naive Bayes, and support vector machines (SVMs); a minimal code sketch comparing them follows the list:

  • SVM tries to find the best decision boundary by maximizing the margin to the nearest training samples. The kernel trick defines non-linear boundaries by projecting the data into a higher-dimensional space
  • kNN assigns a label based on a majority vote of the k closest training samples
  • Naive Bayes predicts the probability of different classes based on features. The conditional independence assumption greatly simplifies calculations
  • Logistic regression also predicts the probability of different classes by modeling the probability directly, leveraging the logistic function
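
Here is the sketch referenced above of how these four classifiers could be trained and compared with scikit-learn; the feature matrix and labels are random placeholders standing in for engineered audio features, and the exact setup we used is described in the report.

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import numpy as np

# Placeholder data: 8,000 examples with 100 engineered features and 8 genre labels.
X = np.random.randn(8000, 100)
y = np.random.randint(0, 8, size=8000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=10)),
    "naive_bayes": GaussianNB(),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")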

Deep Learning

For deep learning, we leveraged the TensorFlow framework (see more details in the second part of this article) and built different models depending on the type of input.

With raw audio, each example is a 30-second audio sample, or approximately 1.3 million data points. These floating point values (positive or negative) represent the wave displacement at a given point in time. To manage computational resources, only a small fraction (less than 1%) of this data can be used. With these features and the associated label (one-hot encoded), we can build a convolutional neural network. The general architecture is as follows:
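A minimal Keras sketch of such a network is shown below; the layer sizes and input length are illustrative assumptions rather than the exact architecture from our report.

import tensorflow as tf

NUM_GENRES = 8
INPUT_LEN = 12000  # e.g., a heavily downsampled slice of the 30 s clip (assumption)

# A minimal 1D CNN over raw audio: stacked convolution + pooling blocks, then a softmax over genres.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(INPUT_LEN, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_GENRES, activation="softmax"),
])

# One-hot encoded labels pair with categorical cross-entropy.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()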

#deep-learning #machine-learning #tensorflow #developer

Vern Greenholt

Breaking Spotify’s Algorithm of Music Genre Classification!

Introduction

There are many different genres in the music industry, but the basic genres have a few principal characteristics that make them easier to identify. Genres are used to tag and define different kinds of music based on how they are composed, or on their musical form and musical style.

In this article, you will learn to build your own model, which will take a song as input and predict or classify that particular song into one of the genres. We will be classifying among the following basic genres: blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, and rock. The model will be built using LSTM networks. Don’t worry if you do not know what an LSTM is; this article will give you a brief understanding of LSTMs and how they work.

Here is the GitHub link to the entire project — https://github.com/rajatkeshri/Music-Genre-Prediction-Using-RNN-LSTM

The entire article is divided into the following segments:

  1. Prerequisites
  2. Theory
  3. Data Preprocessing
  4. Training the model
  5. Predicting on new data

Prerequisites

There are a few prerequisites you will need before you start this project. The first thing you will require is the dataset. The music data I have used for this project can be downloaded from Kaggle — https://www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification.

Note that this dataset contains 10 classes with 100 songs within each class. This might sound like very little data for a machine learning project, which is why in the next section I will show you how to increase the amount of training data for each genre.

There are a few modules you will need to install on your PC/laptop in order to get started. We will be building the entire LSTM model using TensorFlow, coded in Python. We will be working with Python 3.6 or higher (if you are still on Python 2.7, please move to Python 3.6 or higher for full support and functionality). The following Python packages are required —

  1. TensorFlow — machine learning library
  2. librosa — audio processing library used to extract features from songs
  3. numpy — library for scientific computing
  4. sklearn — machine learning library (we will use it to split the data into training and testing sets)
  5. json — to store the processed dataset as JSON (explained in the next section; part of the Python standard library)
  6. pydub — to convert MP3 files to WAV

These modules can be installed using pip or conda. You can find many online sources and YouTube videos on getting started with pip or conda. Once the above modules are installed, let’s get coding!
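Before diving into the repository code, here is a rough sketch of the core preprocessing idea: split each 30-second track into shorter segments and store the MFCCs of each segment, which is also how the amount of training data per genre is multiplied. The file path, segment count, and MFCC parameters below are illustrative assumptions; the repository linked above contains the actual implementation.

import json
import librosa
import numpy as np

SAMPLE_RATE = 22050          # librosa's default sample rate
SEGMENTS_PER_TRACK = 10      # split each 30 s track into 10 x 3 s segments (assumption)
N_MFCC = 13

def extract_mfcc_segments(path: str, genre_label: int, data: dict) -> None:
    """Split one track into segments and store the MFCCs for each segment."""
    signal, sr = librosa.load(path, sr=SAMPLE_RATE)
    samples_per_segment = len(signal) // SEGMENTS_PER_TRACK
    for s in range(SEGMENTS_PER_TRACK):
        start = s * samples_per_segment
        segment = signal[start:start + samples_per_segment]
        # MFCC matrix of shape (n_mfcc, n_frames); transpose so time is the first axis for the LSTM.
        mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=N_MFCC).T
        data["mfcc"].append(mfcc.tolist())
        data["labels"].append(genre_label)

data = {"mfcc": [], "labels": []}
extract_mfcc_segments("genres/blues/blues.00000.wav", genre_label=0, data=data)  # hypothetical path

with open("data.json", "w") as f:
    json.dump(data, f)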

#spotify #mfcc #music-genre #classification #lstm #algorithms

Queenie Davis

MusicBERT: Microsoft’s Large Scale Pre-Trained Model For Symbolic Music Understanding

Microsoft recently developed MusicBERT, a large-scale pre-trained model for symbolic music understanding. Symbolic music understanding refers to understanding music from symbolic data (for example, the MIDI format). It covers many music applications, such as emotion classification, genre classification, and music piece matching.

For developing MusicBERT, Microsoft used the OctupleMIDI encoding method, a bar-level masking strategy, and a large-scale symbolic music corpus of more than 1 million music tracks.

Why OctupleMIDI?

OctupleMIDI is a novel music encoding method that encodes each note into a tuple with eight elements representing different aspects of a musical note: instrument, tempo, bar, position, time signature, pitch, duration, and velocity. (A minimal sketch of such a tuple appears after the list of advantages below.)

Here are some of the advantages of OctupleMIDI:

  • It reduces the length of a music sequence (about 4x shorter than REMI), which makes these very long sequences easier for a Transformer to model
  • It is note-centric: each note uses the same eight-element structure and carries enough information (time signature, long note durations, etc.) to express various kinds of music, which makes OctupleMIDI much easier to work with
  • It is more universal than previous encoding methods, since the same 8-tuple structure per note can express music of different genres
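
As referenced above, here is a hedged sketch of what such an eight-element note encoding could look like in code; the field names follow the description in this article, and the actual MusicBERT implementation may differ.

from collections import namedtuple

# One note = one 8-tuple, following the eight attributes described above.
OctupleNote = namedtuple(
    "OctupleNote",
    ["instrument", "tempo", "bar", "position", "time_signature", "pitch", "duration", "velocity"],
)

# A hypothetical note: piano (program 0), 120 BPM, bar 4, beat position 0, 4/4 time,
# middle C, a quantized duration unit, and medium velocity.
note = OctupleNote(
    instrument=0,
    tempo=120,
    bar=4,
    position=0,
    time_signature="4/4",
    pitch=60,
    duration=8,   # duration in some quantized unit (assumption)
    velocity=64,
)

# A piece is then simply a sequence of such tuples, one per note.
piece = [note]
print(note)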

MusicBERT architecture

The authors of the study established that it is challenging to apply NLP directly to symbolic music because it differs greatly from natural text data. The following challenges stand out:

  • Music is more structural and diverse than natural language, making it more difficult to encode
  • The complicated encoding of symbolic music raises the risk of information leakage during pre-training
  • Pre-training for music understanding has been limited by the lack of large-scale symbolic music corpora

To remediate this, researchers Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu developed MusicBERT, a large-scale pre-trained model with music encoding and a masking strategy for music understanding. The model is evaluated on symbolic music understanding tasks, including melody completion, accompaniment suggestion, style classification, and genre classification.

Besides OctupleMIDI, MusicBERT uses a bar-level masking strategy. The masking strategy in the original BERT for NLP tasks randomly masks individual tokens, which causes information leakage in music pre-training. In the bar-level masking strategy used by MusicBERT, however, all tokens of the same type (for example, time signature, instrument, or pitch) within a bar are masked together, which avoids information leakage and helps representation learning.
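Here is a hedged sketch of that idea (not the actual MusicBERT code): pick a bar and an attribute type, then mask that attribute for every note in the bar at once, so the model cannot recover a masked value from its neighbors in the same octuple.

import random

MASK = "[MASK]"
ATTRIBUTES = ["instrument", "tempo", "bar", "position",
              "time_signature", "pitch", "duration", "velocity"]

def bar_level_mask(notes, attribute, bar_index):
    """Mask one attribute for every note in a given bar (a sketch of the strategy).

    `notes` is a list of dicts, one per note, each containing the eight attributes.
    """
    masked = []
    for note in notes:
        note = dict(note)  # copy so the original sequence is untouched
        if note["bar"] == bar_index:
            note[attribute] = MASK
        masked.append(note)
    return masked

# Toy sequence: two notes in bar 0 and one note in bar 1.
notes = [
    {"instrument": 0, "tempo": 120, "bar": 0, "position": 0,
     "time_signature": "4/4", "pitch": 60, "duration": 8, "velocity": 64},
    {"instrument": 0, "tempo": 120, "bar": 0, "position": 4,
     "time_signature": "4/4", "pitch": 64, "duration": 8, "velocity": 64},
    {"instrument": 0, "tempo": 120, "bar": 1, "position": 0,
     "time_signature": "4/4", "pitch": 67, "duration": 8, "velocity": 64},
]

# Randomly pick an attribute, then mask all tokens of that type in bar 0 at once.
masked_notes = bar_level_mask(notes, random.choice(ATTRIBUTES), bar_index=0)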

In addition to this, MusicBERT also uses a large-scale and diverse symbolic music dataset, called the million MIDI dataset (MMD). It contains more than 1 million music songs, with different genres, including Rock, Classical, Rap, Electronic, Jazz, etc. It is one of the most extensive datasets in current literature — ten times larger than the previous largest dataset LMD in terms of the number of songs (148,403 songs and 535 million notes). MMD has about 1,524,557 songs and two billion notes. This dataset benefits representation learning for music understanding significantly.

#opinions #bert music #build music software #genre classification #machine learning and music #microsoft latest

Chaz Homenick

Audio Classification in an Android App with TensorFlow Lite

Deploying machine learning-based Android apps is gaining prominence and momentum with frameworks like TensorFlow Lite, and there are quite a few articles that describe how to develop mobile apps for tasks like text classification and image classification.

But much less exists about working with audio-based ML tasks in mobile apps, and this blog is meant to address that gap: specifically, I’ll describe the steps and code required to perform audio classification in Android apps.
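For context on how the model ends up inside the app, here is a hedged sketch of the typical conversion of a trained Keras model to TensorFlow Lite; the model and file names are placeholders rather than the exact ones from this project.

import tensorflow as tf

# Load a previously trained Keras audio-classification model (placeholder path).
model = tf.keras.models.load_model("audio_classifier.h5")

# Convert it to the TensorFlow Lite flat-buffer format used on Android.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# The resulting file is typically copied into the Android project's assets/ folder
# and loaded at runtime with the TFLite Interpreter.
with open("audio_classifier.tflite", "wb") as f:
    f.write(tflite_model)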

[Image: TensorFlow Lite model on Android performing audio classification]

Intended Audience and Pre-requisites:

This article covers the different technologies required to develop ML apps on mobile and deals with audio processing techniques. As such, the following are prerequisites for a complete understanding of the article:

→ Familiarity with deep learning, Keras, and convolutional neural networks

→ Experience with Python and Jupyter Notebooks

→ Basic understanding of audio processing and vocal classification concepts

→ Basics of Android app development with Kotlin

Note: If you’re new to audio processing concepts and would like to understand what MFCCs (Mel-Frequency Cepstral Coefficients) are, please refer to this other blog of mine, where I have explained some of these concepts in detail.

I’ve provided detailed info with regard to various steps and processing involved, and have commented on the code extensively in GitHub for easier understanding. Still, if you have any queries, please feel free to post them as comments.

A Major Challenge

One major challenge with regard to development of audio-based ML apps in Android is the lack of libraries in Java that perform audio processing.

I was surprised to find that there are no libraries available in Java for Android that help with the calculation of MFCCs and other features required for audio classification. Most of my time on this article was spent developing a Java component that generates MFCC values just like librosa does, which is critical to the model’s ability to make predictions.
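For reference, the kind of librosa call the Java component has to reproduce looks like the sketch below; the parameter values shown are illustrative, and any mismatch in sample rate, FFT size, hop length, or number of mel bands changes the MFCC values the model sees.

import librosa

# Reference MFCC computation that the Java component needs to match (parameters are illustrative).
y, sr = librosa.load("sample.wav", sr=44100)
mfcc = librosa.feature.mfcc(
    y=y,
    sr=sr,
    n_mfcc=40,       # number of MFCC coefficients
    n_fft=2048,      # FFT window size
    hop_length=512,  # step between successive frames
    n_mels=128,      # number of mel filterbank bands
)
print(mfcc.shape)  # (n_mfcc, n_frames)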

What We’ll Build

At the end of the tutorial, you’ll have developed an Android app that classifies audio files present in your mobile sdcard directory into one of the noise types of the Urbancode Challenge dataset. Your app should look more or less like the one shown below:

#tensorflow #heartbeat #tensorflow-lite #audio-classification #android #android app


5 Steps to Passing the TensorFlow Developer Certificate

Deep learning is one of the most in-demand skills on the market, and TensorFlow is the most popular DL framework. One of the best ways, in my opinion, to show that you are comfortable with DL fundamentals is taking the TensorFlow Developer Certificate. I completed mine last week, and now I am giving tips to those who want to validate their DL skills. I hope you love memes!

  1. Do the DeepLearning.AI TensorFlow Developer Professional Certificate course on Coursera by Laurence Moroney and Andrew Ng.
  2. Do the course questions in parallel in PyCharm.

#tensorflow #tensorflow developer certificate #certificate