Microsoft recently developed MusicBERT, a large-scale pre-trained model for symbolic music understanding. Symbolic music understanding refers to understanding music from symbolic data (for example, the MIDI format). It covers many music applications, such as emotion classification, genre classification, and music piece matching.
To develop MusicBERT, Microsoft used the OctupleMIDI encoding method, a bar-level masking strategy, and a large-scale symbolic music corpus of more than 1 million songs.
OctupleMIDI is a novel music encoding method that encodes each note into a tuple with eight elements, representing the different aspects of the characteristics of a musical note, including instrument, tempo, bar, position, time signature, pitch, duration, and velocity.
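To make the idea concrete, here is a minimal sketch of packing one note into an eight-element tuple. It is an illustration only, not the authors' implementation: the class name OctupleToken, the helper encode_notes, the field order (taken from the list above), and the choice of 16 positions per bar are all assumptions.

```python
from typing import List, NamedTuple, Tuple


class OctupleToken(NamedTuple):
    """One note as an eight-element tuple (field order is an assumption)."""
    instrument: int
    tempo: int
    bar: int
    position: int
    time_signature: str
    pitch: int
    duration: int
    velocity: int


def encode_notes(
    notes: List[Tuple[int, int, int, int]],
    instrument: int,
    tempo: int,
    time_signature: str,
    positions_per_bar: int = 16,  # hypothetical grid resolution
) -> List[OctupleToken]:
    """Turn (onset, pitch, duration, velocity) notes into one token per note.

    `onset` is counted in grid positions from the start of the piece, so
    dividing by `positions_per_bar` gives the bar index and the position
    inside that bar.
    """
    tokens = []
    for onset, pitch, duration, velocity in notes:
        bar, position = divmod(onset, positions_per_bar)
        tokens.append(
            OctupleToken(instrument, tempo, bar, position,
                         time_signature, pitch, duration, velocity)
        )
    return tokens


# Two piano notes (instrument 0) at 120 BPM in 4/4.
example = encode_notes([(0, 60, 4, 90), (18, 64, 2, 80)],
                       instrument=0, tempo=120, time_signature="4/4")
```

Each note becomes exactly one token, so the sequence length equals the number of notes.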
Here are some of the advantages of OctupleMIDI:
The authors of the study found it challenging to apply NLP techniques directly to symbolic music because it differs greatly from natural text data. The challenges are as follows:
To address this, researchers Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu developed MusicBERT, a large-scale pre-trained model with a music-specific encoding and masking strategy for music understanding. The model is evaluated on four symbolic music understanding tasks: melody completion, accompaniment suggestion, style classification, and genre classification.
Besides OctupleMIDI, MusicBERT uses a bar-level masking strategy. The masking strategy in the original BERT for NLP tasks masks individual tokens at random. In music pre-training this causes information leakage, because a masked value can often be inferred from neighboring notes in the same bar. In the bar-level masking strategy used in MusicBERT, all the elements of the same type (for example, time signature, instrument, or pitch) within a bar are masked together, which avoids this leakage and supports better representation learning.
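The bar-level idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MusicBERT training code: the function name bar_level_mask, the MASK placeholder, the bar field sitting at index 2, and the 15% masking probability are all hypothetical choices.

```python
import random
from typing import List, Optional, Set, Tuple

MASK = -1        # hypothetical placeholder for a masked element
BAR_INDEX = 2    # assumed position of the bar field inside each tuple


def bar_level_mask(
    tokens: List[tuple],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[tuple], Set[Tuple[int, int]]]:
    """Mask whole (bar, element type) groups instead of single elements.

    For every bar and every element type, one random draw decides whether
    that element type is hidden in ALL notes of the bar, so the value
    cannot be copied from a neighboring note in the same bar.
    """
    rng = random.Random(seed)
    n_fields = len(tokens[0]) if tokens else 0
    bars = sorted({tok[BAR_INDEX] for tok in tokens})

    # Choose the (bar, field) pairs to hide.
    chosen = {
        (bar, field)
        for bar in bars
        for field in range(n_fields)
        if rng.random() < mask_prob
    }

    masked = []
    for tok in tokens:
        bar = tok[BAR_INDEX]  # read the bar from the unmasked token
        masked.append(tuple(
            MASK if (bar, field) in chosen else value
            for field, value in enumerate(tok)
        ))
    return masked, chosen


# Three notes: two in bar 0, one in bar 1 (values are made up).
notes = [
    (0, 120, 0, 0, 4, 60, 4, 90),
    (0, 120, 0, 4, 4, 62, 4, 90),
    (0, 120, 1, 0, 4, 64, 4, 90),
]
masked_notes, chosen_groups = bar_level_mask(notes, mask_prob=0.5, seed=0)
```

If the pitch of bar 0 is selected, for instance, it is hidden in both notes of that bar, whereas token-level masking might hide it in only one of them.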
In addition, MusicBERT is pre-trained on a large and diverse symbolic music dataset called the Million MIDI Dataset (MMD). It contains more than 1 million songs across different genres, including Rock, Classical, Rap, Electronic, and Jazz. MMD has 1,524,557 songs and about two billion notes, which makes it one of the most extensive datasets in the current literature: roughly ten times larger, by number of songs, than the previous largest dataset, LMD (148,403 songs and 535 million notes). A dataset of this size significantly benefits representation learning for music understanding.