Recently, I got my very first paper accepted to the International Conference on Technologies for Music Notation and Representation (TENOR) 2020. The journey of getting published was very insightful, and it will serve as my own guide to publishing in the future.


The paper summarises prior work and takes a position on how to progress the field of my research topic, Optical Music Recognition (OMR). You can read more about OMR in my previous article. At the beginning of my academic journey, I had heard the pros and cons of publishing a position paper. Nevertheless, writing this paper often made me doubt myself, which always resulted in learning more.

Back to the actual content of the paper: I summarise the four main stages of the OMR pipeline and the variety of published work in each stage. Furthermore, I try to capture the paradigm shift in the methods used in OMR, from conventional computer vision systems to end-to-end deep learning networks.


The overall traditional OMR pipeline [13]

Initially, the four stages of OMR included image preprocessing, musical object detection, musical symbol reconstruction and, finally, encoding the musical knowledge into a machine-readable file. The image preprocessing stage mainly applied enhancement, de-skewing, blurring, noise removal and binarisation [1, 2, 3, 4, 5]. Binarisation is the process of converting an image to binary (only black and white pixels). Such processes were initially performed using traditional techniques, such as choosing a binarisation threshold based on the global histogram of the image. More recently, binarisation has been done with selectional auto-encoders [6, 7], which learn an end-to-end transformation for the binarisation.
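
To make the classic global-histogram idea concrete, here is a minimal sketch in Python that picks a single threshold for the whole page using Otsu's method. It is a generic illustration, not the exact procedure used in the cited works.

```python
import numpy as np

def otsu_binarise(gray):
    """Binarise a greyscale page (uint8, 0-255) with a global threshold
    chosen from its histogram (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if between_var > best_var:
            best_var, best_t = between_var, t
    # Dark ink becomes foreground (True), light paper becomes background (False).
    return gray < best_t
```

A selectional auto-encoder replaces this hand-picked threshold with a learned pixel-wise decision, which is what makes it an end-to-end transformation.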

Moving on to musical symbol detection, this stage has three substages: staff processing, musical symbol processing and, finally, classification. In staff processing, staff lines are first detected and, depending on the study, removed. Recently, Pacha et al. showed, using object detection techniques, that removing staff lines does not guarantee better performance [8].
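
As a rough illustration of the conventional staff-processing step, the sketch below finds staff-line rows from the horizontal projection profile of a binarised page. The `min_coverage` parameter is an illustrative assumption; real systems need far more robust handling of skew, curvature and broken lines.

```python
import numpy as np

def detect_staff_lines(binary, min_coverage=0.5):
    """Return approximate y-coordinates of staff lines in a binarised page.

    binary: 2-D boolean array, True where a pixel is ink.
    Rows whose ink coverage exceeds `min_coverage` of the page width are
    treated as staff-line rows; consecutive rows are merged into one line.
    """
    coverage = binary.sum(axis=1) / binary.shape[1]   # horizontal projection
    candidate_rows = np.where(coverage > min_coverage)[0]

    lines, current = [], list(candidate_rows[:1])
    for r in candidate_rows[1:]:
        if r == current[-1] + 1:            # still the same (thick) line
            current.append(r)
        else:                               # gap: previous line finished
            lines.append(int(np.mean(current)))
            current = [r]
    if current:
        lines.append(int(np.mean(current)))
    return lines
```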

The musical object detection stage has largely benefited from the state of the art in computer vision, especially from general object detection. Models such as Fast R-CNN, Faster R-CNN and Single Shot Detectors (SSD) have been used to detect musical objects. These approaches take pre-trained models and fine-tune them on MUSCIMA++, a handwritten sheet music dataset [9]. This work establishes a baseline for using deep learning for object detection in sheet music.
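
The snippet below is a minimal sketch of that kind of fine-tuning with torchvision's Faster R-CNN; it is not the exact setup of the cited baseline [9]. `NUM_CLASSES` is a placeholder for however many symbol classes the dataset defines, and the dataset loading itself is omitted.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Placeholder: MUSCIMA++ defines its own symbol classes; add one for background.
NUM_CLASSES = 1 + 100

# Start from a detector pre-trained on COCO
# (older torchvision versions use pretrained=True instead of weights=).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-prediction head so it outputs music-symbol classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    """One fine-tuning step on a batch of (image, target) pairs,
    where each target is a dict with 'boxes' and 'labels'."""
    model.train()
    loss_dict = model(images, targets)   # classification + box-regression losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```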
