Generating Piano Music with Dilated Convolutional Neural Networks

How to build fully convolutional neural networks that can model the complex structure of piano music with striking success.

Fully convolutional neural networks consisting of dilated 1D convolutions are straightforward to construct, easy to train, and can generate realistic piano music, such as the following:

Example performance generated by a fully convolutional network trained on 100 hours of classical music.

Motivation

A considerable amount of research has been devoted to training deep neural networks that can compose piano music. For example, MuseNet, developed by OpenAI, is a large-scale transformer model capable of composing realistic piano pieces that are many minutes in length. MuseNet adopts many of the technologies, such as attention layers, that were originally developed for NLP tasks. See this previous TDS post for more details on applying attention-based models to music generation.

Although NLP-based methods are a fantastic fit for machine-based music generation (after all, music is like a language), the transformer model architecture is somewhat involved, and proper data preparation and training can require great care and experience. This steep learning curve motivates my exploration of simpler approaches to training deep neural networks that can compose piano music. In particular, I’ll focus on fully convolutional neural networks based on dilated convolutions, which require only a handful of lines of code to define, take minimal data preparation, and are easy to train.

Historical Context

In 2016, DeepMind researchers introduced the WaveNet model architecture,¹ which yielded state-of-the-art performance in speech synthesis. Their research demonstrated that stacked 1D convolutional layers with exponentially growing dilation rates can process sequences of raw audio waveforms extremely efficiently, leading to generative models that can synthesize convincing audio from a variety of sources, including piano music.
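To make the idea of exponentially growing dilation rates concrete, here is a minimal pure-Python sketch. It is an illustration only, not the WaveNet implementation: the kernel size of 2, the dilation schedule, and the helper names `receptive_field` and `causal_dilated_conv1d` are assumptions chosen for this example. It shows how the number of past timesteps a stack can see roughly doubles with each added layer.

```python
def receptive_field(kernel_size, dilations):
    """Input timesteps that can influence one output of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(x, weights, dilation):
    """Causal 1D convolution over a list of floats, with the past zero-padded.

    weights[-1] multiplies the current sample, weights[-2] the sample
    `dilation` steps earlier, and so on.
    """
    k = len(weights)
    out = []
    for t in range(len(x)):
        total = 0.0
        for j in range(k):
            idx = t - (k - 1 - j) * dilation
            if idx >= 0:  # samples before the start count as zero
                total += weights[j] * x[idx]
        out.append(total)
    return out


# Assumed schedule for illustration: ten layers with dilations 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)]
print(receptive_field(2, dilations))  # 1024

# Push a single impulse through three layers (dilations 1, 2, 4) with all-ones
# kernels and count how many timesteps it reaches.
signal = [1.0] + [0.0] * 11
for d in [1, 2, 4]:
    signal = causal_dilated_conv1d(signal, [1.0, 1.0], d)
nonzero_steps = sum(1 for v in signal if v != 0.0)
print(nonzero_steps)  # 8
```

With only ten layers and two weights per filter, each output already depends on 1,024 input steps, which is why such stacks can cover long sequences cheaply.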

In this post, I build upon DeepMind’s research, with an explicit focus on generating piano music. Instead of feeding the model raw audio from recorded music, I feed it sequences of piano notes encoded in Musical Instrument Digital Interface (MIDI) files. This facilitates data collection, drastically reduces computational load, and allows the model to focus entirely on the musical aspects of the data. The efficient data encoding and ease of data collection enable rapid exploration of how well fully convolutional networks can understand piano music.
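As a rough illustration of what feeding a model note sequences instead of raw audio can look like, the sketch below turns a list of note events into a binary piano roll. This is one common representation and an assumption on my part, not necessarily the exact encoding used later in this post. The function name `notes_to_piano_roll`, the `(start, end, pitch)` tuple format, and the 10-steps-per-second resolution are all hypothetical. The only fixed fact used is that an 88-key piano spans MIDI pitches 21 (A0) to 108 (C8).

```python
LOWEST_PITCH = 21  # MIDI number of A0, the lowest key on an 88-key piano
NUM_KEYS = 88


def notes_to_piano_roll(notes, steps_per_second=10):
    """Convert (start_sec, end_sec, midi_pitch) tuples into a binary piano roll.

    Returns a list of timesteps; each timestep is a list of 88 zeros and ones,
    where 1 means the corresponding key is sounding during that step.
    """
    if not notes:
        return []
    last_end = max(end for _, end, _ in notes)
    total_steps = max(1, round(last_end * steps_per_second))
    roll = [[0] * NUM_KEYS for _ in range(total_steps)]
    for start, end, pitch in notes:
        key = pitch - LOWEST_PITCH
        if not 0 <= key < NUM_KEYS:
            continue  # skip pitches outside the piano's range
        first = round(start * steps_per_second)
        last = max(first + 1, round(end * steps_per_second))
        for step in range(first, min(last, total_steps)):
            roll[step][key] = 1
    return roll


# Hypothetical input: middle C (60) for half a second, then E (64) and G (67) together.
example_notes = [(0.0, 0.5, 60), (0.5, 1.0, 64), (0.5, 1.0, 67)]
roll = notes_to_piano_roll(example_notes)
print(len(roll), len(roll[0]))  # 10 88
```

One second of music becomes 10 rows of 88 values here, compared with tens of thousands of samples for the same second of raw audio, which is where the reduction in computational load comes from.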

How Well Can These Models ‘Play the Piano’?

To give a sense of how realistic these models can sound, let’s play an imitation game. Which excerpt below is composed by a human, and which is composed by a model?
