My work is an extension of Pankaj Kumar’s work that can be found here. Instead of a feed-forward Neural Net, I used a pre-trained ResNet model(Transfer Learning) to gain better accuracy. (Thanks a lot Pankaj Kumar).

You can download the dataset from here. The dataset consists of 1000 audio tracks each 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22050 Hz monophonic 16-bit audio files in .wav format.

The dataset consists of 10 genres i.e

  • Blues
  • Classical
  • Country
  • Disco
  • Hip-hop
  • Jazz
  • Metal
  • Pop
  • Reggae
  • Rock

Let’s start

!mkdir genres && wget http://opihi.cs.uvic.ca/sound/genres.tar.gz && tar -xf genres.tar.gz genres/

This helps to download the data and unpack it into a folder called ‘genres’.

Now, Let’s import the libraries we will be needing:

import torch
import torchvision
import torchaudio
import random
import numpy as np
import librosa
import librosa.display
import pandas as pd
import os
from PIL import Image
import pathlib
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import random_split
%matplotlib inline
from tqdm.autonotebook import tqdm
import IPython.display as ipd
import torchvision.transforms as T

Now, I will give the data directory:

data_path = '/content/genres/'

Now we will convert the music files i.e .wav files to image format using the Librosa library. A detailed explanation can be found here.

cmap = plt.get_cmap('inferno')

plt.figure(figsize=(8,8))
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
pathlib.Path(f'img_data/{g}').mkdir(parents=True, exist_ok=True)
for filename in os.listdir(f'{data_path}/{g}'):
songname = f'{data_path}/{g}/{filename}'
y, sr = librosa.load(songname, mono=True, duration=5)
plt.specgram(y, NFFT=2048, Fs=2, Fc=0, noverlap=128, cmap=cmap, sides='default', mode='default', scale='dB');
plt.axis('off');
plt.savefig(f'img_data/{g}/{filename[:-3].replace(".", "")}.png')
plt.clf()

Image for post

Code Screenshot

Let’s visualize an image:

import matplotlib.image as mpimg
img=mpimg.imread(img_path+'/blues/blues00093.png')
imgplot = plt.imshow(img)
plt.show()
print('shape of image is:',img.shape)

#zero-to-gan #music-genre #deep-learning #pytorch #deep learning

Music Genre Classification using Transfer Learning(Pytorch)
6.50 GEEK