Calculating spectrogram of .wav files in python

Calculating spectrogram of .wav files in python

I am trying to calculate the spectrogram out of&nbsp;<code>.wav</code>&nbsp;files using Python. In an effort to do so, I am following the instructions that could be found&nbsp;<a href="" target="_blank">in here</a>. I am firstly read&nbsp;<code>.wav</code>&nbsp;files using librosa library. The code found in the link works properly. That code is:

I am trying to calculate the spectrogram out of .wav files using Python. In an effort to do so, I am following the instructions that could be found in here. I am firstly read .wav files using librosa library. The code found in the link works properly. That code is:

sig, rate = librosa.load(file, sr = None)
sig = buf_to_int(sig, n_bytes=2)
spectrogram = sig2spec(rate, sig)

And the function sig2spec:

def sig2spec(signal, sample_rate):

Read the file.

sample_rate, signal =

signal = signal[0:int(1.5 * sample_rate)] # Keep the first 3.5 seconds


Pre-emphasis step: Amplification of the high frequencies (HF)

(1) balance the frequency spectrum since HF usually have smaller magnitudes compared to LF

(2) avoid numerical problems during the Fourier transform operation and

(3) may also improve the Signal-to-Noise Ratio (SNR).

pre_emphasis = 0.97 emphasized_signal = numpy.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])


Consequently, we split the signal into short time windows. We can safely make the assumption that

an audio signal is stationary over a small short period of time. Those windows size are balanced from the

parameter called frame_size, while the overlap between consecutive windows is controlled from the

variable frame_stride.

frame_size = 0.025 frame_stride = 0.01 frame_length, frame_step = frame_size * sample_rate, frame_stride * sample_rate # Convert from seconds to samples signal_length = len(emphasized_signal) frame_length = int(round(frame_length)) frame_step = int(round(frame_step)) num_frames = int(numpy.ceil(float(numpy.abs(signal_length - frame_length)) / frame_step))

Make sure that we have at least 1 frame

pad_signal_length = num_frames * frame_step + frame_length z = numpy.zeros((pad_signal_length - signal_length)) pad_signal = numpy.append(emphasized_signal, z)

Pad Signal to make sure that all frames have equal

number of samples without truncating any samples from the original signal

indices = numpy.tile(numpy.arange(0, frame_length), (num_frames, 1))
+ numpy.tile(numpy.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T

frames = pad_signal[indices.astype(numpy.int32, copy=False)]

Apply hamming windows. The rationale behind that is the assumption made by the FFT that the data

is infinite and to reduce spectral leakage.

frames *= numpy.hamming(frame_length)

Fourier-Transform and Power Spectrum

nfft = 2048 mag_frames = numpy.absolute(numpy.fft.rfft(frames, nfft)) # Magnitude of the FFT pow_frames = ((1.0 / nfft) * (mag_frames ** 2)) # Power Spectrum

Transform the FFT to MEL scale

nfilt = 40 low_freq_mel = 0 high_freq_mel = (2595 * numpy.log10(1 + (sample_rate / 2) / 700)) # Convert Hz to Mel mel_points = numpy.linspace(low_freq_mel, high_freq_mel, nfilt + 2) # Equally spaced in Mel scale hz_points = (700 * (10 ** (mel_points / 2595) - 1)) # Convert Mel to Hz bin = numpy.floor((nfft + 1) * hz_points / sample_rate)

fbank = numpy.zeros((nfilt, int(numpy.floor(nfft / 2 + 1)))) for m in range(1, nfilt + 1): f_m_minus = int(bin[m - 1]) # left f_m = int(bin[m]) # center f_m_plus = int(bin[m + 1]) # right

for k in range(f_m_minus, f_m):
    fbank[m - 1, k] = (k - bin[m - 1]) / (bin[m] - bin[m - 1])
for k in range(f_m, f_m_plus):
    fbank[m - 1, k] = (bin[m + 1] - k) / (bin[m + 1] - bin[m])

filter_banks =, fbank.T) filter_banks = numpy.where(filter_banks == 0, numpy.finfo(float).eps, filter_banks) # Numerical Stability filter_banks = 20 * numpy.log10(filter_banks) # dB

return (filter_banks/ np.amax(filter_banks))*255

I can produce images that look like:

However, in some cases my spectrogram looks like:

Something really weird is happening since at the beginning of the signal there are some blue stripes in the images that I do not understand if they really mean something or there is an error when calculating the spectrogram. I guess the issue is related to normalization, but I am not sure what is exactly.

EDIT: I tried to use the recommended librosa from the library:

sig, rate = librosa.load("audio.wav", sr = None)
spectrogram = librosa.feature.melspectrogram(y=sig, sr=rate)
spec_shape = spectrogram.shape
fig = plt.figure(figsize=(spec_shape), dpi=5)
lidis.specshow(spectrogram.T, cmap=cm.jet)

The spec now is almost everywhere dark blue:

Angular 9 Tutorial: Learn to Build a CRUD Angular App Quickly

What's new in Bootstrap 5 and when Bootstrap 5 release date?

Brave, Chrome, Firefox, Opera or Edge: Which is Better and Faster?

How to Build Progressive Web Apps (PWA) using Angular 9

What is new features in Javascript ES2020 ECMAScript 2020

Top Python Development Companies | Hire Python Developers

After analyzing clients and market requirements, TopDevelopers has come up with the list of the best Python service providers. These top-rated Python developers are widely appreciated for their professionalism in handling diverse projects. When...

Python NumPy Tutorial for Beginners

Learn the basics of the NumPy library in this tutorial for beginners. It provides background information on how NumPy works and how it compares to Python's Built-in lists. This video goes through how to write code with NumPy. It starts with the basics of creating arrays and then gets into more advanced stuff. The video covers creating arrays, indexing, math, statistics, reshaping, and more.

Python NumPy Tutorial for Beginners

NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed.