Build a Deep Audio De-Noiser Using TensorFlow 2.0

Introduction

Speech denoising is a long-standing problem. Given a noisy input signal, the aim is to filter out such noise without degrading the signal of interest. You can imagine someone talking in a video conference while a piece of music is playing in the background. In this situation, a speech denoising system has the job of removing the background noise in order to improve the speech signal. Besides many other use cases, this application is especially important for video and audio conferences, where noise can significantly decrease speech intelligibility.

Classic solutions for speech denoising usually employ generative modeling. Here, statistical methods like Gaussian Mixture Models estimate the noise of interest and then recover the clean signal. However, recent developments have shown that, in situations where data is available, deep learning often outperforms these solutions.

In this article, we tackle the problem of speech denoising using Convolutional Neural Networks (CNNs). Given a noisy input signal, we aim to build a statistical model that can extract the clean signal (the source) and return it to the user. Here, we focus on source separation of regular speech signals from ten different types of noise often found in an urban street environment.

Datasets

For the problem of speech denoising, we used two popular publicly available audio datasets: Mozilla Common Voice (MCV) and UrbanSound8K.

As Mozilla puts it on the MCV website:

Common Voice is Mozilla’s initiative to help teach machines how real people speak.

The dataset contains as many as 2,454 recorded hours, spread across short MP3 files. The project is open source, and anyone can collaborate on it. Here, we used the English portion of the data, which amounts to 30 GB and 780 validated hours of speech. One very good characteristic of this dataset is the vast variability of speakers. It contains recordings of men and women across a large range of ages and accents.

The UrbanSound8K dataset also contains small snippets (<= 4 s) of sound: 8,732 labeled examples of ten different commonly found urban sounds. The complete list includes:

  • 0 = air_conditioner
  • 1 = car_horn
  • 2 = children_playing
  • 3 = dog_bark
  • 4 = drilling
  • 5 = engine_idling
  • 6 = gun_shot
  • 7 = jackhammer
  • 8 = siren
  • 9 = street_music

As you might be imagining at this point, we’re going to use the urban sounds as noise signals for the speech examples. In other words, we first take a small speech signal — this can be someone speaking a random sentence from the MCV dataset.

Then, we add noise to it — such as a dog barking in the background. Finally, we use this artificially noisy signal as the input to our deep learning model. The neural net, in turn, receives this noisy signal and tries to output a clean representation of it.

The image below displays a visual representation of a clean input signal from the MCV (top), a noise signal from the UrbanSound dataset (middle), and the resulting noisy input (bottom) — the input speech after adding the noise signal. Also, note that the noise power is set so that the signal-to-noise ratio (SNR) is zero dB (decibel). A ratio higher than 1:1 (greater than 0 dB) indicates more signal than noise.
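To make the mixing step concrete, here is a minimal sketch of how a noise clip can be scaled so that the resulting mix sits at a chosen SNR. The array names speech and noise are hypothetical stand-ins for loaded MCV and UrbanSound clips:

import numpy as np

def mix_at_snr(speech, noise, snr_db=0.0):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add."""
    noise = noise[: len(speech)]                 # trim the noise clip to the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(speech_power / (scale**2 * noise_power)); solve for scale
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

At snr_db=0.0, the scaled noise carries the same power as the speech, which is exactly the zero-dB condition shown in the figure.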

Data Preprocessing

Much of the benefit of current deep learning technology rests in the fact that hand-crafted features ceased to be an essential step in building a state-of-the-art model. Take feature extractors like SIFT and SURF as an example, which are often used in Computer Vision problems like panorama stitching. These methods extract features from local parts of an image to construct an internal representation of the image itself. However, to achieve the necessary level of generalization, a vast amount of work was needed to create features robust enough to apply to real-world scenarios. Put differently, these features needed to be invariant to common transformations that we often see day-to-day, such as variations in rotation, translation, and scaling. One of the cool things about current deep learning is that most of these properties are learned either from the data or from special operations, like the convolution.

For audio processing, we also hope that the Neural Network will extract relevant features from the data. However, before feeding the raw signal to the network, we need to get it into the right format.

First, we downsampled the audio signals (from both datasets) to 8 kHz and removed the silent frames from them. The goal is to reduce both the amount of computation and the dataset size.
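A minimal sketch of this step, assuming the Librosa library that we use later for the STFT as well (the file path and the top_db silence threshold are illustrative choices):

import numpy as np
import librosa

def load_and_trim(path, sr=8000, top_db=30):
    y, _ = librosa.load(path, sr=sr)                     # resample to 8 kHz on load
    intervals = librosa.effects.split(y, top_db=top_db)  # locate the non-silent intervals
    return np.concatenate([y[start:end] for start, end in intervals])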

It is important to note that audio data differs from images. Since one of our assumptions is to use CNNs (originally designed for Computer Vision) for audio denoising, it is important to be aware of such subtle differences. Audio data, in its raw form, is one-dimensional time-series data. Images, on the other hand, are two-dimensional representations of an instant moment in time. For these reasons, audio signals are often transformed into (time/frequency) 2D representations.

The Mel-frequency cepstral coefficients (MFCCs) and the constant-Q spectrum are two popular representations often used in audio applications. For deep learning, classic MFCCs may be avoided because they remove a lot of information and do not preserve spatial relations. For source separation tasks, however, computation is usually done in the time-frequency domain. Audio signals are, for the most part, non-stationary. In other words, the signal’s mean and variance are not constant over time. Thus, there is not much sense in computing a Fourier transform over the entire audio signal. For this reason, we feed the DL system with spectral magnitude vectors computed using a 256-point Short-Time Fourier Transform (STFT). You can see common representations of audio signals below.

To calculate the STFT of a signal, we need to define a window of length M and a hop size R. The latter defines how the window moves over the signal. We then slide the window over the signal and calculate the discrete Fourier transform (DFT) of the data within the window. Thus, the STFT is simply the application of the Fourier transform over different portions of the data. Lastly, we extract the magnitude vectors from the 256-point STFT vectors and keep the first 129 points, discarding the symmetric half. All of this was done using the Python Librosa library. The image below, from MATLAB, illustrates the process.


Credits: MATLAB STFT docs
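In code, the magnitude extraction might look like the sketch below, using Librosa with the window length (256), hop size (64), and Hamming window that the next paragraph specifies. For real-valued inputs, Librosa already returns only the 129 non-redundant frequency bins, so no manual trimming is needed:

import numpy as np
import librosa

def stft_magnitude(y):
    # shape: (129, num_frames) -- only the non-symmetric half of the spectrum
    spectrogram = librosa.stft(y, n_fft=256, hop_length=64, win_length=256, window="hamming")
    return np.abs(spectrogram)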

Here, we defined the STFT window as a periodic Hamming window with length 256 and a hop size of 64. This ensures a 75% overlap between consecutive STFT vectors. In the end, we concatenate eight consecutive noisy STFT vectors and use them as inputs. Thus, an input vector has a shape of (129, 8) and is composed of the current noisy STFT vector plus the seven previous ones. In other words, the model is an autoregressive system that predicts the current signal based on past observations. The target, in turn, consists of a single STFT frequency representation of shape (129, 1) taken from the clean audio. The image below depicts the feature-vector creation.
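A sketch of this feature-vector creation, pairing each (129, 8) noisy input with its (129, 1) clean target (the array names are hypothetical, standing for aligned noisy and clean magnitude spectrograms):

import numpy as np

def make_pairs(noisy_mag, clean_mag, context=8):
    inputs, targets = [], []
    for t in range(context - 1, noisy_mag.shape[1]):
        inputs.append(noisy_mag[:, t - context + 1 : t + 1])  # current frame + 7 previous
        targets.append(clean_mag[:, t : t + 1])               # the matching clean frame
    return np.stack(inputs), np.stack(targets)                # (N, 129, 8), (N, 129, 1)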

Deep Learning Architecture

Our Deep Convolutional Neural Network (DCNN) is largely based on the work presented in “A Fully Convolutional Neural Network for Speech Enhancement”, where the authors propose the Cascaded Redundant Convolutional Encoder-Decoder network (CR-CED).

The model is based on symmetric encoder-decoder architectures. Both components contain repeated blocks of Convolution, ReLU, and Batch Normalization. In total, the network contains 16 such blocks, which add up to 33K parameters.

Also, there are skip connections between some of the encoder and decoder blocks, where the feature vectors from both components are combined through addition. Much like in ResNets, these skip connections speed up convergence and mitigate the vanishing of gradients.

Another important characteristic of the CR-CED network is that convolution is only done in one dimension. More specifically, given an input spectrum of shape (129 x 8), convolution is only performed along the frequency axis (i.e., the first one). This ensures that the frequency axis remains constant during forward propagation.

The combination of a small number of training parameters and the model architecture makes this model super lightweight and fast to execute, especially on mobile or edge devices.
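To make the shape bookkeeping concrete, here is an abridged CR-CED-style sketch in tf.keras. The real network has 16 blocks; the filter counts and kernel sizes below are illustrative placeholders, not the paper's exact configuration. With inputs of shape (129, 8), Conv1D slides along the frequency axis only, and padding="same" keeps that axis at 129 throughout:

import tensorflow as tf

def conv_block(x, filters, kernel_size=9):
    x = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = tf.keras.layers.ReLU()(x)
    return tf.keras.layers.BatchNormalization()(x)

inputs = tf.keras.Input(shape=(129, 8))      # 8 consecutive noisy STFT magnitude frames
x = conv_block(inputs, 18)
skip = x
x = conv_block(x, 30)
x = conv_block(x, 18)
x = tf.keras.layers.Add()([x, skip])         # skip connection: features combined by addition
outputs = tf.keras.layers.Conv1D(1, 9, padding="same")(x)  # (129, 1) clean-frame estimate
model = tf.keras.Model(inputs, outputs)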

Once the network produces an output estimate, we optimize (minimize) the mean squared error (MSE) between the output and the target (clean audio) signals.
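Continuing the sketch above, training then comes down to two lines. The Adam optimizer, batch size, and epoch count are assumptions of this sketch, and noisy_inputs/clean_targets stand for the arrays produced by the feature-creation snippet earlier:

model.compile(optimizer="adam", loss="mse")   # minimize MSE against the clean frames
model.fit(noisy_inputs, clean_targets, batch_size=64, epochs=20, validation_split=0.1)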

Results and Discussion

Let’s check some of the results achieved by the CNN denoiser.

To begin, listen to test examples from the MCV and UrbanSound datasets. They are the clean speech and noise signal, respectively. To recap, the clean signal is used as the target, while the noise audio is used as the source of the noise.

If you are having trouble listening to the samples, you can access the raw files here.

Now, take a look at the noisy signal passed as input to the model and the respective denoised result.

Below, you can compare the denoised CNN estimation (bottom) with the target (clean signal on the top) and noisy signal (used as input in the middle).

As you can see, given the difficulty of the task, the results are somewhat acceptable, but not perfect. Indeed, in most of the examples, the model manages to smooth the noise but it doesn’t get rid of it completely. Take a look at a different example, this time with a dog barking in the background.

One of the things that prevents better estimates is the loss function. The mean squared error (MSE) cost optimizes the average over the training examples. We can think of it as finding a mean model that smooths the noisy input audio to provide an estimate of the clean signal. Therefore, one solution is to devise loss functions that are more specific to the task of source separation.

A particularly interesting possibility is to learn the loss function itself using GANs (Generative Adversarial Networks). Indeed, the problem of audio denoising can be framed as a signal-to-signal translation problem. Much like in image-to-image translation, a generator network first receives a noisy signal and outputs an estimate of the clean signal. Then, the discriminator net receives the noisy input along with either the generator’s prediction or the real target signal. This way, the GAN can learn an appropriate loss function for mapping noisy input signals to their respective clean counterparts. That is an interesting possibility that we look forward to implementing.
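A rough sketch of that framing, where the denoiser from earlier would play the generator and a small discriminator is conditioned on the noisy input by concatenating it with either the real clean frame or the generator's estimate. All layer sizes here are illustrative, since this is future work rather than something we implemented:

import tensorflow as tf

noisy = tf.keras.Input(shape=(129, 8))
candidate = tf.keras.Input(shape=(129, 1))    # real clean frame, or the generator's output
x = tf.keras.layers.Concatenate(axis=-1)([noisy, candidate])
x = tf.keras.layers.Conv1D(16, 9, padding="same", activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
real_or_fake = tf.keras.layers.Dense(1, activation="sigmoid")(x)
discriminator = tf.keras.Model([noisy, candidate], real_or_fake)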

Conclusion

Audio denoising is a long-standing problem. By following the approach described in this article, we reached acceptable results with relatively small effort. The benefit of a lightweight model makes it interesting for edge applications. As a next step, we hope to explore new loss functions and model training procedures.

You can get the full code here.

Thanks for reading!

What is TensorFlow | TensorFlow for Beginners (2020)

This TensorFlow tutorial will help beginners understand what TensorFlow is in a very simplified manner. TensorFlow is one of the best libraries for implementing deep learning. Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised, or unsupervised.

TensorFlow is dead, long live TensorFlow!

TensorFlow is an open source machine learning library for research and production.

If you’re an AI enthusiast and you didn’t see the big news this month, you might have just snoozed through an off-the-charts earthquake. Everything is about to change!

Last year I wrote 9 Things You Need To Know About TensorFlow… but there’s one thing you need to know above all others: TensorFlow 2.0 is here!

The revolution is here! Welcome to TensorFlow 2.0.
It’s a radical makeover. The consequences of what just happened are going to have major ripple effects on every industry, just you wait. If you’re a TF beginner in mid-2019, you’re extra lucky because you picked the best possible time to enter AI (though you might want to start from scratch if your old tutorials have the word “session” in them).

In a nutshell: TensorFlow has just gone full Keras. Those of you who know those words just fell out of your chairs. Boom!

A prickly experience

I doubt that many people have accused TensorFlow 1.x of being easy to love. It’s the industrial lathe of AI… and about as user-friendly. At best, you might feel grateful for being able to accomplish your AI mission at mind-boggling scale.

You’d also attract some raised eyebrows if you claimed that TensorFlow 1.x was easy to get the hang of. Its steep learning curve made it mostly inaccessible to the casual user, but mastering it meant you could talk about it the way you’d brag about that toe you lost while climbing Everest. Was it fun? No, c’mon, really: was it fun?


TensorFlow’s core strength is performance. It was built for taking models from research to production at massive scale and it delivers, but TF 1.x made you sweat for it. Persevere and you’d be able to join the ranks of ML practitioners who use it for incredible things, like finding new planets and pioneering medicine.

What a pity that such a powerful tool was in the hands of so few… until now.

Don’t worry about what tensors are. We just called them (generalized) matrices where I grew up. The name TensorFlow is a nod to the fact that TF’s very good at performing distributed computations involving multidimensional arrays (er, matrices), which you’ll find handy for AI at scale (http://bit.ly/quaesita_emperor).

Image source: http://karlstratos.com/drawings/drawings.html

Cute and cuddly Keras

Now that we’ve covered cactuses, let’s talk about something you’d actually want to hug. Overheard at my place of work: “I think I have an actual crush on Keras.”

Keras is a specification for building models layer-by-layer that works with multiple machine learning frameworks (so it’s not a TF thing), but you might know it as a high level API accessed from within TensorFlow as tf.keras.

Incidentally, I’m writing this section on Keras’ 4th birthday (Mar 27, 2019) for an extra dose of warm fuzzies.

Keras was built from the ground up to be Pythonic and always put people first — it was designed to be inviting, flexible, and simple to learn.

Why don’t we have both?

Why must we choose between Keras’s cuddliness and traditional TensorFlow’s mighty performance? Why don’t we have both?

Great idea! Let’s have both! That’s TensorFlow 2.0 in a nutshell.

This is TensorFlow 2.0. You can mash those orange buttons yourself at http://bit.ly/tfoview.

The usability revolution

Going forward, Keras will be the high-level API for TensorFlow, and it has been extended so that you can use all the advanced features of TensorFlow directly from tf.keras.


In the new version, everything you’ve hated most about TensorFlow 1.x gets the guillotine. Having to perform a dark ritual just to add two numbers together? Dead. TensorFlow Sessions? Dead. A million ways to do the exact same thing? Dead. Rewriting code if you switch hardware or scale? Dead. Reams of boilerplate to write? Dead. Horrible unactionable error messages? Dead. Steep learning curve? Dead.
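For the unconvinced, here is the classic before-and-after. The 1.x half is left as comments, since sessions no longer exist in 2.0:

# TensorFlow 1.x: build a graph, open a session, run the graph.
#   a = tf.constant(2)
#   b = tf.constant(3)
#   with tf.Session() as sess:
#       print(sess.run(a + b))             # finally: 5

# TensorFlow 2.0: just add them.
import tensorflow as tf
print(tf.constant(2) + tf.constant(3))     # tf.Tensor(5, shape=(), dtype=int32)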

You’re expecting the obvious catch, aren’t you? Worse performance? Guess again! We’re not giving up performance.

TensorFlow is now cuddly and this is a game-changer, because it means that one of the most potent tools of our time just dropped the bulk of its barriers to entry. Tech enthusiasts from all walks of life are finally empowered to join in because the new version opens access beyond researchers and other highly-motivated folks with an impressive pain threshold.

Everyone is welcome. Want to play? Then come play!

Eager to please

In TensorFlow 2.0, eager execution is now the default. You can take advantage of graphs even in eager context, which makes your debugging and prototyping easy, while the TensorFlow runtime takes care of performance and scaling under the hood.

Wrangling graphs in TensorFlow 1.x (declarative programming) was disorienting for many, but it’s all just a bad dream now with eager execution (imperative programming). If you skipped learning it before, so much the better. TF 2.0 is a fresh start for everyone.
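A small taste of what that means in practice. The tf.function decorator is how you opt back into graphs when you want the runtime to optimize a hot path:

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
print(x * 2)                   # eager by default: the values print immediately

@tf.function                   # traced into a graph under the hood for performance
def scaled_sum(v):
    return tf.reduce_sum(v * 2)

print(scaled_sum(x))           # tf.Tensor(12.0, shape=(), dtype=float32)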

As easy as one… one… one…

Many APIs got consolidated across TensorFlow under Keras, so now it’s easier to know what you should use when. For example, now you only need to work with one set of optimizers and one set of metrics. How many sets of layers? You guessed it! One! Keras-style, naturally.
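For instance, a minimal (and entirely arbitrary) model now pulls its layers, optimizer, and metrics from the same tf.keras namespaces:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),                     # the one optimizer home
    loss="sparse_categorical_crossentropy",
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],   # the one metrics home
)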

In fact, the whole ecosystem of tools got a spring cleaning, from data processing pipelines to easy model exporting to TensorBoard integration with Keras, which is now a… one-liner!
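The TensorBoard one-liner in question, continuing the toy model above (train_x and train_y are hypothetical training arrays):

tensorboard = tf.keras.callbacks.TensorBoard(log_dir="./logs")     # that's the one line
model.fit(train_x, train_y, epochs=5, callbacks=[tensorboard])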

There are also great tools that let you switch and optimize distribution strategies for amazing scaling efficiency without losing any of the convenience of Keras.
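A sketch of how little changes when you scale out: build and compile the model inside a strategy scope, and Keras training works as before. MirroredStrategy here means synchronous training across the GPUs on one machine; the model itself is a throwaway example:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():                        # model variables are mirrored per device
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")
# model.fit(...) now trains across all available GPUs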

Those distribution strategies are pretty, aren’t they?

The catch!

If the catch isn’t performance, what is it? There has to be a catch, right?

Actually, the catch was your suffering up to now. TensorFlow demanded quite a lot of patience from its users while a friendly version was brewing. This wasn’t a matter of sadism. Making tools for deep learning is new territory, and we’re all charting it as we go along. Wrong turns were inevitable, but we learned a lot along the way.

The TensorFlow community put in a lot of elbow grease to make the initial magic happen, and then more effort again to polish the best gems while scraping out less fortunate designs. The plan was never to force you to use a rough draft forever, but perhaps you habituated so well to the discomfort that you didn’t realize it was temporary. Thank you for your patience!
The reward is everything you appreciate about TensorFlow 1.x made friendly under a consistent API with tons of duplicate functionality removed so it’s cleaner to use. Even the errors are cleaned up to be concise, simple to understand, and actionable. Mighty performance stays!

What’s the big deal?

Haters (who’re gonna hate) might say that much of v2.0 could be cobbled together in v1.x if you searched hard enough, so what’s all the fuss about? Well, not everyone wants to spend their days digging around in clutter for buried treasure. The makeover and clean-up are worth a standing ovation. But that’s not the biggest big deal.

The point not to miss is this: TensorFlow just announced an uncompromising focus on usability.

AI lets you automate tasks you can’t come up with instructions for. It lets you automate the ineffable. Democratization means that AI at scale will no longer be the province of a tiny tech elite.
Imagine a future where “I know how to make things with Python” and “I know how to make things with AI” are equally commonplace statements… Exactly! I’m almost tempted to use that buzzword “disruptive” here.

The great migration

We know it’s hard work to upgrade to a new version, especially when the changes are so dramatic. If you’re about to embark on migrating your codebase to 2.0, you’re not alone — we’ll be doing the same here at Google with one of the largest codebases in the world. As we go along, we’ll be sharing migration guides to help you out.

If you rely on specific functionality, you won’t be left in the lurch — except for contrib, all TF 1.x functions will live on in the compat.v1 compatibility module. We’re also giving you a script which automatically updates your code so it runs on TensorFlow 2.0. Learn more in the video below.

This video is a great resource if you’re eager to dig deeper into TF 2.0 and geek out on code snippets.
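And if you’d rather see both escape hatches in miniature right now, a sketch (file names hypothetical):

# 1.x behavior via the compatibility module, from inside Python:
#   import tensorflow.compat.v1 as tf
#   tf.disable_v2_behavior()

# The automatic conversion script, from the shell:
#   tf_upgrade_v2 --infile old_model.py --outfile upgraded_model.py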

Your clean slate

TF 2.0 is a beginner’s paradise, so it will be a downer for those who’ve been looking forward to watching newbies suffer the way you once suffered. If you were hoping to use TensorFlow for hazing new recruits, you might need to search for some other way to inflict existential horror.

Sitting out might have been the smartest move, because now’s the best time to arrive on the scene. As of March 2019, TensorFlow 2.0 is available in alpha (that’s a preview, you hipster you), so learning it now gets you ready in time for the full release that the community is gearing up for over the next quarter.
Following the dramatic changes, you won’t be as much of a beginner as you imagined. The playing field got leveled, the game got easier, and there’s a seat saved just for you. Welcome! I’m glad you’re finally here and I hope you’re as excited about this new world of possibilities as I am.

Dive in!

Check out the shiny redesigned tensorflow.org for tutorials, examples, documentation, and tools to get you started… or dive straight in with:

pip install tensorflow==2.0.0-alpha0

You’ll find detailed instructions here.

TensorFlow Tutorial for Beginners - TensorFlow on Neural Networks

In this TensorFlow tutorial for beginners, you will learn TensorFlow concepts such as what tensors are, the program elements in TensorFlow, constants and placeholders in TensorFlow Python, how variables work with placeholders, and a demo on MNIST.