Deep Speech is an open-source Speech-To-Text engine. Project Deep Speech uses TensorFlow to make the implementation easier.

Deep Speech is composed of two main subsystems:

  1. Acoustic model: a deep neural network that receives audio features as input and outputs character probabilities.
  2. Decoder: uses a beam search algorithm to transform the character probabilities into textual transcripts that are then returned by the system.
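To make the two subsystems concrete, here is a toy decoder. The real Deep Speech decoder uses beam search with a language model, but the post-processing idea (collapse repeated characters, drop blanks) is the same; the alphabet and probabilities below are invented for illustration.

```python
# Simplified greedy decoder for character probabilities.
# Real DeepSpeech uses beam search plus a language model;
# this sketch only shows the basic idea.

ALPHABET = ["_", "c", "a", "t"]  # "_" is the CTC blank symbol (toy alphabet)

def greedy_decode(prob_matrix):
    """Pick the most likely character per timestep, then
    collapse repeats and drop blanks (CTC post-processing)."""
    best = [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]
    chars = []
    prev = None
    for idx in best:
        if idx != prev and ALPHABET[idx] != "_":
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)

# One row per audio timestep; columns follow ALPHABET order.
probs = [
    [0.1, 0.7, 0.1, 0.1],    # "c"
    [0.1, 0.6, 0.2, 0.1],    # "c" again (collapsed as a repeat)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # "a"
    [0.1, 0.1, 0.1, 0.7],    # "t"
]
print(greedy_decode(probs))  # → "cat"
```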

Transfer learning is the reuse of a pre-trained model on a new problem. It is currently very popular in deep learning because it can train a deep neural network with comparatively little data. This is very useful in the data science field, since most real-world problems typically do not have millions of labeled data points to train such complex models.

By comparison, most native languages lack the resources to train a neural network from scratch. Transfer learning makes it possible to create your own model using a small speech-to-text corpus.
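Even a small corpus needs to be described in the format Deep Speech's training scripts expect: CSV manifests with `wav_filename`, `wav_filesize`, and `transcript` columns. A minimal sketch (the file names, sizes, and transcripts below are placeholders):

```python
import csv
import io

# DeepSpeech-style training manifest. Paths, sizes, and
# transcripts here are made-up placeholders.
rows = [
    ("clips/utt_0001.wav", 163244, "hello world"),
    ("clips/utt_0002.wav", 98122, "good morning"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["wav_filename", "wav_filesize", "transcript"])
writer.writerows(rows)
print(buf.getvalue())
```

In practice you would write this to `train.csv`, `dev.csv`, and `test.csv` files on disk rather than an in-memory buffer.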


English and Mandarin (along with some European languages) are the flagship examples for Deep Speech ASR models. This shows that completely different linguistic features can be learned by the same network, so it can be adapted to other languages with relative ease. Several more languages are currently in development.


  1. Clone the Deep Speech repository
  2. Prepare a speech and transcript corpus
  3. Build a language model using KenLM
  4. Get the relevant pre-trained English model
  5. Train while freezing layers of the pre-trained model
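Step 5 can be sketched in miniature with NumPy. In real Deep Speech training this is handled by checkpoint-loading options rather than hand-written code; the tiny network below is a stand-in that shows the principle: the pre-trained lower layer is frozen, and only the top layer (replaced to match the new language's alphabet) receives gradient updates. All weights and data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came from a pre-trained checkpoint.
W1 = rng.normal(size=(4, 8)) * 0.5   # lower layer: FROZEN
W2 = rng.normal(size=(8, 3)) * 0.5   # top layer: re-trained for a new alphabet

x = rng.normal(size=(5, 4))                 # fake audio feature frames
y = np.eye(3)[rng.integers(0, 3, size=5)]   # fake one-hot character targets

W1_before = W1.copy()
losses = []
for _ in range(200):
    h = np.tanh(x @ W1)                     # frozen layer output
    logits = h @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)       # softmax
    losses.append(-np.mean(np.sum(y * np.log(p), axis=1)))
    W2 -= 0.1 * (h.T @ (p - y)) / len(x)    # update ONLY the top layer

assert np.array_equal(W1, W1_before)  # frozen weights never changed
assert losses[-1] < losses[0]         # the new top layer did learn
```

The payoff is that the frozen layers keep the acoustic features learned from thousands of hours of English audio, while only the small top layer has to be learned from your limited corpus.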

*Note: building the language model is the time-consuming part of this approach. Depending on the response, I will write a follow-up article on building a language model to train Deep Speech in a custom way.
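To see why the language model matters, here is a toy bigram model that does, in miniature, what KenLM provides: scoring candidate transcripts so the decoder can prefer likely word sequences over acoustically similar but unlikely ones. The counts below are invented; KenLM builds real n-gram models from large text corpora.

```python
import math

# Invented toy counts standing in for a corpus-derived n-gram model.
bigram_counts = {
    ("i", "am"): 50, ("am", "here"): 20,
    ("i", "an"): 1,  ("an", "here"): 1,
}
unigram_counts = {"i": 60, "am": 55, "an": 5, "here": 25}

def log_prob(sentence):
    """Log-probability of a word sequence under the toy bigram model,
    with add-one smoothing so unseen bigrams are not -infinity."""
    words = sentence.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        num = bigram_counts.get((prev, cur), 0) + 1
        den = unigram_counts.get(prev, 0) + len(unigram_counts)
        total += math.log(num / den)
    return total

# The decoder would prefer the transcript the language model scores higher.
assert log_prob("i am here") > log_prob("i an here")
```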

#language #deep-speech #data-science #nlp #machine-learning

Deep Speech : Train Native Languages with Transfer Learning