Deep Speech is an open-source speech-to-text engine. The project uses TensorFlow to make the implementation easier.
Deep Speech is composed of two main subsystems:
Transfer learning is the reuse of a pre-trained model on a new problem. It is currently very popular in deep learning because it allows deep neural networks to be trained with comparatively little data. This is very useful in data science, since most real-world problems do not come with millions of labeled data points for training such complex models.
Most native languages lack the resources needed to train a neural network from scratch. This approach is useful for creating your own model from a small speech-to-text corpus.
English and Mandarin (along with some European languages) are the best-known examples of Deep Speech ASR models. This shows that the same network can learn completely different linguistic features, so it can be adapted easily to other languages. Models for several more languages are currently under development.
5. Train while freezing layers in pre-trained model
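To illustrate what freezing layers means in general, here is a minimal sketch in plain Python. It assumes a toy two-layer network and is not Deep Speech's actual TensorFlow training code; the names `frozen_w`, `head_w`, `train_head` and the tiny data set are hypothetical and chosen only for illustration. The "pre-trained" first layer is used as a fixed feature extractor, and gradient updates are applied to the output layer only.

```python
import copy

def forward(x, frozen_w, head_w):
    """Return (hidden activations, prediction) for one input vector."""
    # Frozen "pre-trained" layer: ReLU over a weighted sum of the inputs.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]
    # Trainable output layer: a single linear unit on top of the hidden features.
    return hidden, sum(w * h for w, h in zip(head_w, hidden))

def train_head(data, frozen_w, head_w, lr=0.05, epochs=500):
    """Fit only head_w by SGD on squared error; frozen_w is never modified."""
    head_w = list(head_w)
    for _ in range(epochs):
        for x, y in data:
            hidden, pred = forward(x, frozen_w, head_w)
            err = pred - y
            # Gradient of 0.5 * err**2 w.r.t. head weight j is err * hidden[j].
            head_w = [w - lr * err * h for w, h in zip(head_w, hidden)]
    return head_w

def mse(data, frozen_w, head_w):
    """Mean squared error of the network over the data set."""
    return sum((forward(x, frozen_w, head_w)[1] - y) ** 2 for x, y in data) / len(data)

# Hypothetical "pre-trained" weights (identity here, to keep the example simple).
frozen_w = [[1.0, 0.0], [0.0, 1.0]]
frozen_before = copy.deepcopy(frozen_w)

# Hypothetical small data set for the new task: y = 2 * x0 - x1.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

head_init = [0.0, 0.0]
head_trained = train_head(data, frozen_w, head_init)
loss_before = mse(data, frozen_w, head_init)
loss_after = mse(data, frozen_w, head_trained)
```

After training, `frozen_w` is identical to `frozen_before`, while `head_trained` has adapted to the new data. In a real framework the same effect is usually achieved by excluding the frozen layers' variables from the optimizer rather than by hand-written updates.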
*Note: Building the language model is the most time-consuming part of this approach. Depending on the response, I will write a separate article on building a language model to train Deep Speech in a custom way.
#language #deep-speech #data-science #nlp #machine-learning