How to build Python transcriber using Mozilla DeepSpeech

Voice Assistants are one of the hottest techs right now. Siri, Alexa, Google Assistant, all aim to help you talk to computers and not just touch and type. Automated Speech Recognition (ASR) and Natural Language Understanding (NLU/NLP) are the key technologies enabling it. If you are just-a-programmer like me, you might be itching to get a piece of the action and hack something. You are at the right place; read on.

Though these technologies are hard and the learning curve is steep, but are becoming increasingly accessible. Last month, Mozilla released DeepSpeech along with models for US English. It has smaller and faster models than ever before, and even has a TensorFlow Lite model that runs faster than real time on a single core of a Raspberry Pi 4. There are several interesting aspects, but right now I am going to focus on its refreshingly simple batch and stream APIs in C, .NET, Java, JavaScript, and Python for converting speech to text. By the end of this blog post, you will build a voice transcriber. No kidding :-)

#python #tensorflow #artificial-intelligence #voice-assistant #machine-learning

towardsdatascience.com

How to build Python transcriber using Mozilla DeepSpeech