How to Convert Speech to Text in Python

I am really interested in learning alien languages so that I can communicate with aliens when they visit our planet, so I need a speech-to-text program to decipher what they say.
In this tutorial, we are going to learn how to convert speech to text in Python the easy way.

Contents
1- What is speech recognition
2- How to install the speech recognition package using Anaconda Prompt
3- Using the Recognizer class
4- Taking input from microphones
5- Final code snippet

1- What is speech recognition

Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text.

2- How to install the speech recognition package using Anaconda Prompt

The SpeechRecognition package is compatible with Python 2.6, 2.7, and 3.3+, although Python 2 requires some additional installation steps.

In this tutorial, we use Python 3.7.3. If you don’t have Anaconda installed on your machine, you can download it from https://www.anaconda.com/distribution/.

If you are using Windows, go to All apps, open the Anaconda3 (64-bit) folder, and click Anaconda Prompt.
Install the Python speech recognition package using this command:

conda install -c conda-forge speechrecognition

The download should take about 30 seconds on a fast internet connection.

You can check the version of the installed package by starting Python from the Anaconda prompt and typing:

```
>>> import speech_recognition as speech_recog
>>> speech_recog.__version__
'3.6.3'
```

3- Using the Recognizer class

The SpeechRecognition package has a Recognizer class, which recognizes speech and converts it to text. It provides the following seven methods, each of which transcribes audio data using a different speech recognition engine or API.

recognize_bing()
recognize_google()
recognize_google_cloud()
recognize_houndify()
recognize_ibm()
recognize_wit()
recognize_sphinx()

Of these, only recognize_sphinx() works offline. To use it, you need to install the PocketSphinx library.
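The exact set of methods depends on the installed version, so it can help to ask the package itself. Here is a minimal sketch that assumes only that the methods of interest share the `recognize_` prefix. The helper name `list_recognize_methods` is made up for this example.

```python
def list_recognize_methods(recognizer_cls):
    """Return the sorted names of callable attributes starting with 'recognize_'."""
    return sorted(
        name
        for name in dir(recognizer_cls)
        if name.startswith("recognize_") and callable(getattr(recognizer_cls, name))
    )


if __name__ == "__main__":
    try:
        import speech_recognition as speech_recog
    except ImportError:
        print("speech_recognition is not installed")
    else:
        # Print whatever recognize_* methods this installation provides.
        for name in list_recognize_methods(speech_recog.Recognizer):
            print(name)
```

The output will vary between versions of the package, so treat the list above as a snapshot rather than a guarantee.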

We need to create an instance of the Recognizer class. We’ll use the recognize_google() method to access the Google Web Speech API and convert spoken language into text. Note that recognize_google() requires an audio_data argument; calling it without one raises an error.

import speech_recognition as speech_recog
rec = speech_recog.Recognizer()

4- Taking input from microphones

To use a microphone, we have to install the PyAudio module.
From the Anaconda prompt, type the command below:

conda install -c conda-forge PyAudio
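Before moving on, it can be worth confirming that both packages are importable. This is a small sketch using only the standard library. The helper name `missing_modules` is made up for this example, while `speech_recognition` and `pyaudio` are the import names of the two packages.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # SpeechRecognition is imported as "speech_recognition", PyAudio as "pyaudio".
    missing = missing_modules(["speech_recognition", "pyaudio"])
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

If either name shows up as missing, rerun the matching conda install command from the Anaconda prompt.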

We use the Microphone class to get the input speech from the microphone. For most projects, the default microphone is fine. However, if you do not wish to use the default microphone, you can get the list of microphone names using the list_microphone_names() method and pass the index of the one you want to Microphone as the device_index argument.
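Picking a non-default microphone usually means searching that list of names. The helper below is a sketch of one way to do it. `find_microphone_index` and the keyword "usb" are hypothetical names chosen for illustration, and the commented usage assumes PyAudio and a working microphone.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first name containing keyword (case-insensitive), or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Skip entries that are None (a device whose name is unavailable).
        if name and keyword in name.lower():
            return index
    return None


# Usage with the real package (needs PyAudio and an attached microphone):
#   names = speech_recog.Microphone.list_microphone_names()
#   index = find_microphone_index(names, "usb")  # "usb" is only an example keyword
#   mic = speech_recog.Microphone(device_index=index)  # None selects the default
```

Because a result of None is passed straight through as device_index, the code falls back to the default microphone when nothing matches.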

To store the input from the microphone, we use the record() method. We define a variable, duration, that sets how many seconds of the user’s speech to record, and assign it a default value. Please don’t forget to indent the line after the colon. Below is the code snippet:

with speech_recog.Microphone() as source:
    audio_data = rec.record(source, duration=duration)
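record() always captures a fixed number of seconds. An alternative that this tutorial's code does not use is the Recognizer's listen() method, which stops once the speaker pauses, and works best after a short call to adjust_for_ambient_noise(). The sketch below shows the idea. The helper name `capture_phrase` and its default values are assumptions made for illustration.

```python
def capture_phrase(rec, source, max_seconds=5, calibration_seconds=1):
    """Calibrate for background noise, then listen until the speaker pauses."""
    # Sample the source briefly to set the energy threshold that listen() relies on.
    rec.adjust_for_ambient_noise(source, duration=calibration_seconds)
    # Return once a pause is detected, or after max_seconds of speech.
    return rec.listen(source, phrase_time_limit=max_seconds)


# Usage with the real package (needs PyAudio and a microphone):
#   with speech_recog.Microphone() as source:
#       audio_data = capture_phrase(rec, source, max_seconds=5)
```

Whether this is preferable to a fixed duration depends on the project: record() is more predictable, while listen() avoids cutting a speaker off or waiting in silence.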

To convert the speech stored in the audio_data variable to text, we call the recognize_google() method, which uses the Google Web Speech API, and pass audio_data to it. Below is the code snippet:

text = rec.recognize_google(audio_data)
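This call can fail in two common ways: the speech may be unintelligible, or the web API may be unreachable. The package signals these with its UnknownValueError and RequestError exceptions. The sketch below is one possible way to handle them. The helper name `transcribe_or_explain` and the messages are made up, and the module is passed in as a parameter only so the helper can be tried with stand-in objects.

```python
def transcribe_or_explain(speech_recog, rec, audio_data):
    """Return the recognized text, or a short message if recognition fails.

    speech_recog is the imported speech_recognition module and rec is a
    Recognizer instance.
    """
    try:
        return rec.recognize_google(audio_data)
    except speech_recog.UnknownValueError:
        # The API answered but could not make out any words.
        return "Sorry, I could not understand the audio."
    except speech_recog.RequestError as error:
        # The API could not be reached or rejected the request.
        return f"Could not reach the speech service: {error}"


# Usage with the real package:
#   text = transcribe_or_explain(speech_recog, rec, audio_data)
```

Returning a message keeps the example short. A larger program might prefer to log the error or ask the user to repeat the phrase.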

5- Final code snippet
Let’s put everything together. Below is the final code snippet:

```python
import sys

import speech_recognition as speech_recog

# read the duration (in seconds) from the first argument, defaulting to 5
duration = int(sys.argv[1]) if len(sys.argv) > 1 else 5
rec = speech_recog.Recognizer()
print("Please talk")
with speech_recog.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = rec.record(source, duration=duration)
    print("Recognizing...")
    # convert speech to text
    text = rec.recognize_google(audio_data)
    print(text)
```

I hope you now understand how to implement speech to text in Python. In my next article, we will build on this to implement such a system on a Raspberry Pi to control electronic gadgets.

#python #speechrecognition #anaconda #PyAudio