The Ultimate Guide To Speech Recognition With Python

The Ultimate Guide To Speech Recognition With Python

The Ultimate Guide To Speech Recognition With Python. Introduction to Speech Recognition with Python. In this Python tutorial, you will see how we can develop a very simple speech recognition application that is capable of recognizing speech from audio files, as well as live from a microphone. Several speech recognition libraries have been developed in Python. However we will be using the SpeechRecognition library, which is the simplest of all the libraries.

Speech recognition, as the name suggests, refers to automatic recognition of human speech. Speech recognition is one of the most important tasks in the domain of human computer interaction. If you have ever interacted with Alexa or have ever ordered Siri to complete a task, you have already experienced the power of speech recognition.

Speech recognition has various applications ranging from automatic transcription of speech data (like voicemails) to interacting with robots via speech.

In this tutorial, you will see how we can develop a very simple speech recognition application that is capable of recognizing speech from audio files, as well as live from a microphone. So, let's begin without further ado.

Several speech recognition libraries have been developed in Python. However we will be using the SpeechRecognition library, which is the simplest of all the libraries.

Installing SpeechRecognition Library

Execute the following command to install the library:

$ pip install SpeechRecognition

Speech Recognition from Audio Files

In this section, you will see how we can translate speech from an audio file to text. The audio file that we will be using as input can be downloaded from this link. Download the file to your local file system.

The first step, as always, is to import the required libraries. In this case, we only need to import the speech_recognition library that we just downloaded.

import speech_recognition as speech_recog

To convert speech to text the one and only class we need is the Recognizer class from the speech_recognition module. Depending upon the underlying API used to convert speech to text, the Recognizer class has following methods:

  • recognize_bing(): Uses Microsoft Bing Speech API
  • recognize_google(): Uses Google Speech API
  • recognize_google_cloud(): Uses Google Cloud Speech API
  • recognize_houndify(): Uses Houndify API by SoundHound
  • recognize_ibm(): Uses IBM Speech to Text API
  • recognize_sphinx(): Uses PocketSphinx API

Among all of the above methods, the recognize_sphinx() method can be used offline to translate speech to text.

To recognize speech from an audio file, we have to create an object of the AudioFile class of the speech_recognition module. The path of the audio file that you want to translate to text is passed to the constructor of the AudioFile class. Execute the following script:

sample_audio = speech_recog.AudioFile('E:/Datasets/my_audio.wav')

In the above code, update the path to the audio file that you want to transcribe.

We will be using the recognize_google() method to transcribe our audio files. However, the recognize_google() method requires the AudioData object of the speech_recognition module as a parameter. To convert our audio file to an AudioData object, we can use the record() method of the Recognizer class. We need to pass the AudioFile object to the record() method, as shown below:

with sample_audio as audio_file:
    audio_content = recog.record(audio_file)

Now if you check the type of the audio_content variable, you will see that it has the type speech_recognition.AudioData.

type(audio_content)

Output:

speech_recognition.AudioData

Now we can simply pass the audio_content object to the recognize_google() method of the Recognizer() class object and the audio file will be converted to text. Execute the following script:

recog.recognize_google(audio_content)

Output:

'Bristol O2 left shoulder take the winding path to reach the lake no closely the size of the gas tank degrees office 30 face before you go out the race was badly strained and hung them the stray cat gave birth to kittens the young girl gave no clear response the meal was called before the bells ring what weather is in living'

The above output shows the text of the audio file. You can see that the file has not been 100% correctly transcribed, yet the accuracy is pretty reasonable.

Setting Duration and Offset Values

Instead of transcribing the complete speech, you can also transcribe a particular segment of the audio file. For instance, if you want to transcribe only the first 10 seconds of the audio file, you need to pass 10 as the value for the duration parameter of the record() method. Look at the following script:

sample_audio = speech_recog.AudioFile('E:/Datasets/my_audio.wav')
with sample_audio as audio_file:
    audio_content = recog.record(audio_file, duration=10)

recog.recognize_google(audio_content)

Output:

'Bristol O2 left shoulder take the winding path to reach the lake no closely the size of the gas'

In the same way, you can skip some part of the audio file from the beginning using the offset parameter. For instance, if you do not want to transcribe the first 4 seconds of the audio, pass 4 as the value for the offset attribute. As an example, the following script skips the first 4 seconds of the audio file and then transcribes the audio file for 10 seconds.

sample_audio = speech_recog.AudioFile('E:/Datasets/my_audio.wav')
with sample_audio as audio_file:
    audio_content = recog.record(audio_file, offset=4, duration=10)

recog.recognize_google(audio_content)

Output:

'take the winding path to reach the lake no closely the size of the gas tank web degrees office dirty face'

Handling Noise

An audio file can contain noise due to several reasons. Noise can actually affect the quality of speech to text translation. To reduce noise, the Recognizer class contains adjust_for_ambient_noise() method, which takes the AudioData object as a parameter. The following script shows how you can improve transcription quality by removing noise from the audio file:

sample_audio = speech_recog.AudioFile('E:/Datasets/my_audio.wav')
with sample_audio as audio_file:
    recog.adjust_for_ambient_noise(audio_file)
    audio_content = recog.record(audio_file)

recog.recognize_google(audio_content)

Output:

'Bristol O2 left shoulder take the winding path to reach the lake no closely the size of the gas tank web degrees office 30 face before you go out the race was badly strained and hung them the stray cat gave birth to kittens the younger again no clear response the mail was called before the bells ring what weather is in living'

The output is quite similar to what we got earlier; this is due to the fact that the audio file had very little noise already.

Speech Recognition from Live Microphone

In this section you will see how you can transcribe live audio received via a microphone on your system.

There are several ways to process audio input received via microphone, and various libraries have been developed to do so. One such library is PyAudio. Execute the following script to install the PyAudio library:

$ pip install PyAudio

Now the source for the audio to be transcribed is a microphone. To capture the audio from a microphone, we need to first create an object of the Microphone class of the Speach_Recogniton module, as shown here:

mic = speech_recog.Microphone()

To see the list of all the microphones in your system, you can use the list_microphone_names() method:

speech_recog.Microphone.list_microphone_names()

Output:

['Microsoft Sound Mapper - Input',
 'Microphone (Realtek High Defini',
 'Microsoft Sound Mapper - Output',
 'Speakers (Realtek High Definiti',
 'Microphone Array (Realtek HD Audio Mic input)',
 'Speakers (Realtek HD Audio output)',
 'Stereo Mix (Realtek HD Audio Stereo input)']

This is a list of microphones available in my system. Keep in mind that your list will likely look different.

The next step is to capture the audio from the microphone. To do so, you need to call the listen() method of the Recognizer() class. Like the record() method, the listen() method also returns the speech_recognition.AudioData object, which can then be passed to the recognize_google() method.

The following script prompts the user to say something in the microphone and then prints whatever the user has said:

with mic as audio_file:
    print("Speak Please")

    recog.adjust_for_ambient_noise(audio_file)
    audio = recog.listen(audio_file)

    print("Converting Speech to Text...")
    print("You said: " + recog.recognize_google(audio))

Once you execute the above script, you will see the following message:

Please say something

At this point of time, say whatever you want and then pause. Once you have paused, you will see the transcription of whatever you said. Here is the output I got:

Converting Speech to Text...
You said: hello this is normally from stack abuse abuse this is an article on speech recognition I hope you will like it and this is just a test speech and when I will stop speaking are you in today thank you for Reading

It is important to mention that if recognize_google() method is not able to match the words you speak with any of the words in its repository, an exception is thrown. You can test this by saying some unintelligible words. You should see the following exception:

Speak Please
Converting Speech to Text...
---------------------------------------------------------------------------
UnknownValueError                         Traceback (most recent call last)
<ipython-input-27-41218bc8a239> in <module>
      8     print("Converting Speech to Text...")
      9
---> 10     print("You said: " + recog.recognize_google(audio))
     11
     12

~\Anaconda3\lib\site-packages\speech_recognition\__init__.py in recognize_google(self, audio_data, key, language, show_all)
    856         # return results
    857         if show_all: return actual_result
--> 858         if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
    859
    860         if "confidence" in actual_result["alternative"]:

UnknownValueError:

A better approach is to use the try block when the recognize_google() method is called as shown below:

with mic as audio_file:
    print("Speak Please")

    recog.adjust_for_ambient_noise(audio_file)
    audio = recog.listen(audio_file)

    print("Converting Speech to Text...")

    try:
        print("You said: " + recog.recognize_google(audio))
    except Exception as e:
        print("Error: " + str(e))

Conclusion

Speech recognition has various useful applications in the domain of human computer interaction and automatic speech transcription. This article briefly explains the process of speech transcription in Python via speech_recognition library and explains how to translate speech to text when the audio source is an audio file or live microphone.

Angular 9 Tutorial: Learn to Build a CRUD Angular App Quickly

What's new in Bootstrap 5 and when Bootstrap 5 release date?

What’s new in HTML6

How to Build Progressive Web Apps (PWA) using Angular 9

What is new features in Javascript ES2020 ECMAScript 2020

Python For Data Science - How to use Data Science with Python

Python For Data Science - How to use Data Science with Python

This Edureka video on 'Python For Data Science - How to use Data Science with Python - Data Science using Python ' will help you understand how we can use python for data science along with various use cases. What is Data Science? Why Python? Python Libraries For Data Science. Roadmap To Data Science With Python. Data Science Jobs and Salary Trends

This Edureka video on 'Python For Data Science - How to use Data Science with Python - Data Science using Python
' will help you understand how we can use python for data science along with various use cases. Following are the topics discussed this Python Data Science Tutorial:

  • What is Data Science?
  • Why Python?
  • Python Libraries For Data Science
  • Roadmap To Data Science With Python
  • Data Science Jobs and Salary Trends
  • How Edureka Helps?

Machine Learning, Data Science and Deep Learning with Python

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on Machine Learning tutorial with Data Science, Tensorflow, Artificial Intelligence, and Neural Networks. Introducing Tensorflow, Using Tensorflow, Introducing Keras, Using Keras, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Learning Deep Learning, Machine Learning with Neural Networks, Deep Learning Tutorial with Python

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on Machine Learning tutorial with Data Science, Tensorflow, Artificial Intelligence, and Neural Networks

Explore the full course on Udemy (special discount included in the link): http://learnstartup.net/p/BkS5nEmZg

In less than 3 hours, you can understand the theory behind modern artificial intelligence, and apply it with several hands-on examples. This is machine learning on steroids! Find out why everyone’s so excited about it and how it really works – and what modern AI can and cannot really do.

In this course, we will cover:
• Deep Learning Pre-requistes (gradient descent, autodiff, softmax)
• The History of Artificial Neural Networks
• Deep Learning in the Tensorflow Playground
• Deep Learning Details
• Introducing Tensorflow
• Using Tensorflow
• Introducing Keras
• Using Keras to Predict Political Parties
• Convolutional Neural Networks (CNNs)
• Using CNNs for Handwriting Recognition
• Recurrent Neural Networks (RNNs)
• Using a RNN for Sentiment Analysis
• The Ethics of Deep Learning
• Learning More about Deep Learning

At the end, you will have a final challenge to create your own deep learning / machine learning system to predict whether real mammogram results are benign or malignant, using your own artificial neural network you have learned to code from scratch with Python.

Separate the reality of modern AI from the hype – by learning about deep learning, well, deeply. You will need some familiarity with Python and linear algebra to follow along, but if you have that experience, you will find that neural networks are not as complicated as they sound. And how they actually work is quite elegant!

This is hands-on tutorial with real code you can download, study, and run yourself.

Best Python Libraries For Data Science & Machine Learning

Best Python Libraries For Data Science & Machine Learning

Best Python Libraries For Data Science & Machine Learning | Data Science Python Libraries

This video will focus on the top Python libraries that you should know to master Data Science and Machine Learning. Here’s a list of topics that are covered in this session:

  • Introduction To Data Science And Machine Learning
  • Why Use Python For Data Science And Machine Learning?
  • Python Libraries for Data Science And Machine Learning
  • Python libraries for Statistics
  • Python libraries for Visualization
  • Python libraries for Machine Learning
  • Python libraries for Deep Learning
  • Python libraries for Natural Language Processing

Thanks for reading

If you liked this post, share it with all of your programming buddies!

Follow us on Facebook | Twitter

Further reading about Python

Complete Python Bootcamp: Go from zero to hero in Python 3

Machine Learning A-Z™: Hands-On Python & R In Data Science

Python and Django Full Stack Web Developer Bootcamp

Complete Python Masterclass

Python Tutorial - Python GUI Programming - Python GUI Examples (Tkinter Tutorial)

Computer Vision Using OpenCV

OpenCV Python Tutorial - Computer Vision With OpenCV In Python

Python Tutorial: Image processing with Python (Using OpenCV)

A guide to Face Detection in Python

Machine Learning Tutorial - Image Processing using Python, OpenCV, Keras and TensorFlow

PyTorch Tutorial for Beginners

The Pandas Library for Python

Introduction To Data Analytics With Pandas