How to Recognize Optical Characters in Images in Python

How to Recognize Optical Characters in Images in Python

Optical Character Recognition is the process of detecting text content on images and convert it to machine encoded text that we can access and manipulate in Python (or any programming language) as a string variable. In this tutorial, we gonna use Tesseract library to do that.

Humans can easily understand the text content of an image simply by looking at it. However, it is not the case for computers. They need some sort of a structured method or algorithm to be able to understand it. This is where Optical Character Recognition (OCR) comes in to play.

Optical Character Recognition is the process of detecting text content on images and convert it to machine encoded text that we can access and manipulate in Python (or any programming language) as a string variable. In this tutorial, we gonna use Tesseract library to do that.

Tesseract library contains an OCR engine and a command line program, so it has nothing to do with Python, please follow their official guide for installation, as it is a requirement tool for this tutorial.

We gonna use pytesseract module for Python which is a wrapper for Tesseract-OCR engine, so we can access it via Python.

The most recent stable version of tesseract is 4 which uses a new recurrent neural network (LSTM) based OCR engine which is focused on line recognition.

Let's get started, you need to install:

  • Tesseract-OCR Engine (follow their guide for your operating system).
  • pytesseract wrapper module using:
    pip3 install pytesseract

Other utility modules for this tutorial:

pip3 install numpy matplotlib opencv-python pillow

After you have everything installed in your machine, open up a new Python file and follow along:

import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image

For demonstration purposes, I'm gonna use this image for recognition:

This is image title

I've named it "test.png" and put it in the current directory, let's load this image:

# read the image using OpenCV
image = cv2.imread("test.png")
# or you can use Pillow
# image = Image.open("test.png")

As you may notice, you can load the image either with OpenCV or Pillow, I prefer using OpenCV as it enables us to use the live camera.

Let's recognize that text:

# get the string
string = pytesseract.image_to_string(image)
# print it
print(string)

Note: If the above code raises an error, please consider adding Tesseract-OCR binaries to PATH variables. Read their official installation guide more carefully.

image_to_string() function does exactly what you expect, it converts the containing image text to characters, let's see the result:

This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

Excellent, there is another function image_to_data() which outputs more information than that, including words with their corresponding width, height and x, y coordinates, this will enable us to make a lot of useful stuff. For instance, let's search for words in the document and draw a bounding box around a specific word of our choice, below code handles that:

# make a copy of this image to draw in
image_copy = image.copy()
# the target word to search for
target_word = "dog"
# get all data from the image
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

So we're going to search for the word "dog" in the text document, we want the output data to be structured and not a raw string, that's why I passed output_type to be a dictionary, so we can easily get each word's data (you can print data dictionary to see how the output is organized).

Let's get all the occurences of that word:

# get all occurences of the that word
word_occurences = [ i for i, word in enumerate(data["text"]) if word.lower() == target_word ]

Now let's draw a surrounding box on each word:

for occ in word_occurences:
    # extract the width, height, top and left position for that detected word
    w = data["width"][occ]
    h = data["height"][occ]
    l = data["left"][occ]
    t = data["top"][occ]
    # define all the surrounding box points
    p1 = (l, t)
    p2 = (l + w, t)
    p3 = (l + w, t + h)
    p4 = (l, t + h)
    # draw the 4 lines (rectangular)
    image_copy = cv2.line(image_copy, p1, p2, color=(255, 0, 0), thickness=2)
    image_copy = cv2.line(image_copy, p2, p3, color=(255, 0, 0), thickness=2)
    image_copy = cv2.line(image_copy, p3, p4, color=(255, 0, 0), thickness=2)
    image_copy = cv2.line(image_copy, p4, p1, color=(255, 0, 0), thickness=2)

Saving and showing the resulting image:

plt.imsave("all_dog_words.png", image_copy)
plt.imshow(image_copy)
plt.show()

Take a look on the result:

This is image title

Amazing, isn't it ? This is not all ! you can pass lang parameter to image_to_string() or image_to_data() functions to make it easy recognizing text in different languages. You can also use image_to_boxes() function which recognize characters and their box boundaries, please refer to their official documentation and available languages for more information.

A note though, this method is ideal for recognizing text in scanned documents and papers. Other uses of OCR include the automation of passport recognition and extraction of information from them, data entry processes, detection and recognition of car number plates, and much more!

Also, this won't work very well on hand-written text, complex real world images and unclear images or images that contains exclusive amount of text.

Alright, that's it for this tutorial, let us see what you can build with this utility !

Happy Coding ♥

python machine-learning

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

How To Plot A Decision Boundary For Machine Learning Algorithms in Python

How To Plot A Decision Boundary For Machine Learning Algorithms in Python, you will discover how to plot a decision surface for a classification machine learning algorithm.

How To Get Started With Machine Learning With The Right Mindset

You got intrigued by the machine learning world and wanted to get started as soon as possible, read all the articles, watched all the videos, but still isn’t sure about where to start, welcome to the club.

What is Supervised Machine Learning

What is neuron analysis of a machine? Learn machine learning by designing Robotics algorithm. Click here for best machine learning course models with AI

Python For Machine Learning | Machine Learning With Python

Python For Machine Learning | Machine Learning With Python, you will be working on an end-to-end case study to understand different stages in the Machine Learning (ML) life cycle. This will deal with 'data manipulation' with pandas and 'data visualization' with seaborn. After this an ML model will be built on the dataset to get predictions. You will learn about the basics of scikit-learn library to implement the machine learning algorithm.

Python for Machine Learning | Machine Learning with Python

Python for Machine Learning | Machine Learning with Python, you'll be working on an end-to-end case study to understand different stages in the ML life cycle. This will deal with 'data manipulation' with pandas and 'data visualization' with seaborn. After this, an ML model will be built on the dataset to get predictions. You will learn about the basics of the sci-kit-learn library to implement the machine learning algorithm.