Image capture records a snapshot in time of a person, place, or object. Many devices include cameras, and taking pictures is now part of everyday life. When a picture is taken, the device often recognizes its content and applies automatic corrections. Taking that further, Optical Character Recognition (OCR) can take a picture of text and produce a usable file with the same content as the original document. Describing a picture, that is, understanding its content, is a complex task. OCR addresses one part of it: extracting knowledge from images.

Why AI?

Creating software to translate an image into text is sophisticated, but it has become easier thanks to updated libraries in common tools, such as pytesseract in Python. The task is complicated: each portion of the image must be statistically evaluated and assigned the most probable match to a recognizable letter. These pieces are then placed together to produce a result that, ideally, matches the original object without error. The approach is deep learning with a Long Short-Term Memory (LSTM) network, a type of recurrent neural network (RNN), which takes an image as input and writes the text found in it to a file. This is known as text extraction from an image.
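The idea of assigning the most probable match to each portion of the image can be illustrated with a toy sketch. It is not how the recognition engine is actually implemented (a real LSTM recognizer also uses context across neighboring positions); the three-letter alphabet, the score table, and the function name most_probable_text are all made up for illustration.

```python
# Toy illustration: pick the most probable character for each image portion.
# ALPHABET and the scores below are invented; a real recognizer would
# produce such scores from the image itself.
ALPHABET = "ABC"


def most_probable_text(scores):
    """Return the string built from the highest-scoring letter per position."""
    chars = []
    for position_scores in scores:
        # Index of the largest score at this position.
        best_index = max(range(len(position_scores)), key=position_scores.__getitem__)
        chars.append(ALPHABET[best_index])
    return "".join(chars)


# One row per image portion, one column per letter in ALPHABET.
scores = [
    [0.1, 0.2, 0.7],  # most likely "C"
    [0.8, 0.1, 0.1],  # most likely "A"
    [0.2, 0.7, 0.1],  # most likely "B"
]

print(most_probable_text(scores))  # CAB
```

Each row is evaluated on its own here, which is the simplest possible version of the idea; the recurrent network described above improves on this by letting earlier and later portions influence each decision.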

Project, Image to Text

For this example, take a picture of a receipt and save it to a local directory. Next, open Python with the pytesseract and cv2 libraries installed. With very little code, the image can be converted to text: the underlying network applies repeated layers of learning to recognize text in the image, and everything that is not text is made to “drop out”, leaving only characters. For this project, pytesseract (a Python wrapper for the Tesseract OCR engine) uses a model pretrained on English, so it returns only letters and numbers from that defined set and excludes everything else. The output is written to a file in the local directory.
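The project steps might look roughly like the sketch below. It assumes the Tesseract engine and its English language data are installed alongside the pytesseract and cv2 packages. The function names, the file names receipt.jpg and receipt.txt, the Otsu thresholding step, and the regular-expression filter are illustrative choices, not part of the original project.

```python
import re
from pathlib import Path


def keep_letters_and_digits(text: str) -> str:
    """Drop every character that is not an ASCII letter, digit, or whitespace."""
    # Hypothetical post-processing step that mimics "letters and numbers only".
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def receipt_to_text(image_path: str, output_path: str) -> str:
    """Read a receipt image, extract its text, and save it to a file."""
    # Imported here so the helper above can be used without the OCR dependencies.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Grayscale plus Otsu thresholding is a common (assumed) cleanup step.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Run OCR with the pretrained English model.
    raw_text = pytesseract.image_to_string(binary, lang="eng")

    cleaned = keep_letters_and_digits(raw_text)
    Path(output_path).write_text(cleaned, encoding="utf-8")
    return cleaned
```

Calling receipt_to_text("receipt.jpg", "receipt.txt") would write the extracted text next to the image. Note that the filter also removes decimal points, so an amount such as 12.50 becomes 1250; widening the character class in keep_letters_and_digits is one way to keep them.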

#python #neural-networks #artificial-intelligence #data-science #data-mining

Text Extraction in Python with Neural Networks