1596381960
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images.
Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, BMP, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file.
We’re going to start experimenting with tesseract using just a simple image of nice clean text.
Let’s first import Image from PIL and display the image ocr.png.
from PIL import Image
image = Image.open("../input/ocr.png")
display(image)
Great, we have a base image of some big, clear text.
Let’s import pytesseract and use the dir() function to get a sense of what might be some interesting functions to play with.
import pytesseract
dir(pytesseract)
['Output',
 'TSVNotSupported',
 'TesseractError',
 'TesseractNotFoundError',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'get_tesseract_version',
 'image_to_boxes',
 'image_to_data',
 'image_to_osd',
 'image_to_pdf_or_hocr',
 'image_to_string',
 'pytesseract',
 'run_and_get_output']
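Most of the names in that listing are module internals. Here is a minimal sketch of narrowing dir() output down to public callables. The helper name `public_callables` is my own and not part of pytesseract, and it is demonstrated on the standard-library `json` module so that it runs even without Tesseract installed.

```python
import json


def public_callables(module):
    """Return the sorted names of public callables (functions, classes) in a module."""
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name))
    )


# Demonstrated on the standard-library json module; passing pytesseract
# instead would list names such as image_to_string and image_to_data.
names = public_callables(json)
print(names)
```

Submodules and dunder attributes are skipped, which leaves only the functions and classes worth exploring with help().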
It looks like there are just a handful of interesting functions, and I think image_to_string is probably our best bet. Let’s use the help() function to interrogate this a bit more.
help(pytesseract.image_to_string)
Help on function image_to_string in module pytesseract.pytesseract:
image_to_string(image, lang=None, config='', nice=0, output_type='string')
    Returns the result of a Tesseract OCR run on the provided image to a string.
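The signature shows that `lang` and `config` can be passed alongside the image. As a hedged sketch, the helpers below build a config string from Tesseract's `--psm` (page segmentation mode) and `--oem` (OCR engine mode) command-line options and pass it through. The function names `build_tesseract_config` and `ocr_with_options` and the default values are assumptions chosen for illustration, not something the help text prescribes.

```python
def build_tesseract_config(psm=6, oem=3):
    """Build a config string from Tesseract's --psm and --oem command-line options."""
    return f"--oem {oem} --psm {psm}"


def ocr_with_options(image, lang="eng", psm=6, oem=3):
    """Run OCR with an explicit language and page segmentation mode."""
    # Imported lazily so build_tesseract_config can be used on its own.
    import pytesseract

    return pytesseract.image_to_string(
        image, lang=lang, config=build_tesseract_config(psm, oem)
    )


print(build_tesseract_config())
```

Calling `ocr_with_options(image)` requires the Tesseract binary and the English language data to be installed.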
OK, let’s try running tesseract on this image:
text = pytesseract.image_to_string(image)
print(text)
See the magic of OCR using
pytessaract. we will be able to
read the content of image and
convert it to text.
In the previous example, we were using a clear, unambiguous image for conversion. Sometimes there will be noise in images you want to OCR, making it difficult to extract the text. Luckily, there are techniques we can use to increase the efficacy of OCR with pytesseract and Pillow.
Let’s use a different image this time, with the same text as before but with added noise in the picture.
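One common way to clean up a noisy image with Pillow is to convert it to grayscale, apply a median filter, and binarize it. The sketch below is only an illustration of that idea: the helper name `clean_for_ocr`, the threshold of 128, and the filter size of 3 are assumptions that would need tuning for a real image.

```python
from PIL import Image, ImageFilter


def clean_for_ocr(image, threshold=128, filter_size=3):
    """Grayscale, median-filter, and binarize an image before OCR."""
    gray = image.convert("L")
    # The median filter removes isolated speckles while keeping edges fairly sharp.
    smoothed = gray.filter(ImageFilter.MedianFilter(size=filter_size))
    # Pixels at or above the threshold become white (255), the rest black (0).
    return smoothed.point(lambda value: 255 if value >= threshold else 0)


# Synthetic example: a light gray canvas with a single black "noise" pixel.
sample = Image.new("RGB", (9, 9), (230, 230, 230))
sample.putpixel((4, 4), (0, 0, 0))
cleaned = clean_for_ocr(sample)
print(cleaned.mode, cleaned.getextrema())
```

The cleaned image can then be passed to pytesseract.image_to_string in place of the original.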
1601619158
OCR technology is implemented using techniques from machine learning, AI, and data science. The result an OCR process returns depends on the scanned data it is given.
1620729846
Can you use WordPress for anything other than blogging? Perhaps to your surprise, yes. WordPress is more than just a blogging tool, and it has helped thousands of websites and web applications thrive. WordPress powers around 40% of online projects, and in today’s post we will look at some amazing uses of WordPress other than blogging.
What Is The Use Of WordPress?
WordPress is the most popular website platform in the world. It is the first choice of businesses that want to set up a feature-rich and dynamic Content Management System. So, if you ask what WordPress is used for, the answer is: everything. It is a super-flexible, feature-rich and secure platform that offers everything you need to build unique websites and applications. Let’s get to know these uses:
1. Multiple Websites Under A Single Installation
WordPress Multisite allows you to develop multiple sites from a single WordPress installation. You can download WordPress and start building the websites you want to launch on a single server. You can literally handle hundreds of sites from a single dashboard, which deserves applause.
It is a highly efficient platform that allows you to easily run several websites under the same login credentials. One of the best things about WordPress is the themes it has to offer. You can simply download themes and plugins once and use them across your sites, saving space without sacrificing speed.
2. WordPress Social Network
WordPress can be used for high-end projects such as a social media network. If you don’t have the money and patience to hire a coder and invest months in building a feature-rich social media site, go for WordPress. It is one of the most amazing uses of WordPress, and its CMS makes the process a lot easier. You can build sites in the style of Facebook or Reddit.
To set up a social media network, you need to download a WordPress plugin called BuddyPress. It allows you to set up a community page with ease and provides all the necessary features of a community or social media site. It has direct messaging, activity streams, user groups, extended profiles, and so much more. You just have to download and configure it.
If BuddyPress doesn’t meet all your needs, don’t give up on your dreams. You can try out WP Symposium or PeepSo. There are also several themes you can use to build a social network.
3. Create A Forum For Your Brand’s Community
Communities are very important for your business. They help you stay in constant connection with your users and consumers, and allow you to turn them into a loyal customer base. While there are many good technologies that can be used for building a community page, good old WordPress is still the best.
If you want to build your online community, you need to consider all the amazing features you get with WordPress. bbPress, for example, is open-source, template-driven PHP/MySQL forum software. It is very simple and doesn’t hamper the experience of the website.
Other tools such as wpForo and Asgaros Forum are equally good for creating a community forum. They are lightweight tools that are easy to manage and integrate with your WordPress site. However, there is one tiny problem: you need to have some technical knowledge to build a WordPress community page.
4. Shortcodes
Since we gave you a problem in the previous section, we will also give you a solution for it. You might not know how to code, but you have shortcodes. Shortcodes help you execute functions without having to write code. They are an easy way to build an amazing website, add new features, and customize plugins. Because each shortcode is a short tag that stands in for many lines of code, you can start building a feature-rich website or application with almost no technical knowledge.
There are also plugins like Shortcoder, Shortcodes Ultimate, and the Basics available on WordPress, and with them you will not even have to remember the shortcodes.
5. Build Online Stores
If you are still wondering why to use WordPress, use it to build an online store and start selling your goods online. It is an affordable technology that helps you build a feature-rich eCommerce store.
WooCommerce is an extension of WordPress and is one of the most used eCommerce solutions. WooCommerce holds a 28% share of the global market and is one of the best ways to set up an online store. It allows you to build user-friendly and professional online stores and has thousands of free and paid extensions. Moreover, it is an open-source platform, so you don’t have to pay for a license.
Apart from WooCommerce, there are Easy Digital Downloads, iThemes Exchange, Shopify eCommerce plugin, and so much more available.
6. Security Features
WordPress takes security very seriously. It offers tons of external solutions that help you safeguard your WordPress site. While there is no way to ensure 100% security, it provides regular updates with security patches and several plugins to help with backups, two-factor authentication, and more.
By choosing hosting providers like WP Engine, you can improve the security of the website. Such a provider helps with threat detection, managed patching and updates, internal security audits for customers, and so much more.
1597457400
The necessity of digitisation is rapidly increasing in the modern era. Due to the growth of information and communication technologies (ICT) and the wide availability of handheld devices, people often prefer digitized content over printed materials such as books and newspapers. It is also easier to organize digitized data and analyze it for various purposes with advanced techniques like artificial intelligence. So, to keep up with the present technological scenario, it is necessary to convert the information that currently exists only in printed format into digitised format.
Here comes OCR, our saviour 💪 💪, which helps us perform the tedious work of digitising that information. OCR stands for **Optical Character Recognition**, whose primary job is to recognise the printed text in an image. Once we recognise the printed text with the help of OCR, we can use that information in various ways.
This is a 3-part series of articles that explains various concepts and phases of an OCR system. Let’s have a look at what you are going to learn in each part.
The image below shows the different phases in the workflow of an OCR system.
1595719500
Above: an example of the Google OCR API. Hopefully, I will be able to do the same with Tesseract one day.
The installation of this library took me longer than usual.
!pip install pytesseract
You should start by installing pytesseract using pip, but if you then try to run the library, it will raise an error.
TesseractNotFoundError: /usr/bin/tesseract is not installed or it's not in your PATH
The installation is a bit fiddly. You will first need to install another package called tesseract-ocr and then point pytesseract directly at the tesseract executable (all written in the instructions and available on my repo, do not despair).
!sudo apt install tesseract-ocr
Make sure you have installed both packages, pytesseract and tesseract-ocr.
try:
    from PIL import Image
except ImportError:
    import Image
import cv2
import pytesseract
Before proceeding, you will need to find out where the tesseract executable is located.
!which tesseract
/usr/bin/tesseract
You can now copy the output to specify the location of the executable. Unfortunately, this appears to be the only workaround that makes Tesseract work on Google Colab. So far, this appears to be the only working tutorial among the many I searched.
pytesseract.pytesseract.tesseract_cmd = (
    r'/usr/bin/tesseract'
)
The library should now be imported and configured correctly.
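Rather than copying the path by hand, the lookup can also be done from Python. This is a rough sketch using the standard library's `shutil.which`. The helper name `find_tesseract` and the fallback to `/usr/bin/tesseract` (the path shown above) are my assumptions, not part of pytesseract.

```python
import shutil


def find_tesseract(default="/usr/bin/tesseract"):
    """Return the tesseract path found on PATH, or the given default if none is found."""
    return shutil.which("tesseract") or default


tesseract_path = find_tesseract()

try:
    import pytesseract

    # Same attribute as in the manual assignment above.
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
except ImportError:
    # pytesseract is not installed, so there is nothing to configure.
    pass

print(tesseract_path)
```

If the printed path does not exist on your machine, the tesseract-ocr package still needs to be installed.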
I will be using the cv2 library to import and edit images. I have to make sure that the image I want is uploaded to my notebook storage and that I can access its path correctly. In this case, the image is called image.png.