HTML to Markdown

This is a series of short, 10-minute Python articles to help you boost your knowledge of Python. I try to post an article each day (no promises), starting from the very basics and working up to more complex idioms. Feel free to contact me on LinkedIn with questions or requests for particular Python subjects you want to know about.

Yesterday, we talked about classes and how to use the object-oriented programming (OOP) paradigm to address problems. The objects that we create can mimic real-life objects, and the data and methods are all bundled inside the class. We also discussed some of the inner workings of Python: it uses dunder methods to define interactions between objects. The thing we did not yet mention is seen as a major benefit of OOP: inheritance. Inheritance is the ability to create a new, so-called child class that inherits properties from its parent class. Let’s have a look at an example to show what this means:
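
As a minimal sketch, two stand-alone classes could look like this. The method names `perimeter` and `area` and the attributes `length` and `width` follow the text; the method bodies and the repr format are assumptions made for illustration.

```python
class Rectangle:
    """A rectangle described by its length and width."""

    def __init__(self, length, width):
        self.length = length
        self.width = width

    def __repr__(self):
        return f"Rectangle(length={self.length}, width={self.width})"

    def perimeter(self):
        return 2 * (self.length + self.width)

    def area(self):
        return self.length * self.width


class Square:
    """A square written from scratch, repeating the Rectangle logic."""

    def __init__(self, length):
        self.length = length

    def __repr__(self):
        return f"Square(length={self.length})"

    def perimeter(self):
        return 4 * self.length

    def area(self):
        return self.length ** 2
```

Both classes work, but the perimeter and area logic now exists twice.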

While this is fine, we all know that a Square is a special type of Rectangle whose length and width are the same size. Writing two individual classes is very WET (the opposite of DRY, "don’t repeat yourself"), and as the objects are closely related, we can make use of inheritance:
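
A sketch of the inherited version, under the same assumptions as before (the method bodies and the repr format are illustrative, not the original listing):

```python
class Rectangle:
    """A rectangle described by its length and width."""

    def __init__(self, length, width):
        self.length = length
        self.width = width

    def __repr__(self):
        return f"Rectangle(length={self.length}, width={self.width})"

    def perimeter(self):
        return 2 * (self.length + self.width)

    def area(self):
        return self.length * self.width


class Square(Rectangle):
    """A square is a rectangle whose length and width are equal."""

    def __init__(self, length):
        # Reuse the parent constructor, passing the length twice.
        super().__init__(length, length)

    def __repr__(self):
        return f"Square(length={self.length})"


square = Square(3)
print(square)              # Square(length=3)
print(square.perimeter())  # 12, computed by Rectangle.perimeter
print(square.area())       # 9, computed by Rectangle.area
```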

Code-wise, we have a small benefit, as we did not have to repeat the methods for perimeter and area. The biggest benefit, however, is that the definition of those methods lives in one place instead of two. If we made a mistake in one of these methods, we would only need to correct one definition.

Let’s now analyze what is actually happening here. The Rectangle class was not changed, so there is nothing to discuss. The Square class, however, has the Rectangle class added between parentheses in the class definition. With this syntax we define that the Square class is built with Rectangle as its basis and thereby inherits all of its properties.

Next, we define a constructor, but as we inherit all the properties of the Rectangle class, we are actually overriding the inherited constructor. The new constructor only takes the length, as all sides of a square are equally sized. Now comes something special: we call the constructor of the parent class. To access the parent class, Python has the super() function. It returns a proxy object that delegates method calls to the parent class, and by directly calling the dunder init method on it, we call the original constructor. The original constructor expects a length and a width, which we supply by entering the length twice. The original constructor adds these values to the self object, the object that holds the instance’s data.

The new definition also overrides the repr dunder method. If we had not overridden the method, Square would use the method from the Rectangle class.
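
To see the fallback to the parent’s repr in action, here is a small sketch with a hypothetical `PlainSquare` class (a name invented for this illustration) that overrides only the constructor. The repr format is again an assumption.

```python
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def __repr__(self):
        return f"Rectangle(length={self.length}, width={self.width})"


class PlainSquare(Rectangle):
    # Only the constructor is overridden; __repr__ is inherited as-is.
    def __init__(self, length):
        super().__init__(length, length)


shape = PlainSquare(3)
print(repr(shape))                   # Rectangle(length=3, width=3)
print(isinstance(shape, Rectangle))  # True
# The method resolution order shows where Python looks for methods.
print([cls.__name__ for cls in PlainSquare.__mro__])
# ['PlainSquare', 'Rectangle', 'object']
```

Because `PlainSquare` defines no `__repr__` of its own, Python walks the method resolution order and finds the one on `Rectangle`.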

What is Pytesseract and how reliable is it?

Pytesseract is a Python wrapper for Google’s Tesseract OCR engine.
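
To give a feel for what the wrapper does, here is a minimal sketch. It assumes the `pytesseract` and `Pillow` packages are installed and that the Tesseract engine itself is set up; `image_to_text` is a helper name made up for this example.

```python
def image_to_text(image_path):
    """Return the text that Tesseract recognises in the image at image_path."""
    # Imported inside the function so the module still loads
    # when the optional packages are not installed yet.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)
```

Calling `image_to_text("scan.png")` (a hypothetical file name) would return the recognised text as a string.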

That one line should most probably leave you extremely pleased. I mean, come on. Google? And OCR? That’s the point when you know it’s good.

That’s nice and all, but how do I get it up and running?

OK, time to start downloading stuff.

I’m writing this article assuming you’re using Anaconda, and trust me, it’s significantly easier to set things up with Anaconda than to do it manually with pip. There’s just so much that can go wrong.

So, first things first: let’s get our hands on the OCR engine itself!

Head over to https://github.com/UB-Mannheim/tesseract/wiki and get the 32-bit or 64-bit version, depending on your system architecture. If you don’t know which one to get, open your computer settings (Windows key + I on Windows) and type About.
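
If you already have Python at hand, you can also check the architecture from there. This is a small sketch using only the standard library; `describe_architecture` is a name made up for this example.

```python
import platform
import struct


def describe_architecture():
    """Report the machine type and the bitness of the running Python."""
    return {
        # Machine type as reported by the OS, e.g. "AMD64" on 64-bit Windows.
        "machine": platform.machine(),
        # Pointer size of this Python interpreter, in bits (32 or 64).
        "python_bits": struct.calcsize("P") * 8,
    }


print(describe_architecture())
```

Note that `python_bits` describes the interpreter, not the operating system: a 32-bit Python can run on a 64-bit Windows, so the machine type is the better guide for picking the installer.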

After it’s done downloading, just install it like a regular program (by double-clicking and following the on-screen instructions). Now open up the folder where it’s downloaded and press Control + L. Now press Control + C. This should copy the path of the folder. We’re gonna be needing that.

Once that’s done, type **system variables** in the Windows search box and hit **Enter** when it says Edit the system environment variables.

The System Properties dialog box should pop open.
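
Assuming the point of these steps is to put the copied Tesseract folder on the PATH variable (the walkthrough does not spell this out), a quick way to check the result from Python could look like this. `find_tesseract` is a helper name made up for this sketch.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(executable)


location = find_tesseract()
if location is None:
    print("tesseract was not found on PATH")
else:
    print(f"tesseract found at {location}")
```

Environment variable changes only apply to newly started programs, so reopen your terminal or Anaconda prompt before checking. If the lookup still returns `None`, pytesseract also lets you point it at the executable directly by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.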

Performing Optical Character Recognition with Python and Pytesseract using Anaconda