Borderless tables detection with deep learning and OpenCV

Way to build your own object detector and turn semi-structured blocks of data in an image into a machine-readable text

documents in tabular format or incidentally in data blocks without distinctive graphical borders. A borderless table may help to simplify the visual perception of semi-structured data for us, humans. From the machine-reading point of view, such presenting information on a page has quite a few shortcomings which make it difficult to separate the data belonging to a presumptive table structure from the surrounding textual context.

Tabular data extraction as a business challenge may have several ad-hoc or heuristiс rules-based solutions which definitely will fail with a table of a bit different layout or style. On a large scale, one should use a more general approach for identifying table-like structures in an image, more specifically a deep learning-based object detection approach.

Scope of this tutorial:

Deep learning-based object detection
Installation and setup of TF2 Object Detection API
Data preparation
Model configuration
Model training and saving
Table detection and cell recognition in a real-life image

#tensorflow #ocr #object-detection #deep-learning #machine-learning #borderless tables detection with deep learning and opencv

Way to build your own object detector and turn semi-structured blocks of data in an image into a machine-readable text

Scope of this tutorial:

towardsdatascience.com

Borderless tables detection with deep learning and OpenCV