A way to build your own object detector and turn semi-structured blocks of data in an image into machine-readable text

documents in tabular format or incidentally in data blocks without distinctive graphical borders. A borderless table may simplify the visual perception of semi-structured data for us humans. From a machine-reading point of view, however, presenting information on a page this way has quite a few shortcomings that make it difficult to separate the data belonging to a presumptive table structure from the surrounding textual context.

Tabular data extraction as a business challenge may have several ad hoc or heuristic rule-based solutions, but these will likely fail on a table with even a slightly different layout or style. At scale, one should use a more general approach for identifying table-like structures in an image, specifically a deep learning-based object detection approach.
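
To make the brittleness of heuristic rules concrete, here is a minimal sketch of one such rule. It is not taken from the tutorial: the function name `split_columns`, the box format, and the `min_gap` threshold are all hypothetical, chosen only for illustration. The rule groups word bounding boxes into columns wherever the horizontal gap between them exceeds a fixed pixel threshold.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def split_columns(boxes: List[Box], min_gap: int) -> List[List[Box]]:
    """Group word boxes into columns wherever the horizontal gap
    between consecutive boxes (sorted by x_min) exceeds min_gap."""
    columns: List[List[Box]] = []
    right_edge: Optional[int] = None  # rightmost x seen in the current column
    for box in sorted(boxes, key=lambda b: b[0]):
        if right_edge is None or box[0] - right_edge > min_gap:
            columns.append([])  # gap is wide enough: start a new column
            right_edge = box[2]
        else:
            right_edge = max(right_edge, box[2])
        columns[-1].append(box)
    return columns


# A table with generous spacing between columns is split as intended ...
wide = [(10, 0, 60, 10), (12, 20, 58, 30),
        (118, 0, 165, 10), (120, 20, 170, 30),
        (230, 0, 280, 10)]
print(len(split_columns(wide, min_gap=40)))

# ... but a tighter layout collapses with the same threshold.
tight = [(10, 0, 60, 10), (85, 0, 135, 10), (160, 0, 210, 10)]
print(len(split_columns(tight, min_gap=40)))
```

A threshold tuned for one layout has to be re-tuned for the next, which is the kind of failure a learned detector is meant to avoid.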

Scope of this tutorial:

  • Deep learning-based object detection
  • Installation and setup of TF2 Object Detection API
  • Data preparation
  • Model configuration
  • Model training and saving
  • Table detection and cell recognition in a real-life image

Borderless tables detection with deep learning and OpenCV