Extracting tables from PDF files is no longer a difficult task; you can do it with a single line of Python.

What you will learn

Tabula

Tabula is a useful package that not only lets you scrape tables from PDF files but also converts a PDF file directly into a CSV file.

pip install tabula-py

import tabula
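The direct PDF-to-CSV conversion mentioned above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: `csv_path_for` and `pdf_to_csv` are hypothetical helper names introduced here, and the sketch assumes tabula-py is installed and a Java runtime is available (tabula-py wraps tabula-java).

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Return the PDF path with its extension swapped to .csv (hypothetical helper)."""
    return str(Path(pdf_path).with_suffix(".csv"))


def pdf_to_csv(pdf_path, pages="all"):
    """Write the tables found in pdf_path to a CSV file and return its path."""
    # Imported lazily so the helpers can be defined without tabula-py installed.
    import tabula

    out_path = csv_path_for(pdf_path)
    # convert_into writes the extracted tables straight to disk.
    tabula.convert_into(pdf_path, out_path, output_format="csv", pages=pages)
    return out_path


# Example usage (assumes a file named data1.pdf exists in the working directory):
# pdf_to_csv("data1.pdf")
```

Deriving the output name from the input keeps the CSV next to the PDF it came from; pass a different `pages` value (for example `1`) to limit the conversion to a single page.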

Let's scrape this PDF data into a pandas DataFrame.

image by Satya Ganesh

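file = "data1.pdf"

# read_pdf returns a list of DataFrames, one per table detected on the page
tables = tabula.read_pdf(file, pages=1)
tables[0]  # the first table found on page 1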

Take a look at the output of the above code snippet, executed in Google Colab.

[Image: output of the code snippet in Google Colab]
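Because `read_pdf` hands back a list, the same call can cover every page of a document rather than just the first. The sketch below is one possible way to do that, under the assumption that tabula-py and Java are installed; `read_all_tables` and `summarize_tables` are hypothetical helper names, not part of the library.

```python
def read_all_tables(pdf_path):
    """Return a list of DataFrames, one per table found anywhere in the PDF."""
    # Imported lazily so the helpers can be defined without tabula-py installed.
    import tabula

    return tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)


def summarize_tables(tables):
    """Return (index, n_rows, n_cols) for each table-like object with a .shape."""
    return [(i, t.shape[0], t.shape[1]) for i, t in enumerate(tables)]


# Example usage (assumes a file named data1.pdf exists in the working directory):
# for index, rows, cols in summarize_tables(read_all_tables("data1.pdf")):
#     print(f"table {index}: {rows} rows x {cols} columns")
```

Checking the shape of each extracted table is a quick way to spot empty or fragmentary results before working with the data.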
