“Getting information off the internet is like taking a drink from a firehose.” — Mitchell Kapor
The internet is flooded with data. However, getting that data in the right format is the real deal. It takes some amount of cleaning and filtering to bring the data into a format which is ideal for analysis. Last week I came across some pretty good tabular data sets on Wikipedia. As always, I copied the data and pasted on the excel sheet. For most parts, it worked well, but there were few instances when the typical copy-paste methodology failed miserably:
For situations like these, simply copy-pasting wasn’t enough. Scraping was another option, but I was looking for something quick and easy and which didn’t involve a ton of code. It was then that I came across a convenient function in Google Sheets called **IMPORTHTML**,
which is ideal for importing data from a table or list within an HTML page. In this article, I’ll describe the end to the end process of fetching tables( and lists) into google sheets.
Prior to extracting or importing data that hasn’t been curated by you make sure you have the necessary permissions. Also, this method works only if the data is publicly available, which means there shouldn’t be requirements for authorisation etc.
Image by Author
Before using the function, let’s quickly understand the syntax of the IMPORTHTML
function. The IMPORTHTML
function takes in three distinct parameters and is written as follows:
IMPORTHTML(URL, query, index)
#data-analysis #google-sheets #data #tabular-data #html