“Getting information off the internet is like taking a drink from a firehose.” — Mitchell Kapor

The internet is flooded with data. However, getting that data in the right format is the real deal. It takes some amount of cleaning and filtering to bring the data into a format which is ideal for analysis. Last week I came across some pretty good tabular data sets on Wikipedia. As always, I copied the data and pasted on the excel sheet. For most parts, it worked well, but there were few instances when the typical copy-paste methodology failed miserably:

  • When the table was typically long and spread across a complete webpage
  • When the data wasn’t static, i.e., it updated after regular intervals. So every time, there was a change in the original dataset, I had to fetch the dataset again.

For situations like these, simply copy-pasting wasn’t enough. Scraping was another option, but I was looking for something quick and easy and which didn’t involve a ton of code. It was then that I came across a convenient function in Google Sheets called **IMPORTHTML**, which is ideal for importing data from a table or list within an HTML page. In this article, I’ll describe the end to the end process of fetching tables( and lists) into google sheets.

Prior to extracting or importing data that hasn’t been curated by you make sure you have the necessary permissions. Also, this method works only if the data is publicly available, which means there shouldn’t be requirements for authorisation etc.

Syntax

Image for post

Image by Author

Before using the function, let’s quickly understand the syntax of the IMPORTHTML function. The IMPORTHTML function takes in three distinct parameters and is written as follows:

IMPORTHTML(URL, query, index)

#data-analysis #google-sheets #data #tabular-data #html

Import HTML tables into Google Sheets effortlessly.
1.40 GEEK