As the most popular Python library for analytics, Pandas is a big project that offers various data manipulation and processing capabilities. It is probably no exaggeration to say that data scientists, myself included, use Pandas on a day-to-day basis in our work.

This blog post is Part 1 of a mini-series dedicated to sharing my top 10 lesser-known yet favorite features in Pandas. Hopefully, you can walk away with some inspiration to make your own code more robust and efficient.

The dataset for this mini-series comes from the Table of food nutrients, a Wikipedia page containing 16 tables that list basic foods and their nutrients, categorized by food type. For this demonstration, specifically, we will work with a subset of the Dairy products table.

**1. Scraping tables from HTML with read_html(match)**

When it comes to web scraping in Python, my go-to library used to be BeautifulSoup until I discovered read_html() in Pandas. Without the hassle of parsing the HTML page ourselves, we can directly extract the data stored in HTML tables:

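	##==== 1. Web tables scraping using read_html() ====##
	import pandas as pd

	## URL of the Wikipedia page "Table of food nutrients" (assumed; not given in the original)
	url = 'https://en.wikipedia.org/wiki/Table_of_food_nutrients'

	## Using the arg. match to keep only the table(s) containing 'Fortified milk'
	dairy_table = pd.read_html(url, match='Fortified milk')

	## read_html() returns a list of DataFrames; take the first match
	dairy_table = dairy_table[0]
	print(dairy_table.head())

	''' OUTPUT: ** Total data tables = 16 **
	                       0        1    ...       7         8
	0         Dairy products      NaN    ...     NaN       NaN
	1                   Food  Measure    ...     Fat  Sat. fat
	..                   ...      ...    ...     ...       ...
	3                   skim    1 qt.    ...       t         t
	4   Buttermilk, cultured    1 cup    ...       5         4
	'''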

Notice the argument match='Fortified milk' in the code? It tells read_html() to return only the table(s) whose text contains the specified string or matches the specified regular expression, which, in our case, is the dairy table. The match argument becomes extremely handy when an HTML page holds many tables.
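To see the filtering without hitting the network, here is a minimal sketch on a tiny inline page. The HTML snippet and its values are made up for illustration, and it assumes an HTML parser that Pandas can use (such as lxml) is installed. The markup is wrapped in StringIO because recent Pandas versions deprecate passing literal HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page; the rows are illustrative, not real data.
html = """
<table>
  <tr><th>Food</th><th>Measure</th></tr>
  <tr><td>Fortified milk</td><td>6 cups</td></tr>
  <tr><td>Buttermilk, cultured</td><td>1 cup</td></tr>
</table>
<table>
  <tr><th>Food</th><th>Measure</th></tr>
  <tr><td>Apple juice</td><td>1 cup</td></tr>
</table>
"""

# Without match, every table on the page is returned.
all_tables = pd.read_html(StringIO(html))

# A plain string keeps only the tables whose text contains it.
milk_tables = pd.read_html(StringIO(html), match="Fortified milk")

# match is treated as a regular expression, so alternation works too.
regex_tables = pd.read_html(StringIO(html), match=r"[Bb]uttermilk|Apple")

print(len(all_tables), len(milk_tables), len(regex_tables))
print(milk_tables[0])
```

Note that when no table matches the pattern, read_html() raises a ValueError rather than returning an empty list, so it may be worth wrapping the call in a try/except when the page content is not under your control.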

Looking at the output display, however, we realize that quite a few rows and columns are truncated!
