Data is the new oil, but it is definitely not cheap. Data flows in from all directions (web, apps, social media, and so on), and it is imperative that data scientists are able to mine some of it. In this blog, we will learn how to quickly mine/scrape data from a website (for fun) using the Python library ‘BeautifulSoup’.


Plan of action

Table of Contents

  1. Introduce the Use-Case
  2. What is BeautifulSoup?
  3. BS4 in action — understand & extract the data
  4. Last comments

Introduce the Use-Case

Anyone who has worked in customer experience or the hospitality industry understands the importance of customer satisfaction. NPS, or Net Promoter Score, is considered a benchmark for customer experience. Although NPS is a specially designed survey, there are other ways to understand customer sentiment. One of them is **customer feedback and ratings on the App Store** (of course, only if your app is available there).

So here is what we will do:

→ Take a random app (e.g. Facebook)

→ Go to its iTunes reviews

→ Extract the rating, comments, date, etc. that different users have given

→ Export them in a clean ‘csv/xlsx’ format.
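
To make the plan concrete, here is a minimal sketch of the last two steps (extract the fields, then export them). It runs on a small, made-up review feed rather than on real App Store data: the `SAMPLE_FEED` text, its tag names (`entry`, `author`, `rating`, `comment`, `date`) and the helper functions `extract_reviews` and `reviews_to_csv` are all assumptions for illustration, and the real feed will very likely use different tags. It also uses Python's built-in `html.parser` backend so that nothing beyond `bs4` needs to be installed.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical, simplified review feed; the real feed's tag names may differ.
SAMPLE_FEED = """
<feed>
  <entry>
    <author>jane</author>
    <rating>5</rating>
    <comment>Works well on my phone.</comment>
    <date>2020-01-05</date>
  </entry>
  <entry>
    <author>sam</author>
    <rating>2</rating>
    <comment>The feed is full of ads, sadly.</comment>
    <date>2020-01-07</date>
  </entry>
</feed>
"""

# Columns of the output file (these match the assumed tag names above).
FIELDS = ["author", "rating", "comment", "date"]


def extract_reviews(feed_text):
    """Return one dict per <entry>, keyed by the names in FIELDS."""
    soup = BeautifulSoup(feed_text, "html.parser")
    rows = []
    for entry in soup.find_all("entry"):
        row = {}
        for field in FIELDS:
            tag = entry.find(field)
            # A missing tag becomes an empty cell instead of raising an error.
            row[field] = tag.get_text(strip=True) if tag else ""
        rows.append(row)
    return rows


def reviews_to_csv(rows):
    """Serialise the review dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_reviews(SAMPLE_FEED)
csv_text = reviews_to_csv(rows)
print(csv_text)
```

To save the result to disk instead of printing it, the returned text can be written to a file opened with `open("reviews.csv", "w", newline="")`. The `csv` module quotes any comment that contains a comma, so the columns stay aligned.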


What is BeautifulSoup?

Beautiful Soup _(aka BS4)_ is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. Current releases run on Python 3; older 4.x releases also supported Python 2.7.
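
As a quick illustration of the parse tree idea, the sketch below parses a tiny, made-up HTML snippet and pulls values out of it. The snippet and the `review` class name are assumptions for the example only; `BeautifulSoup(...)`, `find_all` and `get_text` are standard BS4 calls.

```python
from bs4 import BeautifulSoup

# A tiny, made-up page used only to demonstrate the parse tree.
html = """
<html><body>
  <h1>App reviews</h1>
  <ul>
    <li class="review">Great app</li>
    <li class="review">Crashes on launch</li>
  </ul>
</body></html>
"""

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree by tag name, or search it by tag and CSS class.
heading = soup.h1.get_text()
reviews = [li.get_text(strip=True) for li in soup.find_all("li", class_="review")]

print(heading)
print(reviews)
```

Attribute-style access such as `soup.h1` returns the first matching tag, while `find_all` returns every match, which is usually what you want when a page holds many reviews.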


BS4 in action — understand & extract the data

iTunes makes it really easy to get app reviews from the Apple App Store. Facebook’s app id is 28488215, and we just need to add it to the following URL

