Since conspiracy theories about the coronavirus took off, social media platforms such as Facebook, Twitter, and Instagram have been actively scrutinizing and fact-checking content to fight misinformation. As more reliable sources get amplified, Twitter has become more supportive of accurate information than it was in the early stage of the outbreak. I figured it would be interesting to hear the public's real voice and discover the true sentiment about the coronavirus.

Don’t be intimidated by the word “scraping.” If you can browse a web page, you can perform web scraping like a pro, even as a newbie. So bear with me.

The easiest way to gauge public attitude is to collect all tweets containing the word “coronavirus.” I narrowed the research scope further by setting the language to English and the location to the United States. This helps keep the sample data set consistent with the search topic and should improve the accuracy of the prediction.
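The same scope can be expressed as a search URL before handing it to a scraper. The sketch below assumes that Twitter's search operators `lang:en` and `near:"United States"` behave like the language and location filters on the advanced-search form; `build_search_url` is a hypothetical helper, not part of Octoparse, so check the generated query against the live search page before relying on it.

```python
from urllib.parse import urlencode

def build_search_url(keyword: str, lang: str = "en", near: str = "United States") -> str:
    """Hypothetical helper: build a Twitter search URL for one keyword.

    Assumes the lang: and near: search operators mirror the advanced-search
    language and location filters; verify on the live site.
    """
    query = f'{keyword} lang:{lang} near:"{near}"'
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return "https://twitter.com/search?" + urlencode({"q": query})

url = build_search_url("coronavirus")
```

The resulting URL is what you would paste into Octoparse as the starting page.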

Once the research scope is settled, we can start scraping. When it comes to picking a web scraping tool, I prefer Octoparse: its auto-detection feature saves me a lot of time hand-picking and selecting data.

Twitter is a dynamic page with infinite scrolling, meaning new tweets keep loading as we scroll down. To get as many tweets as possible, I build a loop that keeps scrolling the page while fetching the information. This keeps the scraping workflow running without interruption.

Next, I create an extraction action. Octoparse renders the web page once we enter the search URL. It breaks the page structure down into sub-components, so I can easily click on a target element to set up a command and tell the robot: go get the information for me. When I click one of the tweets, the tips panel pops up and suggests selecting its sub-elements.
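Under the hood, breaking a page into sub-components means locating each tweet container and then reading named fields inside it. Here is a minimal sketch using Python's built-in `xml.etree.ElementTree` on a simplified, hypothetical snippet; the tags and the `data-testid` and `class` values are illustrative assumptions, since Twitter's real markup is much larger, is not well-formed XML, and changes often.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup standing in for rendered search results.
SAMPLE_HTML = """
<div>
  <article data-testid="tweet">
    <span class="user">@alice</span>
    <time datetime="2020-03-20T14:02:00Z">Mar 20</time>
    <div class="text">Stay home and wash your hands #coronavirus</div>
  </article>
  <article data-testid="tweet">
    <span class="user">@bob</span>
    <time datetime="2020-03-20T14:05:00Z">Mar 20</time>
    <div class="text">Grocery shelves are empty again</div>
  </article>
</div>
"""

def extract_tweets(markup: str) -> list[dict]:
    root = ET.fromstring(markup.strip())
    tweets = []
    # Like "Select All": match every element that follows the tweet pattern.
    for node in root.findall(".//article[@data-testid='tweet']"):
        # Read the sub-elements of each tweet into one record.
        tweets.append({
            "user": node.findtext("span[@class='user']", default=""),
            "time": node.find("time").get("datetime"),
            "text": node.findtext("div[@class='text']", default="").strip(),
        })
    return tweets

tweets = extract_tweets(SAMPLE_HTML)
```

Each dictionary corresponds to one row of the table Octoparse builds from the selected sub-elements.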


There it is! A corresponding event is added to the workflow automatically, and Octoparse also detects the other tweets on the page. Following the tips guide, click the “Select All” command. The final workflow should look like this:

[Figure: Octoparse workflow]

The logic is simple: the scraper first visits the page, then extracts tweets until it has processed every tweet inside the loop. It then repeats the scrolling action to load the next set of tweets and continues extracting until all the information has been extracted.
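For readers curious what that loop looks like in code, here is a hypothetical Python sketch of the same visit, extract, scroll, repeat logic. It is not how Octoparse is implemented: `FakeTimeline` stands in for a rendered browser page, tweets are deduplicated by an assumed `id` field because earlier tweets stay on screen after a scroll, and the `max_idle_scrolls` stop condition is my own assumption for deciding when an infinite feed has run dry.

```python
from typing import Callable, Iterable

def scrape_with_scrolling(
    extract_visible: Callable[[], Iterable[dict]],
    scroll_down: Callable[[], None],
    max_idle_scrolls: int = 3,
) -> list[dict]:
    """Extract every visible tweet, scroll, and repeat until nothing new appears."""
    seen: set[str] = set()
    results: list[dict] = []
    idle = 0
    while idle < max_idle_scrolls:
        new_count = 0
        # Inner loop: extract each tweet currently rendered on the page.
        for tweet in extract_visible():
            if tweet["id"] not in seen:  # skip tweets captured before the last scroll
                seen.add(tweet["id"])
                results.append(tweet)
                new_count += 1
        # Count consecutive scrolls that produced nothing new; reset otherwise.
        idle = 0 if new_count else idle + 1
        scroll_down()
    return results

class FakeTimeline:
    """Hypothetical stand-in for a browser page: each scroll reveals more tweets."""

    def __init__(self, tweets: list[dict], page_size: int = 3):
        self.tweets = tweets
        self.page_size = page_size
        self.visible_end = page_size

    def extract_visible(self) -> list[dict]:
        # Overlapping window: earlier tweets remain visible after scrolling.
        start = max(0, self.visible_end - 2 * self.page_size)
        return self.tweets[start:self.visible_end]

    def scroll_down(self) -> None:
        self.visible_end = min(len(self.tweets), self.visible_end + self.page_size)

timeline = FakeTimeline([{"id": str(i), "text": f"tweet {i}"} for i in range(10)])
collected = scrape_with_scrolling(timeline.extract_visible, timeline.scroll_down)
```

With a real browser, `extract_visible` and `scroll_down` would be wired to a browser-automation tool instead of the fake timeline; the idle counter is what keeps an infinite-scroll page from looping forever.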


Twitter Sentiment Analysis on Novel Coronavirus (COVID-19)