Decoding Public Sentiments Towards Energy Transition Using NLP

Connecting social sentiments towards the accelerating energy transition to enable data-driven policymaking.

Getting On Board With The Project

At the start of 2020, I made a small vow to find opportunities to work on collaborative projects in data science, machine learning, and AI. I had long been fascinated by the work Omdena had achieved in the past and had kept the website in my bookmarks for quite some time.

Being a student from an engineering background, I finally applied for the energy transition challenge and was met with a list of objectives along with an opportunity to communicate with some of the finest professionals I’ve ever met. A few days later I received my acceptance, and thus began my return to a long untouched world. The challenge concerned the study of social sentiments on topics such as energy transition and sustainability, which would help chart out better policies in the future.

What set this challenge apart from the rest was the sheer scale of data collected, social channels scraped, and data analyzed. The end result was a set of findings so crucial and insightful to the primary objective that everyone on the team gained something new to learn and work with.

The Problem: Understanding Energy Transition

The energy transition is a process that needs a great deal of deliberation to figure out not only which technologies will best satisfy future energy needs, but also how sustainable and environmentally friendly they are. At the end of the day, though, it is a matter of public concern, as taxpayers vote with their wallets on such issues. As a result, it becomes necessary to gauge public interest and form a clearer picture of people's thoughts, complaints, and ideas. The challenge set by the WEC for the team was to accomplish this by collecting data from as many different sources as possible and running algorithms on it to learn what makes the public tick.

How textual data appears: ripe and ready for processing.

Further discussions produced more defined questions: “What energies are people willing to support?”, “Do people have the same issues globally or are public energy problems dependent on the geography?”, “What are the common factors driving these sentiments?”, “Why are public perceptions too positive or negative in certain channels when compared to others?”, “What are the word associations that people are using when discussing these topics?”

I was primarily focused on scraping sources from Facebook and Reddit, and the scraped texts were then analyzed with unsupervised methods for classifying unlabeled text. An additional part of the analysis looked at energy transition texts from websites, news sources, and anything else that could be collected. To paraphrase Galileo Galilei,

Measure what is measurable, and make measurable what is not so.

Making A Data Recipe

What’s buzzing among people? It’s data. Crude, multidimensional, and confusing, data sits as the invisible output of our everyday lives, which places it in an important position for conducting analyses.

Needless to say, the next step of this process was to write scripts to collect data from the Facebook Graph API and Reddit channels, seasoning it with additional data from news sources, public channels, and whatever else could be found. This is where even the most seasoned analyst runs into problems like parsing the data, removing unnecessary verbiage, and cleaning documents to make a useful corpus.
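To make the cleaning step concrete, here is a minimal sketch of what turning raw scraped posts into a corpus could look like. It is not the project's actual script: the function names clean_post and build_corpus, the regular expressions, and the min_words threshold are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import html
import re

# Patterns for common noise in scraped social posts (illustrative, not exhaustive).
TAG_RE = re.compile(r"<[^>]+>")                  # leftover HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # links
MENTION_RE = re.compile(r"(?<!\w)(?:@|u/)\w+")   # @handles and Reddit u/names
NON_TEXT_RE = re.compile(r"[^a-z0-9\s'-]")       # punctuation, emoji, symbols
SPACE_RE = re.compile(r"\s+")


def clean_post(raw):
    """Return a lowercased, noise-free version of one scraped post."""
    text = html.unescape(raw)        # turn &amp; and friends into characters
    text = TAG_RE.sub(" ", text)     # strip tags before URLs so "</p>" never sticks to a link
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.lower()
    text = NON_TEXT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def build_corpus(raw_posts, min_words=3):
    """Clean every post, then drop near-empty posts and exact duplicates."""
    seen = set()
    corpus = []
    for raw in raw_posts:
        cleaned = clean_post(raw)
        if len(cleaned.split()) < min_words or cleaned in seen:
            continue
        seen.add(cleaned)
        corpus.append(cleaned)
    return corpus


if __name__ == "__main__":
    sample = [
        "<p>Solar &amp; wind are the FUTURE! https://example.com/x @greenfan</p>",
        "Coal plants are closing down.",
        "coal plants are closing down",
        "ok",
    ]
    for line in build_corpus(sample):
        print(line)
```

Lowercasing and stripping punctuation are deliberate simplifications here. Sentiment tools that rely on capitalization or emoji would need a gentler cleaning pass.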

If the data that is fit for analysis is the ‘final dish’, the recipe will undoubtedly look like a mangled mess. Luckily, this is where some of the finer packages for RStudio and Python come in handy. Public Facebook pages for news outlets like ABC News, Fox News, and The Huffington Post further helped in enriching the dataset.

