Climate change is currently a hot topic, with many experts reporting a small but significant increase in the global average temperature. Not everyone believes these experts: some people claim that the climate has not changed, while others question the influence of humans on the current development.

While I am by no means an expert on climate or weather, I wondered whether I could verify the claimed increase in average temperature by analyzing appropriate data myself. Depending on the chosen data source, this can be technically challenging, and it is an insightful journey into weather data. In this article, I present my approach of using PySpark to analyze about 100 GB of compressed raw weather data and to reconstruct some images that substantiate climate change.

Many details of the processing steps are omitted in this article to keep the focus on the general approach. A Jupyter notebook containing the complete working code is available on GitHub.

#pyspark #big-data #climate-change #weather #python

Investigating the Climate Change with Python and Spark