Learning PySpark Locally Before Moving to a Multi-node Databricks Cluster
I have come across several frustrating PySpark tutorials that promise to teach me PySpark in under five minutes 🙄. They are clickbait and lack the depth needed to get me started and keep me going. So I decided to write this article in the hope of helping others like me with a project-driven tutorial, rather than a collection of isolated code snippets and tips. I will focus primarily on a list of questions and use PySpark to answer them. You may follow along by grabbing the dataset and code here. At the end of this article, I have also included some excellent resources I enjoyed learning from. Happy Learning!
#big-data #sql #python #pyspark #developer