Data Transformation in PySpark

Data is now growing faster than processing speeds. One of the many solutions to this problem is to parallelise our computing on large clusters. A language that allows us to do just that is PySpark.

However, PySpark requires you to think about data differently.

Instead of looking at a dataset row-wise. PySpark encourages you to look at it column-wise. This was a difficult transition for me at first. I’ll tell you the main tricks I learned so you don’t have to waste your time searching for the answers.

#data-transformation #introduction-to-pyspark #pyspark #data-science

towardsdatascience.com

Data Transformation in PySpark