The goal of this post is to dig a bit deeper into the internals of Apache Spark to get a better understanding of how Spark works under the hood, so we can write optimal code that maximizes parallelism and minimizes data shuffles.
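
To make the "minimize data shuffles" idea concrete, here is a minimal, hedged sketch (assuming an existing `SparkSession` named `spark`). It contrasts `reduceByKey`, which pre-aggregates values on each partition before the shuffle, with `groupByKey`, which moves every record across the network before any aggregation happens:

```scala
// Minimal sketch: reduceByKey vs groupByKey for summing values per key.
// Assumes a SparkSession named `spark` is already available.
val sc = spark.sparkContext

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

// Combines values locally on each partition first, so only one partial
// sum per key per partition crosses the network during the shuffle.
val sums = pairs.reduceByKey(_ + _)

// Shuffles every (key, value) pair, then aggregates on the reducer side.
val sumsViaGroup = pairs.groupByKey().mapValues(_.sum)

sums.collect().foreach(println)
```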

#hadoop #spark #big-data #data-science #developer

Apache Spark Internals: Tips and Optimizations