The goal of this post is to dig a bit deeper into the internals of Apache Spark to get a better understanding of how Spark works under the hood, so we can write optimal code that maximizes parallelism and minimized data shuffles.
#hadoop #spark #big-data #data-science #developer