If you haven’t read the entire series of these Apache Spark cost tuning articles, the changes recommended in this summary may not make sense. To understand these steps, I encourage you to read Part 1, which explains the philosophy behind the strategy, and Part 2, which shows how to estimate the costs of your Spark job. The full series is linked below, after this summary of the steps you should take:
Q: What executor config do you recommend for a cluster whose nodes have 32 cores and 256GB of memory?
A: Leaving the usual 1 core for YARN and system processing would leave 31 cores, and because 31 is a prime number it can’t be divided evenly into 5-core executors. I actually recommend leaving 2 cores for YARN and system processing instead. That leaves 30 cores available for processing, which means six 5-core executors with 34GB of memory each will work on this node as well.
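As a rough sketch of how that configuration might be applied when building a Spark session (the app name is hypothetical, and the executor sizing is simply the numbers above):

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch of the 32-core / 256GB node config above:
// 2 cores reserved for YARN and system processing leaves 30 cores,
// i.e. six 5-core executors per node.
val spark = SparkSession.builder()
  .appName("cost-tuned-job")               // hypothetical app name
  .config("spark.executor.cores", "5")     // 5 cores per executor
  .config("spark.executor.memory", "34g")  // 34GB heap per executor
  .getOrCreate()
```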
Q: What executor config do you recommend for clusters with nodes that have 8 or fewer cores?
A: I only recommend using 8-core (or smaller) nodes if your Spark job runs entirely on a single node. If your job spans two 8-core nodes (or four 4-core nodes), then it would be better served running on a single 16-core node.
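To make the arithmetic concrete, here is a small sketch (my own illustration, not from the series) of how many 5-core executors fit on nodes of different sizes, assuming 1 core is reserved for YARN and system processing on the smaller nodes:

```scala
// Hypothetical helper: how many 5-core executors fit on a node
// after reserving cores for YARN and system processing.
def executorsPerNode(nodeCores: Int, reservedCores: Int): Int =
  (nodeCores - reservedCores) / 5

executorsPerNode(32, 2)  // 6 executors (30 usable cores)
executorsPerNode(16, 1)  // 3 executors (15 usable cores)
executorsPerNode(8, 1)   // 1 executor, with 2 usable cores left idle
```

Two 8-core nodes yield only two executors plus four idle cores, while one 16-core node yields three executors, which is why the larger node is the better value.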
#cost #efficiency #apache-spark #aws #software-engineering