The step by step overview of the cost tuning strategy.
If you haven’t read the entire series of these Apache Spark cost tuning articles, the changes recommended in this summary may not make sense. To understand these steps, I encourage you to read Part 1, which provides the philosophy behind the strategy, and Part 2, which shows you how to determine the estimated costs for your Spark job. The full series is linked below, after this summary of the steps you should take:
1. Switch the executor core count to the ideal core count for your node, as described in Part 3.
2. If the executor core count changed, adjust the executor count using the method described in Part 4.
3. Change the executor memory to the efficient memory size for your node, as described in Part 3.
4. If executor memory issues occur while running with the new config, apply the tweaks that resolve them, as described in Part 5.
5. If the job is running at 100% CPU utilization and 100% memory utilization, consider running it on a node type with more memory per CPU, as described in Part 4.
6. If the run time slows down after tuning and you want to sacrifice some cost savings for a run time improvement, follow the method described in Part 4 to improve run time.
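The steps above all come down to changing spark-submit parameters rather than code. As a rough sketch only, a tuned submission might look like the following; the specific values (5 cores, 18g, 20 executors) and the job name are illustrative assumptions, not recommendations from this guide — the right numbers for your node come from Parts 3 and 4.

```shell
# Illustrative spark-submit after cost tuning (values are placeholders;
# derive your own from the node-specific math in Parts 3 and 4).
spark-submit \
  --executor-cores 5 \
  --executor-memory 18g \
  --num-executors 20 \
  your_spark_job.py
```

Note that only the submit parameters change; the job code itself is untouched.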
Q: What executor config do you recommend for a cluster whose nodes have 32 cores and 256GB of memory?
A: Because 31 (the count left after reserving the usual single core) is a prime number, I actually recommend leaving 2 cores for YARN and system processing. That leaves 30 cores available for processing, which means six 5-core executors with 34GB of memory each will work for this node as well.
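The core arithmetic behind that answer can be sketched in a few lines of shell. The variable names are mine, not from this guide, but the numbers match the 32-core example above:

```shell
# Executor-count math for a 32-core node (names are illustrative).
total_cores=32
reserved_cores=2      # left for YARN and system processing (31 is prime, so reserve 2)
executor_cores=5      # ideal core count per executor
executors=$(( (total_cores - reserved_cores) / executor_cores ))
echo "executors per node: $executors"   # 6
```

Dividing the node's remaining memory across those 6 executors (after allowing for overhead) is what yields the 34GB figure.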
Q: What executor config do you recommend for clusters with nodes that have 8 or fewer cores?
A: I only recommend using 8-core (or smaller) nodes if your Spark jobs run on a single node. If your jobs span two 8-core nodes (or four 4-core nodes), then your job would be better served running on a single 16-core node.
Find the most efficient executor configuration for your node.

The first step in determining an efficient executor config is to figure out how many actual CPUs (i.e., not virtual CPUs) are available on the nodes in your cluster. To do so, you need to find out what type of EC2 instance your cluster is using. For our discussion here, we’ll be using r5.4xlarge, which, according to the AWS EC2 Instance Pricing page, has 16 CPUs.
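Besides the pricing page, one way to look up an instance type's CPU count is the AWS CLI. This is a sketch that assumes you have the AWS CLI installed and configured with credentials; it is not part of the tuning procedure itself.

```shell
# Query the default vCPU count for the r5.4xlarge instance type
# (requires a configured AWS CLI; region may need to be set).
aws ec2 describe-instance-types \
  --instance-types r5.4xlarge \
  --query 'InstanceTypes[0].VCpuInfo.DefaultVCpus' \
  --output text
```

Remember the distinction drawn above: EC2 reports vCPUs, so you still need to map that figure back to actual CPUs for your instance family.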
Steps to follow when converting existing jobs to a cost-efficient config.

There are a number of things to keep in mind as you tune your Spark jobs. The following sections cover the most important ones.
How I saved 60% of costs in an Apache Spark job, with no increase in job time and no decrease in data processed.

Until recently, most companies didn’t care how much they spent on their cloud resources. But in a COVID-19 world, companies like Expedia Group™ are reducing cloud spending where reasonable. While many Apache Spark tuning guides discuss how to get the best performance from Spark, none of them discuss the cost of that performance.
I outline the procedure for working through cost tuning.

Below is a screenshot highlighting some jobs at Expedia Group™ that were cost tuned using the principles in this guide. I want to stress that no code changes were involved; only the spark-submit parameters were changed during the cost tuning process. Pay close attention to the Node utilization column, highlighted in yellow.