There are a number of things to keep in mind as you tune your Spark jobs. The following sections cover the most important ones.

Which jobs should be tuned first?

With many jobs to tune, you may be wondering which to prioritize. Jobs that have only one or two cores per executor make great candidates for conversion, as do jobs that consume 3000 or more Spark core minutes. I define Spark core minutes as…

Executor count * cores per executor * run time (in minutes) = Spark core minutes
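
To make the prioritization concrete, here is a minimal Python sketch of the formula and both rules of thumb; the function name and the example job's numbers are hypothetical, not taken from a real workload.

# Hypothetical helper applying the Spark core minutes formula above.
def spark_core_minutes(executor_count, cores_per_executor, run_time_minutes):
    return executor_count * cores_per_executor * run_time_minutes

# Example: 50 executors * 2 cores * 45 minutes = 4500 Spark core minutes,
# above the 3000 threshold, and at 2 cores per executor it also matches
# the first rule of thumb.
job = {"executors": 50, "cores": 2, "minutes": 45}
core_minutes = spark_core_minutes(job["executors"], job["cores"], job["minutes"])
if core_minutes >= 3000 or job["cores"] <= 2:
    print(f"Tuning candidate: {core_minutes} Spark core minutes")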

Make sure you are comparing apples to apples

When converting an existing job to an efficient executor configuration, you will need to change your executor count whenever your executor core count changes. For example, if you change your executor cores from 2 to 5, you will need to adjust your executor count to maintain the same total number of Spark cores:

Old configuration (100 Spark cores): num-executors=50 executor-cores=2

New configuration (100 Spark cores): num-executors=20 executor-cores=5

Matching the totals this way gives the new configuration the same processing power (i.e., total Spark cores) as the old one, so any run-time comparison between the two is fair.
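
A small Python sketch of that bookkeeping (the helper is hypothetical; it rounds up when the totals do not divide evenly, so you never end up with less processing power):

import math

# Hypothetical helper: keep total Spark cores constant when changing
# cores per executor.
def new_executor_count(old_executors, old_cores, new_cores):
    total_cores = old_executors * old_cores    # 50 * 2 = 100 Spark cores
    return math.ceil(total_cores / new_cores)  # 100 / 5 = 20 executors

print(new_executor_count(50, 2, 5))  # 20, matching the example above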

Run times may slow down

When converting jobs to a cost-efficient executor configuration, you may find that your process sometimes slows down under the new settings. If this happens, do not worry: the new configuration is already running cheaper, and there are additional tuning steps you can take to recover performance in a cost-efficient manner.

Here is an example of a job I converted from its original inefficient configuration to an efficient one, along with the steps I took afterward to improve the run time at low cost. Note that the initial step, packing the same number of cores onto fewer nodes to make execution cheaper, slows the run time down. I then increased the number of nodes while keeping all cores fully utilized. Ultimately this leads to a quicker run time and a lower cost! A sketch of this two-step process follows below.
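
As a rough illustration of that two-step process, here is a PySpark sketch. The app name and the step-2 value of 30 executors are hypothetical, and in practice these settings are usually passed to spark-submit rather than set in code:

from pyspark.sql import SparkSession

# Sketch of applying the tuned settings. The values follow the 100-core
# example above and are illustrative, not universal defaults.
spark = (
    SparkSession.builder
    .appName("tuned-job")
    # Step 1: efficient executors (same total cores on fewer nodes) -- cheaper.
    .config("spark.executor.cores", "5")
    .config("spark.executor.instances", "20")
    # Step 2: if run time matters, raise spark.executor.instances (e.g. to 30)
    # while keeping cores per executor at the efficient value.
    .getOrCreate()
)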
