Eldred Metz

Part 3: Cost Efficient Executor Configuration for Apache Spark

CPUs per node

The first step in determining an efficient executor config is to figure out how many actual CPUs (i.e. not virtual CPUs) are available on the nodes in your cluster. To do so, you need to find out what type of EC2 instance your cluster is using. For our discussion here, we'll be using r5.4xlarge, which according to the AWS EC2 Instance Pricing page has 16 CPUs.

When we submit our jobs, we need to reserve one CPU for the operating system and the Cluster Manager. So we don't want to use all 16 CPUs for the job; instead, we want to allocate the 15 remaining CPUs to Spark processing on each node.

16 CPUs, with 15 available for Spark and one allocated to OS and cluster manager

CPUs per executor

Now that we know how many CPUs are available to use on each node, we need to determine how many Spark cores we want to assign to each executor. From the factors of 15 (1 × 15, 3 × 5, 5 × 3, 15 × 1), we can see that there are four different executor-and-core combinations that can get us to 15 Spark cores per node:

Four combinations of executors × cores per executor = 15, i.e. 1×15, 3×5, 5×3, 15×1

Possible configurations for executor

Let's explore the feasibility of each of these configurations.
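To make one of these combinations concrete, here is a minimal sketch of how the 5-cores-per-executor option might look as spark-submit flags on a single r5.4xlarge node (the application JAR is a placeholder, and on a multi-node cluster --num-executors would scale with the node count):

spark-submit \
  --executor-cores 5 \
  --num-executors 3 \
  your-application.jar

With 3 executors of 5 cores each, all 15 CPUs available for Spark on the node are in use.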

#aws #efficiency #software-engineering #cost #apache-spark

Lenora Hauck

How to Migrate Existing Apache Spark Jobs to Cost Efficient Executor Configurations

There are a number of things to keep in mind as you tune your Spark jobs. The following sections cover the most important ones.

Which jobs should be tuned first?

With many jobs to tune, you may be wondering which jobs to prioritize first. Jobs that only have one or two cores per executor make great candidates for conversion, as do jobs that consume 3,000 or more Spark core minutes. I define a Spark core minute as…

Executor count * cores per executor * run time (in minutes) = Spark core minutes
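For example, a hypothetical job with 20 executors, 5 cores per executor, and a 30-minute run time consumes 20 * 5 * 30 = 3,000 Spark core minutes, putting it right at the threshold above.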

Make sure you are comparing apples to apples

When converting an existing job to an efficient executor configuration, you will need to change your executor count whenever your executor core count changes. If you change your executor cores from 2 to 5, for instance, then you will need to change your executor count to maintain the same number of Spark cores:

Old configuration (100 Spark cores): --num-executors 50 --executor-cores 2

New configuration (100 Spark cores): --num-executors 20 --executor-cores 5

The reason to do this is so that you have the same processing power (i.e. Spark cores) with the new config as you had with the old config.
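As spark-submit invocations, that conversion might look like the following minimal sketch (the application JAR is a placeholder):

# Old: 50 executors x 2 cores = 100 Spark cores
spark-submit --num-executors 50 --executor-cores 2 your-application.jar

# New: 20 executors x 5 cores = 100 Spark cores
spark-submit --num-executors 20 --executor-cores 5 your-application.jar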

Run times may slow down

When converting jobs to a cost-efficient executor configuration, you may find that your process sometimes slows down with the new config. When this happens, do not worry: your new configuration is running cheaper, and there are additional tuning steps you can take to improve performance in a cost-efficient manner.

Here is an example of a job I converted from its original inefficient configuration to an efficient configuration, and the steps I took afterward to improve the run time at low cost. Note that while the initial step (utilizing the same number of cores on fewer nodes) makes execution cheaper, it also makes the run time slower. I then increased the number of nodes while keeping all cores fully utilized. Ultimately this leads to a quicker run time and a lower cost!

#cost #apache-spark #aws #efficiency #software-engineering-job

A Wrapper for Sembast and SQFlite to Enable Easy

FHIR_DB

This is really just a wrapper around Sembast_SQFLite - so all of the heavy lifting was done by Alex Tekartik. I highly recommend that if you have any questions about working with this package, you take a look at Sembast. He's also just a super nice guy, and even answered a question for me when I was deciding which sembast version to use. As usual, ResoCoder also has a good tutorial.

I have an interest in low-resource settings and thus a specific reason to be able to store data offline. To encourage this use, there are a number of other packages I have created based around the data format FHIR. FHIR® is the registered trademark of HL7 and is used with the permission of HL7. Use of the FHIR trademark does not constitute endorsement of this product by HL7.

Using the Db

So, while not absolutely necessary, I highly recommend that you use some sort of interface class. This adds the benefit of more easily handling errors, plus if you change to a different database in the future, you don't have to change the rest of your app, just the interface.

I've used something like this in my projects:

// Assumes the dartz package for Either/left/right and Unit; DbFailure is an
// app-defined failure union, not part of fhir_db.
import 'package:dartz/dartz.dart';
import 'package:fhir/r4.dart';
import 'package:fhir_db/r4.dart';

class IFhirDb {
  IFhirDb();
  final ResourceDao resourceDao = ResourceDao();

  Future<Either<DbFailure, Resource>> save(Resource resource) async {
    Resource resultResource;
    try {
      resultResource = await resourceDao.save(resource);
    } catch (error) {
      return left(DbFailure.unableToSave(error: error.toString()));
    }
    return right(resultResource);
  }

  Future<Either<DbFailure, List<Resource>>> returnListOfSingleResourceType(
      String resourceType) async {
    List<Resource> resultList;
    try {
      resultList =
          await resourceDao.getAllSortedById(resourceType: resourceType);
    } catch (error) {
      return left(DbFailure.unableToObtainList(error: error.toString()));
    }
    return right(resultList);
  }

  Future<Either<DbFailure, List<Resource>>> searchFunction(
      String resourceType, String searchString, String reference) async {
    List<Resource> resultList;
    try {
      resultList =
          await resourceDao.searchFor(resourceType, searchString, reference);
    } catch (error) {
      return left(DbFailure.unableToObtainList(error: error.toString()));
    }
    return right(resultList);
  }
}

I like this because if there's an I/O error or something, it won't crash your app. Then, you can call this interface in your app like the following:

final patient = Patient(
    resourceType: 'Patient',
    name: [HumanName(text: 'New Patient Name')],
    birthDate: Date(DateTime.now()),
);

final saveResult = await IFhirDb().save(patient);

This will save your newly created patient to the locally embedded database.

IMPORTANT: this database will expect that all previously created resources have an id. When you save a resource, it will check to see if that resource type has already been stored. (Each resource type is saved in its own store in the database.) It will then check if there is an ID. If there's no ID, it will create a new one for that resource (along with metadata on version number and creation time), save it, and return the resource. If it already has an ID, it will copy the old version of the resource into a _history store. It will then update the metadata of the new resource and save that version into the appropriate store for that resource. If, for instance, we have a previously created patient:

{
    "resourceType": "Patient",
    "id": "fhirfli-294057507-6811107",
    "meta": {
        "versionId": "1",
        "lastUpdated": "2020-10-16T19:41:28.054369Z"
    },
    "name": [
        {
            "given": ["New"],
            "family": "Patient"
        }
    ],
    "birthDate": "2020-10-16"
}

If we then update the last name to 'Provider', the above version of the patient will be kept in _history, while the 'Patient' store in the db will hold the updated version:

{
    "resourceType": "Patient",
    "id": "fhirfli-294057507-6811107",
    "meta": {
        "versionId": "2",
        "lastUpdated": "2020-10-16T19:45:07.316698Z"
    },
    "name": [
        {
            "given": ["New"],
            "family": "Provider"
        }
    ],
    "birthDate": "2020-10-16"
}

This way we can keep track of all previous versions of all resources (which is obviously important in medicine).

For most of the interactions (saving, deleting, etc.), things work the way you'd expect. The only difference is search. Because Sembast is NoSQL, we can search on any of the fields in a resource. If our interface class has the following function:

  Future<Either<DbFailure, List<Resource>>> searchFunction(
      String resourceType, String searchString, String reference) async {
    List<Resource> resultList;
    try {
      resultList =
          await resourceDao.searchFor(resourceType, searchString, reference);
    } catch (error) {
      return left(DbFailure.unableToObtainList(error: error.toString()));
    }
    return right(resultList);
  }

You can search for all immunizations of a certain patient:

searchFunction(
        'Immunization', 'patient.reference', 'Patient/$patientId');

This function will search through all entries in the 'Immunization' store. It will look at all 'patient.reference' fields, and return any that match 'Patient/$patientId'.

The last thing I'll mention is that this is a password-protected db, using AES-256 encryption (although it can also use Salsa20). Any time you use the db, you have the option of using a password for encryption/decryption. Remember, if you set up the database using encryption, you will only be able to access it using that same password. When you're ready to change the password, you will need to call the update password function. If we again assume we created a change password method in our interface, it might look something like this:

class IFhirDb {
  IFhirDb();
  final ResourceDao resourceDao = ResourceDao();
  // ...

  Future<Either<DbFailure, Unit>> updatePassword(
      String oldPassword, String newPassword) async {
    try {
      await resourceDao.updatePw(oldPassword, newPassword);
    } catch (error) {
      return left(DbFailure.unableToUpdatePassword(error: error.toString()));
    }
    return right(unit); // `unit` is the Unit value from dartz
  }
}

You don't have to use a password, in which case it will save the db file as plain text. If you want to add a password later, it will encrypt the db at that time.
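As a minimal sketch of adding a password to a previously unencrypted db (using the same ResourceDao API shown in the example below; the password value is a placeholder):

// Going from no password (plain text) to an encrypted db:
await ResourceDao().updatePw(null, 'myNewPassword');

After this call, reads and writes need to pass 'myNewPassword'.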

General Store

After using this for a while in an app, I've realized that it needs to be able to store data apart from just FHIR resources, at least on occasion. For this, I've added a second class for all versions of the database called GeneralDao. This is similar to the ResourceDao, but with fewer options. So, in order to save something, it would look like this:

await GeneralDao().save('password', {'new':'map'});
await GeneralDao().save('password', {'new':'map'}, 'key');

The difference between these two options is that the first one will generate a key for the map being stored, while the second will store the map using the key provided. Both will return the key after successfully storing the map.
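Since both calls return the key, a minimal sketch of capturing it for a later lookup might be (the password and map contents are placeholders):

final key = await GeneralDao().save('password', {'new': 'map'});
final storedMap = await GeneralDao().find('password', key);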

Other functions available include:

// deletes everything in the general store
await GeneralDao().deleteAllGeneral('password'); 

// delete specific entry
await GeneralDao().delete('password','key'); 

// returns map with that key
await GeneralDao().find('password', 'key'); 

FHIR® is a registered trademark of Health Level Seven International (HL7) and its use does not constitute an endorsement of products by HL7®

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add fhir_db

This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

dependencies:
  fhir_db: ^0.4.3

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:fhir_db/dstu2.dart';
import 'package:fhir_db/dstu2/fhir_db.dart';
import 'package:fhir_db/dstu2/general_dao.dart';
import 'package:fhir_db/dstu2/resource_dao.dart';
import 'package:fhir_db/encrypt/aes.dart';
import 'package:fhir_db/encrypt/salsa.dart';
import 'package:fhir_db/r4.dart';
import 'package:fhir_db/r4/fhir_db.dart';
import 'package:fhir_db/r4/general_dao.dart';
import 'package:fhir_db/r4/resource_dao.dart';
import 'package:fhir_db/r5.dart';
import 'package:fhir_db/r5/fhir_db.dart';
import 'package:fhir_db/r5/general_dao.dart';
import 'package:fhir_db/r5/resource_dao.dart';
import 'package:fhir_db/stu3.dart';
import 'package:fhir_db/stu3/fhir_db.dart';
import 'package:fhir_db/stu3/general_dao.dart';
import 'package:fhir_db/stu3/resource_dao.dart'; 

example/lib/main.dart

import 'package:fhir/r4.dart';
import 'package:fhir_db/r4.dart';
import 'package:flutter/material.dart';
import 'package:test/test.dart';

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();

  final resourceDao = ResourceDao();

  // await resourceDao.updatePw('newPw', null);
  await resourceDao.deleteAllResources(null);

  group('Playing with passwords', () {
    test('Playing with Passwords', () async {
      final patient = Patient(id: Id('1'));

      final saved = await resourceDao.save(null, patient);

      await resourceDao.updatePw(null, 'newPw');
      final search1 = await resourceDao.find('newPw',
          resourceType: R4ResourceType.Patient, id: Id('1'));
      expect(saved, search1[0]);

      await resourceDao.updatePw('newPw', 'newerPw');
      final search2 = await resourceDao.find('newerPw',
          resourceType: R4ResourceType.Patient, id: Id('1'));
      expect(saved, search2[0]);

      await resourceDao.updatePw('newerPw', null);
      final search3 = await resourceDao.find(null,
          resourceType: R4ResourceType.Patient, id: Id('1'));
      expect(saved, search3[0]);

      await resourceDao.deleteAllResources(null);
    });
  });

  final id = Id('12345');
  group('Saving Things:', () {
    test('Save Patient', () async {
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);
      final patient = Patient(id: id, name: [humanName]);
      final saved = await resourceDao.save(null, patient);

      expect(saved.id, id);

      expect((saved as Patient).name?[0], humanName);
    });

    test('Save Organization', () async {
      final organization = Organization(id: id, name: 'FhirFli');
      final saved = await resourceDao.save(null, organization);

      expect(saved.id, id);

      expect((saved as Organization).name, 'FhirFli');
    });

    test('Save Observation1', () async {
      final observation1 = Observation(
        id: Id('obs1'),
        code: CodeableConcept(text: 'Observation #1'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save(null, observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1');
    });

    test('Save Observation1 Again', () async {
      final observation1 = Observation(
          id: Id('obs1'),
          code: CodeableConcept(text: 'Observation #1 - Updated'));
      final saved = await resourceDao.save(null, observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1 - Updated');

      expect(saved.meta?.versionId, Id('2'));
    });

    test('Save Observation2', () async {
      final observation2 = Observation(
        id: Id('obs2'),
        code: CodeableConcept(text: 'Observation #2'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save(null, observation2);

      expect(saved.id, Id('obs2'));

      expect((saved as Observation).code.text, 'Observation #2');
    });

    test('Save Observation3', () async {
      final observation3 = Observation(
        id: Id('obs3'),
        code: CodeableConcept(text: 'Observation #3'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save(null, observation3);

      expect(saved.id, Id('obs3'));

      expect((saved as Observation).code.text, 'Observation #3');
    });
  });

  group('Finding Things:', () {
    test('Find 1st Patient', () async {
      final search = await resourceDao.find(null,
          resourceType: R4ResourceType.Patient, id: id);
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);

      expect(search.length, 1);

      expect((search[0] as Patient).name?[0], humanName);
    });

    test('Find 3rd Observation', () async {
      final search = await resourceDao.find(null,
          resourceType: R4ResourceType.Observation, id: Id('obs3'));

      expect(search.length, 1);

      expect(search[0].id, Id('obs3'));

      expect((search[0] as Observation).code.text, 'Observation #3');
    });

    test('Find All Observations', () async {
      final search = await resourceDao.getResourceType(
        null,
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 3);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), true);

      expect(idList.contains('obs3'), true);
    });

    test('Find All (non-historical) Resources', () async {
      final search = await resourceDao.getAll(null);

      expect(search.length, 5);
      final patList = search.toList();
      final orgList = search.toList();
      final obsList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);
      obsList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Observation);

      expect(patList.length, 1);

      expect(orgList.length, 1);

      expect(obsList.length, 3);
    });
  });

  group('Deleting Things:', () {
    test('Delete 2nd Observation', () async {
      await resourceDao.delete(
          null, null, R4ResourceType.Observation, Id('obs2'), null, null);

      final search = await resourceDao.getResourceType(
        null,
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 2);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), false);

      expect(idList.contains('obs3'), true);
    });

    test('Delete All Observations', () async {
      await resourceDao.deleteSingleType(null,
          resourceType: R4ResourceType.Observation);

      final search = await resourceDao.getAll(null);

      expect(search.length, 2);

      final patList = search.toList();
      final orgList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);

      expect(patList.length, 1);

      expect(orgList.length, 1);
    });

    test('Delete All Resources', () async {
      await resourceDao.deleteAllResources(null);

      final search = await resourceDao.getAll(null);

      expect(search.length, 0);
    });
  });

  group('Password - Saving Things:', () {
    test('Save Patient', () async {
      await resourceDao.updatePw(null, 'newPw');
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);
      final patient = Patient(id: id, name: [humanName]);
      final saved = await resourceDao.save('newPw', patient);

      expect(saved.id, id);

      expect((saved as Patient).name?[0], humanName);
    });

    test('Save Organization', () async {
      final organization = Organization(id: id, name: 'FhirFli');
      final saved = await resourceDao.save('newPw', organization);

      expect(saved.id, id);

      expect((saved as Organization).name, 'FhirFli');
    });

    test('Save Observation1', () async {
      final observation1 = Observation(
        id: Id('obs1'),
        code: CodeableConcept(text: 'Observation #1'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save('newPw', observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1');
    });

    test('Save Observation1 Again', () async {
      final observation1 = Observation(
          id: Id('obs1'),
          code: CodeableConcept(text: 'Observation #1 - Updated'));
      final saved = await resourceDao.save('newPw', observation1);

      expect(saved.id, Id('obs1'));

      expect((saved as Observation).code.text, 'Observation #1 - Updated');

      expect(saved.meta?.versionId, Id('2'));
    });

    test('Save Observation2', () async {
      final observation2 = Observation(
        id: Id('obs2'),
        code: CodeableConcept(text: 'Observation #2'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save('newPw', observation2);

      expect(saved.id, Id('obs2'));

      expect((saved as Observation).code.text, 'Observation #2');
    });

    test('Save Observation3', () async {
      final observation3 = Observation(
        id: Id('obs3'),
        code: CodeableConcept(text: 'Observation #3'),
        effectiveDateTime: FhirDateTime(DateTime(1981, 09, 18)),
      );
      final saved = await resourceDao.save('newPw', observation3);

      expect(saved.id, Id('obs3'));

      expect((saved as Observation).code.text, 'Observation #3');
    });
  });

  group('Password - Finding Things:', () {
    test('Find 1st Patient', () async {
      final search = await resourceDao.find('newPw',
          resourceType: R4ResourceType.Patient, id: id);
      final humanName = HumanName(family: 'Atreides', given: ['Duke']);

      expect(search.length, 1);

      expect((search[0] as Patient).name?[0], humanName);
    });

    test('Find 3rd Observation', () async {
      final search = await resourceDao.find('newPw',
          resourceType: R4ResourceType.Observation, id: Id('obs3'));

      expect(search.length, 1);

      expect(search[0].id, Id('obs3'));

      expect((search[0] as Observation).code.text, 'Observation #3');
    });

    test('Find All Observations', () async {
      final search = await resourceDao.getResourceType(
        'newPw',
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 3);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), true);

      expect(idList.contains('obs3'), true);
    });

    test('Find All (non-historical) Resources', () async {
      final search = await resourceDao.getAll('newPw');

      expect(search.length, 5);
      final patList = search.toList();
      final orgList = search.toList();
      final obsList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);
      obsList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Observation);

      expect(patList.length, 1);

      expect(orgList.length, 1);

      expect(obsList.length, 3);
    });
  });

  group('Password - Deleting Things:', () {
    test('Delete 2nd Observation', () async {
      await resourceDao.delete(
          'newPw', null, R4ResourceType.Observation, Id('obs2'), null, null);

      final search = await resourceDao.getResourceType(
        'newPw',
        resourceTypes: [R4ResourceType.Observation],
      );

      expect(search.length, 2);

      final idList = [];
      for (final obs in search) {
        idList.add(obs.id.toString());
      }

      expect(idList.contains('obs1'), true);

      expect(idList.contains('obs2'), false);

      expect(idList.contains('obs3'), true);
    });

    test('Delete All Observations', () async {
      await resourceDao.deleteSingleType('newPw',
          resourceType: R4ResourceType.Observation);

      final search = await resourceDao.getAll('newPw');

      expect(search.length, 2);

      final patList = search.toList();
      final orgList = search.toList();
      patList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Patient);
      orgList.retainWhere(
          (resource) => resource.resourceType == R4ResourceType.Organization);

      expect(patList.length, 1);

      expect(orgList.length, 1);
    });

    test('Delete All Resources', () async {
      await resourceDao.deleteAllResources('newPw');

      final search = await resourceDao.getAll('newPw');

      expect(search.length, 0);

      await resourceDao.updatePw('newPw', null);
    });
  });
} 

Download Details:

Author: MayJuun

Source Code: https://github.com/MayJuun/fhir/tree/main/fhir_db

#sqflite  #dart  #flutter 

Anil Sakhiya

Apache Spark For Beginners In 3 Hours | Apache Spark Training

In this Apache Spark for Beginners tutorial, we will have an overview of Spark in Big Data. We will start with an introduction to Apache Spark programming, then move on to Spark's history, and learn why Spark is needed, covering everything an individual needs to master this skill. You will not only learn Spark from the basics, but also get to know the Spark architecture and its components, such as Spark Core, Spark programming, Spark SQL, Spark Streaming, and much more.

This "Spark Tutorial" will help you comprehensively learn all the concepts of Apache Spark. Apache Spark has a bright future. Many companies have recognized the power of Spark and quickly started working with it. The primary importance of Apache Spark in the Big Data industry is its in-memory data processing. Spark can also handle many analytics challenges because of its low-latency in-memory data processing capability.

Spark's shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. It is available in either Scala (which runs on the Java VM and is thus a good way to use existing Java libraries) or Python.
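For reference, both shells ship with a standard Spark distribution and can be launched directly (assuming Spark's bin directory is on your PATH):

$ spark-shell   # Scala REPL
$ pyspark       # Python shell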

This Spark tutorial comprises the following topics:

  • 00:00:00 - Introduction
  • 00:00:52 - Spark Fundamentals
  • 00:23:11 - Spark Architecture
  • 01:01:08 - Spark Demo

#apache-spark #apache #spark #big-data #developer

Mya Lynch

Part 6: Summary of Apache Spark Cost Tuning Strategy

If you haven't read the entire series of these Apache Spark cost tuning articles, the changes recommended in this summary may not make sense. To understand these steps, I encourage you to read Part 1, which provides the philosophy behind the strategy, and Part 2, which shows you how to determine the estimated costs for your Spark job. The full series is linked below, after this summary of the steps you should take:

  1. Switch executor core count to the ideal core count for your node, as described in Part 3 (a sketch of the resulting config appears after this list).
  2. If the executor core count changed, adjust the executor count using the method described in Part 4.
  3. Change executor memory to the efficient memory size for your node, as described in Part 3.
  4. If executor memory issues happen while running with the new config, add the tweaks that resolve memory issues, as described in Part 5.
  5. If the job is running at 100% CPU utilization and 100% memory utilization, consider running it on a node with more memory per CPU, as described in Part 4.
  6. If the run time slows down after tuning and you want to sacrifice some cost savings for a run time improvement, follow the method described in Part 4.
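As a minimal sketch of what steps 1-3 produce for a 16-CPU node (the executor count and memory size are placeholders to be filled in from Parts 3 and 4 for your node type; the application JAR is hypothetical):

spark-submit \
  --executor-cores 5 \
  --num-executors <count from Part 4> \
  --executor-memory <size from Part 3> \
  your-application.jar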

Q: What executor config do you recommend for a cluster with 32 cores and 256GB?

A: Because 31 is a prime number, I actually recommend leaving 2 cores for YARN and system processing. That leaves 30 cores available for processing, which means a 5-core executor with 34GB of memory will work for this node as well.
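To check the arithmetic: 32 - 2 = 30 available cores, 30 / 5 = 6 executors per node, and 6 × 34GB = 204GB of the node's 256GB, which leaves roughly 52GB of headroom for YARN, the OS, and memory overhead.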

Q: What executor config do you recommend for clusters with nodes that have 8 or fewer cores?

A: I only recommend using nodes with 8 or fewer cores if your Spark jobs run on a single node. If your jobs span two 8-core nodes (or four 4-core nodes), then your job would be better served running on a 16-core node.

#cost #efficiency #apache-spark #aws #software-engineering