Kevin Mason

1550740135

How to set Apache Spark Executor memory

How can I increase the memory available for Apache spark executor nodes?

I have a 2 GB file that is suitable to loading in to Apache Spark. I am running apache spark for the moment on 1 machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.

When I try count the lines of the file after setting the file to be cached in memory I get these errors:

2014-10-25 22:25:12 WARN  CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.

I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf

The UI shows this variable is set in the Spark Environment. You can find screenshot here

However when I go to the Executor tab the memory limit for my single Executor is still set to 265.4 MB. I also still get the same error.

I tried various things mentioned here but I still get the error and don't have a clear idea where I should change the setting.

I am running my code interactively from the spark-shell

#apache-spark

What is GEEK

Buddha Community

Hermann  Frami

Hermann Frami

1651383480

A Simple Wrapper Around Amplify AppSync Simulator

This serverless plugin is a wrapper for amplify-appsync-simulator made for testing AppSync APIs built with serverless-appsync-plugin.

Install

npm install serverless-appsync-simulator
# or
yarn add serverless-appsync-simulator

Usage

This plugin relies on your serverless yml file and on the serverless-offline plugin.

plugins:
  - serverless-dynamodb-local # only if you need dynamodb resolvers and you don't have an external dynamodb
  - serverless-appsync-simulator
  - serverless-offline

Note: Order is important serverless-appsync-simulator must go before serverless-offline

To start the simulator, run the following command:

sls offline start

You should see in the logs something like:

...
Serverless: AppSync endpoint: http://localhost:20002/graphql
Serverless: GraphiQl: http://localhost:20002
...

Configuration

Put options under custom.appsync-simulator in your serverless.yml file

| option | default | description | | ------------------------ | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | | apiKey | 0123456789 | When using API_KEY as authentication type, the key to authenticate to the endpoint. | | port | 20002 | AppSync operations port; if using multiple APIs, the value of this option will be used as a starting point, and each other API will have a port of lastPort + 10 (e.g. 20002, 20012, 20022, etc.) | | wsPort | 20003 | AppSync subscriptions port; if using multiple APIs, the value of this option will be used as a starting point, and each other API will have a port of lastPort + 10 (e.g. 20003, 20013, 20023, etc.) | | location | . (base directory) | Location of the lambda functions handlers. | | refMap | {} | A mapping of resource resolutions for the Ref function | | getAttMap | {} | A mapping of resource resolutions for the GetAtt function | | importValueMap | {} | A mapping of resource resolutions for the ImportValue function | | functions | {} | A mapping of external functions for providing invoke url for external fucntions | | dynamoDb.endpoint | http://localhost:8000 | Dynamodb endpoint. Specify it if you're not using serverless-dynamodb-local. Otherwise, port is taken from dynamodb-local conf | | dynamoDb.region | localhost | Dynamodb region. Specify it if you're connecting to a remote Dynamodb intance. | | dynamoDb.accessKeyId | DEFAULT_ACCESS_KEY | AWS Access Key ID to access DynamoDB | | dynamoDb.secretAccessKey | DEFAULT_SECRET | AWS Secret Key to access DynamoDB | | dynamoDb.sessionToken | DEFAULT_ACCESS_TOKEEN | AWS Session Token to access DynamoDB, only if you have temporary security credentials configured on AWS | | dynamoDb.* | | You can add every configuration accepted by DynamoDB SDK | | rds.dbName | | Name of the database | | rds.dbHost | | Database host | | rds.dbDialect | | Database dialect. Possible values (mysql | postgres) | | rds.dbUsername | | Database username | | rds.dbPassword | | Database password | | rds.dbPort | | Database port | | watch | - *.graphql
- *.vtl | Array of glob patterns to watch for hot-reloading. |

Example:

custom:
  appsync-simulator:
    location: '.webpack/service' # use webpack build directory
    dynamoDb:
      endpoint: 'http://my-custom-dynamo:8000'

Hot-reloading

By default, the simulator will hot-relad when changes to *.graphql or *.vtl files are detected. Changes to *.yml files are not supported (yet? - this is a Serverless Framework limitation). You will need to restart the simulator each time you change yml files.

Hot-reloading relies on watchman. Make sure it is installed on your system.

You can change the files being watched with the watch option, which is then passed to watchman as the match expression.

e.g.

custom:
  appsync-simulator:
    watch:
      - ["match", "handlers/**/*.vtl", "wholename"] # => array is interpreted as the literal match expression
      - "*.graphql"                                 # => string like this is equivalent to `["match", "*.graphql"]`

Or you can opt-out by leaving an empty array or set the option to false

Note: Functions should not require hot-reloading, unless you are using a transpiler or a bundler (such as webpack, babel or typescript), un which case you should delegate hot-reloading to that instead.

Resource CloudFormation functions resolution

This plugin supports some resources resolution from the Ref, Fn::GetAtt and Fn::ImportValue functions in your yaml file. It also supports some other Cfn functions such as Fn::Join, Fb::Sub, etc.

Note: Under the hood, this features relies on the cfn-resolver-lib package. For more info on supported cfn functions, refer to the documentation

Basic usage

You can reference resources in your functions' environment variables (that will be accessible from your lambda functions) or datasource definitions. The plugin will automatically resolve them for you.

provider:
  environment:
    BUCKET_NAME:
      Ref: MyBucket # resolves to `my-bucket-name`

resources:
  Resources:
    MyDbTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: myTable
      ...
    MyBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: my-bucket-name
    ...

# in your appsync config
dataSources:
  - type: AMAZON_DYNAMODB
    name: dynamosource
    config:
      tableName:
        Ref: MyDbTable # resolves to `myTable`

Override (or mock) values

Sometimes, some references cannot be resolved, as they come from an Output from Cloudformation; or you might want to use mocked values in your local environment.

In those cases, you can define (or override) those values using the refMap, getAttMap and importValueMap options.

  • refMap takes a mapping of resource name to value pairs
  • getAttMap takes a mapping of resource name to attribute/values pairs
  • importValueMap takes a mapping of import name to values pairs

Example:

custom:
  appsync-simulator:
    refMap:
      # Override `MyDbTable` resolution from the previous example.
      MyDbTable: 'mock-myTable'
    getAttMap:
      # define ElasticSearchInstance DomainName
      ElasticSearchInstance:
        DomainEndpoint: 'localhost:9200'
    importValueMap:
      other-service-api-url: 'https://other.api.url.com/graphql'

# in your appsync config
dataSources:
  - type: AMAZON_ELASTICSEARCH
    name: elasticsource
    config:
      # endpoint resolves as 'http://localhost:9200'
      endpoint:
        Fn::Join:
          - ''
          - - https://
            - Fn::GetAtt:
                - ElasticSearchInstance
                - DomainEndpoint

Key-value mock notation

In some special cases you will need to use key-value mock nottation. Good example can be case when you need to include serverless stage value (${self:provider.stage}) in the import name.

This notation can be used with all mocks - refMap, getAttMap and importValueMap

provider:
  environment:
    FINISH_ACTIVITY_FUNCTION_ARN:
      Fn::ImportValue: other-service-api-${self:provider.stage}-url

custom:
  serverless-appsync-simulator:
    importValueMap:
      - key: other-service-api-${self:provider.stage}-url
        value: 'https://other.api.url.com/graphql'

Limitations

This plugin only tries to resolve the following parts of the yml tree:

  • provider.environment
  • functions[*].environment
  • custom.appSync

If you have the need of resolving others, feel free to open an issue and explain your use case.

For now, the supported resources to be automatically resovled by Ref: are:

  • DynamoDb tables
  • S3 Buckets

Feel free to open a PR or an issue to extend them as well.

External functions

When a function is not defined withing the current serverless file you can still call it by providing an invoke url which should point to a REST method. Make sure you specify "get" or "post" for the method. Default is "get", but you probably want "post".

custom:
  appsync-simulator:
    functions:
      addUser:
        url: http://localhost:3016/2015-03-31/functions/addUser/invocations
        method: post
      addPost:
        url: https://jsonplaceholder.typicode.com/posts
        method: post

Supported Resolver types

This plugin supports resolvers implemented by amplify-appsync-simulator, as well as custom resolvers.

From Aws Amplify:

  • NONE
  • AWS_LAMBDA
  • AMAZON_DYNAMODB
  • PIPELINE

Implemented by this plugin

  • AMAZON_ELASTIC_SEARCH
  • HTTP
  • RELATIONAL_DATABASE

Relational Database

Sample VTL for a create mutation

#set( $cols = [] )
#set( $vals = [] )
#foreach( $entry in $ctx.args.input.keySet() )
  #set( $regex = "([a-z])([A-Z]+)")
  #set( $replacement = "$1_$2")
  #set( $toSnake = $entry.replaceAll($regex, $replacement).toLowerCase() )
  #set( $discard = $cols.add("$toSnake") )
  #if( $util.isBoolean($ctx.args.input[$entry]) )
      #if( $ctx.args.input[$entry] )
        #set( $discard = $vals.add("1") )
      #else
        #set( $discard = $vals.add("0") )
      #end
  #else
      #set( $discard = $vals.add("'$ctx.args.input[$entry]'") )
  #end
#end
#set( $valStr = $vals.toString().replace("[","(").replace("]",")") )
#set( $colStr = $cols.toString().replace("[","(").replace("]",")") )
#if ( $valStr.substring(0, 1) != '(' )
  #set( $valStr = "($valStr)" )
#end
#if ( $colStr.substring(0, 1) != '(' )
  #set( $colStr = "($colStr)" )
#end
{
  "version": "2018-05-29",
  "statements":   ["INSERT INTO <name-of-table> $colStr VALUES $valStr", "SELECT * FROM    <name-of-table> ORDER BY id DESC LIMIT 1"]
}

Sample VTL for an update mutation

#set( $update = "" )
#set( $equals = "=" )
#foreach( $entry in $ctx.args.input.keySet() )
  #set( $cur = $ctx.args.input[$entry] )
  #set( $regex = "([a-z])([A-Z]+)")
  #set( $replacement = "$1_$2")
  #set( $toSnake = $entry.replaceAll($regex, $replacement).toLowerCase() )
  #if( $util.isBoolean($cur) )
      #if( $cur )
        #set ( $cur = "1" )
      #else
        #set ( $cur = "0" )
      #end
  #end
  #if ( $util.isNullOrEmpty($update) )
      #set($update = "$toSnake$equals'$cur'" )
  #else
      #set($update = "$update,$toSnake$equals'$cur'" )
  #end
#end
{
  "version": "2018-05-29",
  "statements":   ["UPDATE <name-of-table> SET $update WHERE id=$ctx.args.input.id", "SELECT * FROM <name-of-table> WHERE id=$ctx.args.input.id"]
}

Sample resolver for delete mutation

{
  "version": "2018-05-29",
  "statements":   ["UPDATE <name-of-table> set deleted_at=NOW() WHERE id=$ctx.args.id", "SELECT * FROM <name-of-table> WHERE id=$ctx.args.id"]
}

Sample mutation response VTL with support for handling AWSDateTime

#set ( $index = -1)
#set ( $result = $util.parseJson($ctx.result) )
#set ( $meta = $result.sqlStatementResults[1].columnMetadata)
#foreach ($column in $meta)
    #set ($index = $index + 1)
    #if ( $column["typeName"] == "timestamptz" )
        #set ($time = $result["sqlStatementResults"][1]["records"][0][$index]["stringValue"] )
        #set ( $nowEpochMillis = $util.time.parseFormattedToEpochMilliSeconds("$time.substring(0,19)+0000", "yyyy-MM-dd HH:mm:ssZ") )
        #set ( $isoDateTime = $util.time.epochMilliSecondsToISO8601($nowEpochMillis) )
        $util.qr( $result["sqlStatementResults"][1]["records"][0][$index].put("stringValue", "$isoDateTime") )
    #end
#end
#set ( $res = $util.parseJson($util.rds.toJsonString($util.toJson($result)))[1][0] )
#set ( $response = {} )
#foreach($mapKey in $res.keySet())
    #set ( $s = $mapKey.split("_") )
    #set ( $camelCase="" )
    #set ( $isFirst=true )
    #foreach($entry in $s)
        #if ( $isFirst )
          #set ( $first = $entry.substring(0,1) )
        #else
          #set ( $first = $entry.substring(0,1).toUpperCase() )
        #end
        #set ( $isFirst=false )
        #set ( $stringLength = $entry.length() )
        #set ( $remaining = $entry.substring(1, $stringLength) )
        #set ( $camelCase = "$camelCase$first$remaining" )
    #end
    $util.qr( $response.put("$camelCase", $res[$mapKey]) )
#end
$utils.toJson($response)

Using Variable Map

Variable map support is limited and does not differentiate numbers and strings data types, please inject them directly if needed.

Will be escaped properly: null, true, and false values.

{
  "version": "2018-05-29",
  "statements":   [
    "UPDATE <name-of-table> set deleted_at=NOW() WHERE id=:ID",
    "SELECT * FROM <name-of-table> WHERE id=:ID and unix_timestamp > $ctx.args.newerThan"
  ],
  variableMap: {
    ":ID": $ctx.args.id,
##    ":TIMESTAMP": $ctx.args.newerThan -- This will be handled as a string!!!
  }
}

Requires

Author: Serverless-appsync
Source Code: https://github.com/serverless-appsync/serverless-appsync-simulator 
License: MIT License

#serverless #sync #graphql 

Edureka Fan

Edureka Fan

1606982795

What is Apache Spark? | Apache Spark Python | Spark Training

This Edureka “What is Apache Spark?” video will help you to understand the Architecture of Spark in depth. It includes an example where we Understand what is Python and Apache Spark.

#big-data #apache-spark #developer #apache #spark

Hermann  Frami

Hermann Frami

1651319520

Serverless APIGateway Service Proxy

Serverless APIGateway Service Proxy

This Serverless Framework plugin supports the AWS service proxy integration feature of API Gateway. You can directly connect API Gateway to AWS services without Lambda.

Install

Run serverless plugin install in your Serverless project.

serverless plugin install -n serverless-apigateway-service-proxy

Supported AWS services

Here is a services list which this plugin supports for now. But will expand to other services in the feature. Please pull request if you are intersted in it.

  • Kinesis Streams
  • SQS
  • S3
  • SNS
  • DynamoDB
  • EventBridge

How to use

Define settings of the AWS services you want to integrate under custom > apiGatewayServiceProxies and run serverless deploy.

Kinesis

Sample syntax for Kinesis proxy in serverless.yml.

custom:
  apiGatewayServiceProxies:
    - kinesis: # partitionkey is set apigateway requestid by default
        path: /kinesis
        method: post
        streamName: { Ref: 'YourStream' }
        cors: true
    - kinesis:
        path: /kinesis
        method: post
        partitionKey: 'hardcordedkey' # use static partitionkey
        streamName: { Ref: 'YourStream' }
        cors: true
    - kinesis:
        path: /kinesis/{myKey} # use path parameter
        method: post
        partitionKey:
          pathParam: myKey
        streamName: { Ref: 'YourStream' }
        cors: true
    - kinesis:
        path: /kinesis
        method: post
        partitionKey:
          bodyParam: data.myKey # use body parameter
        streamName: { Ref: 'YourStream' }
        cors: true
    - kinesis:
        path: /kinesis
        method: post
        partitionKey:
          queryStringParam: myKey # use query string param
        streamName: { Ref: 'YourStream' }
        cors: true
    - kinesis: # PutRecords
        path: /kinesis
        method: post
        action: PutRecords
        streamName: { Ref: 'YourStream' }
        cors: true

resources:
  Resources:
    YourStream:
      Type: AWS::Kinesis::Stream
      Properties:
        ShardCount: 1

Sample request after deploying.

curl https://xxxxxxx.execute-api.us-east-1.amazonaws.com/dev/kinesis -d '{"message": "some data"}'  -H 'Content-Type:application/json'

SQS

Sample syntax for SQS proxy in serverless.yml.

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /sqs
        method: post
        queueName: { 'Fn::GetAtt': ['SQSQueue', 'QueueName'] }
        cors: true

resources:
  Resources:
    SQSQueue:
      Type: 'AWS::SQS::Queue'

Sample request after deploying.

curl https://xxxxxx.execute-api.us-east-1.amazonaws.com/dev/sqs -d '{"message": "testtest"}' -H 'Content-Type:application/json'

Customizing request parameters

If you'd like to pass additional data to the integration request, you can do so by including your custom API Gateway request parameters in serverless.yml like so:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /queue
        method: post
        queueName: !GetAtt MyQueue.QueueName
        cors: true

        requestParameters:
          'integration.request.querystring.MessageAttribute.1.Name': "'cognitoIdentityId'"
          'integration.request.querystring.MessageAttribute.1.Value.StringValue': 'context.identity.cognitoIdentityId'
          'integration.request.querystring.MessageAttribute.1.Value.DataType': "'String'"
          'integration.request.querystring.MessageAttribute.2.Name': "'cognitoAuthenticationProvider'"
          'integration.request.querystring.MessageAttribute.2.Value.StringValue': 'context.identity.cognitoAuthenticationProvider'
          'integration.request.querystring.MessageAttribute.2.Value.DataType': "'String'"

The alternative way to pass MessageAttribute parameters is via a request body mapping template.

Customizing request body mapping templates

See the SQS section under Customizing request body mapping templates

Customizing responses

Simplified response template customization

You can get a simple customization of the responses by providing a template for the possible responses. The template is assumed to be application/json.

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /queue
        method: post
        queueName: !GetAtt MyQueue.QueueName
        cors: true
        response:
          template:
            # `success` is used when the integration response is 200
            success: |-
              { "message: "accepted" }
            # `clientError` is used when the integration response is 400
            clientError: |-
              { "message": "there is an error in your request" }
            # `serverError` is used when the integration response is 500
            serverError: |-
              { "message": "there was an error handling your request" }

Full response customization

If you want more control over the integration response, you can provide an array of objects for the response value:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /queue
        method: post
        queueName: !GetAtt MyQueue.QueueName
        cors: true
        response:
          - statusCode: 200
            selectionPattern: '2\\d{2}'
            responseParameters: {}
            responseTemplates:
              application/json: |-
                { "message": "accepted" }

The object keys correspond to the API Gateway integration response object.

S3

Sample syntax for S3 proxy in serverless.yml.

custom:
  apiGatewayServiceProxies:
    - s3:
        path: /s3
        method: post
        action: PutObject
        bucket:
          Ref: S3Bucket
        key: static-key.json # use static key
        cors: true

    - s3:
        path: /s3/{myKey} # use path param
        method: get
        action: GetObject
        bucket:
          Ref: S3Bucket
        key:
          pathParam: myKey
        cors: true

    - s3:
        path: /s3
        method: delete
        action: DeleteObject
        bucket:
          Ref: S3Bucket
        key:
          queryStringParam: key # use query string param
        cors: true

resources:
  Resources:
    S3Bucket:
      Type: 'AWS::S3::Bucket'

Sample request after deploying.

curl https://xxxxxx.execute-api.us-east-1.amazonaws.com/dev/s3 -d '{"message": "testtest"}' -H 'Content-Type:application/json'

Customizing request parameters

Similar to the SQS support, you can customize the default request parameters serverless.yml like so:

custom:
  apiGatewayServiceProxies:
    - s3:
        path: /s3
        method: post
        action: PutObject
        bucket:
          Ref: S3Bucket
        cors: true

        requestParameters:
          # if requestParameters has a 'integration.request.path.object' property you should remove the key setting
          'integration.request.path.object': 'context.requestId'
          'integration.request.header.cache-control': "'public, max-age=31536000, immutable'"

Customizing request templates

If you'd like use custom API Gateway request templates, you can do so like so:

custom:
  apiGatewayServiceProxies:
    - s3:
        path: /s3
        method: get
        action: GetObject
        bucket:
          Ref: S3Bucket
        request:
          template:
            application/json: |
              #set ($specialStuff = $context.request.header.x-special)
              #set ($context.requestOverride.path.object = $specialStuff.replaceAll('_', '-'))
              {}

Note that if the client does not provide a Content-Type header in the request, ApiGateway defaults to application/json.

Customize the Path Override in API Gateway

Added the new customization parameter that lets the user set a custom Path Override in API Gateway other than the {bucket}/{object} This parameter is optional and if not set, will fall back to {bucket}/{object} The Path Override will add {bucket}/ automatically in front

Please keep in mind, that key or path.object still needs to be set at the moment (maybe this will be made optional later on with this)

Usage (With 2 Path Parameters (folder and file and a fixed file extension)):

custom:
  apiGatewayServiceProxies:
    - s3:
        path: /s3/{folder}/{file}
        method: get
        action: GetObject
        pathOverride: '{folder}/{file}.xml'
        bucket:
          Ref: S3Bucket
        cors: true

        requestParameters:
          # if requestParameters has a 'integration.request.path.object' property you should remove the key setting
          'integration.request.path.folder': 'method.request.path.folder'
          'integration.request.path.file': 'method.request.path.file'
          'integration.request.path.object': 'context.requestId'
          'integration.request.header.cache-control': "'public, max-age=31536000, immutable'"

This will result in API Gateway setting the Path Override attribute to {bucket}/{folder}/{file}.xml So for example if you navigate to the API Gatway endpoint /language/en it will fetch the file in S3 from {bucket}/language/en.xml

Can use greedy, for deeper Folders

The forementioned example can also be shortened by a greedy approach. Thanks to @taylorreece for mentioning this.

custom:
  apiGatewayServiceProxies:
    - s3:
        path: /s3/{myPath+}
        method: get
        action: GetObject
        pathOverride: '{myPath}.xml'
        bucket:
          Ref: S3Bucket
        cors: true

        requestParameters:
          # if requestParameters has a 'integration.request.path.object' property you should remove the key setting
          'integration.request.path.myPath': 'method.request.path.myPath'
          'integration.request.path.object': 'context.requestId'
          'integration.request.header.cache-control': "'public, max-age=31536000, immutable'"

This will translate for example /s3/a/b/c to a/b/c.xml

Customizing responses

You can get a simple customization of the responses by providing a template for the possible responses. The template is assumed to be application/json.

custom:
  apiGatewayServiceProxies:
    - s3:
        path: /s3
        method: post
        action: PutObject
        bucket:
          Ref: S3Bucket
        key: static-key.json
        response:
          template:
            # `success` is used when the integration response is 200
            success: |-
              { "message: "accepted" }
            # `clientError` is used when the integration response is 400
            clientError: |-
              { "message": "there is an error in your request" }
            # `serverError` is used when the integration response is 500
            serverError: |-
              { "message": "there was an error handling your request" }

SNS

Sample syntax for SNS proxy in serverless.yml.

custom:
  apiGatewayServiceProxies:
    - sns:
        path: /sns
        method: post
        topicName: { 'Fn::GetAtt': ['SNSTopic', 'TopicName'] }
        cors: true

resources:
  Resources:
    SNSTopic:
      Type: AWS::SNS::Topic

Sample request after deploying.

curl https://xxxxxx.execute-api.us-east-1.amazonaws.com/dev/sns -d '{"message": "testtest"}' -H 'Content-Type:application/json'

Customizing responses

Simplified response template customization

You can get a simple customization of the responses by providing a template for the possible responses. The template is assumed to be application/json.

custom:
  apiGatewayServiceProxies:
    - sns:
        path: /sns
        method: post
        topicName: { 'Fn::GetAtt': ['SNSTopic', 'TopicName'] }
        cors: true
        response:
          template:
            # `success` is used when the integration response is 200
            success: |-
              { "message: "accepted" }
            # `clientError` is used when the integration response is 400
            clientError: |-
              { "message": "there is an error in your request" }
            # `serverError` is used when the integration response is 500
            serverError: |-
              { "message": "there was an error handling your request" }

Full response customization

If you want more control over the integration response, you can provide an array of objects for the response value:

custom:
  apiGatewayServiceProxies:
    - sns:
        path: /sns
        method: post
        topicName: { 'Fn::GetAtt': ['SNSTopic', 'TopicName'] }
        cors: true
        response:
          - statusCode: 200
            selectionPattern: '2\d{2}'
            responseParameters: {}
            responseTemplates:
              application/json: |-
                { "message": "accepted" }

The object keys correspond to the API Gateway integration response object.

Content Handling and Pass Through Behaviour customization

If you want to work with binary fata, you can not specify contentHandling and PassThrough inside the request object.

custom:
  apiGatewayServiceProxies:
    - sns:
        path: /sns
        method: post
        topicName: { 'Fn::GetAtt': ['SNSTopic', 'TopicName'] }
        request:
          contentHandling: CONVERT_TO_TEXT
          passThrough: WHEN_NO_TEMPLATES

The allowed values correspond with the API Gateway Method integration for ContentHandling and PassthroughBehavior

DynamoDB

Sample syntax for DynamoDB proxy in serverless.yml. Currently, the supported DynamoDB Operations are PutItem, GetItem and DeleteItem.

custom:
  apiGatewayServiceProxies:
    - dynamodb:
        path: /dynamodb/{id}/{sort}
        method: put
        tableName: { Ref: 'YourTable' }
        hashKey: # set pathParam or queryStringParam as a partitionkey.
          pathParam: id
          attributeType: S
        rangeKey: # required if also using sort key. set pathParam or queryStringParam.
          pathParam: sort
          attributeType: S
        action: PutItem # specify action to the table what you want
        condition: attribute_not_exists(Id) # optional Condition Expressions parameter for the table
        cors: true
    - dynamodb:
        path: /dynamodb
        method: get
        tableName: { Ref: 'YourTable' }
        hashKey:
          queryStringParam: id # use query string parameter
          attributeType: S
        rangeKey:
          queryStringParam: sort
          attributeType: S
        action: GetItem
        cors: true
    - dynamodb:
        path: /dynamodb/{id}
        method: delete
        tableName: { Ref: 'YourTable' }
        hashKey:
          pathParam: id
          attributeType: S
        action: DeleteItem
        cors: true

resources:
  Resources:
    YourTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: YourTable
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
          - AttributeName: sort
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
          - AttributeName: sort
            KeyType: RANGE
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1

Sample request after deploying.

curl -XPUT https://xxxxxxx.execute-api.us-east-1.amazonaws.com/dev/dynamodb/<hashKey>/<sortkey> \
 -d '{"name":{"S":"john"},"address":{"S":"xxxxx"}}' \
 -H 'Content-Type:application/json'

EventBridge

Sample syntax for EventBridge proxy in serverless.yml.

custom:
  apiGatewayServiceProxies:
    - eventbridge:  # source and detailType are hardcoded; detail defaults to POST body
        path: /eventbridge
        method: post
        source: 'hardcoded_source'
        detailType: 'hardcoded_detailType'
        eventBusName: { Ref: 'YourBusName' }
        cors: true
    - eventbridge:  # source and detailType as path parameters
        path: /eventbridge/{detailTypeKey}/{sourceKey}
        method: post
        detailType:
          pathParam: detailTypeKey
        source:
          pathParam: sourceKey
        eventBusName: { Ref: 'YourBusName' }
        cors: true
    - eventbridge:  # source, detail, and detailType as body parameters
        path: /eventbridge/{detailTypeKey}/{sourceKey}
        method: post
        detailType:
          bodyParam: data.detailType
        source:
          bodyParam: data.source
        detail:
          bodyParam: data.detail
        eventBusName: { Ref: 'YourBusName' }
        cors: true

resources:
  Resources:
    YourBus:
      Type: AWS::Events::EventBus
      Properties:
        Name: YourEventBus

Sample request after deploying.

curl https://xxxxxxx.execute-api.us-east-1.amazonaws.com/dev/eventbridge -d '{"message": "some data"}'  -H 'Content-Type:application/json'

Common API Gateway features

Enabling CORS

To set CORS configurations for your HTTP endpoints, simply modify your event configurations as follows:

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /kinesis
        method: post
        streamName: { Ref: 'YourStream' }
        cors: true

Setting cors to true assumes a default configuration which is equivalent to:

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /kinesis
        method: post
        streamName: { Ref: 'YourStream' }
        cors:
          origin: '*'
          headers:
            - Content-Type
            - X-Amz-Date
            - Authorization
            - X-Api-Key
            - X-Amz-Security-Token
            - X-Amz-User-Agent
          allowCredentials: false

Configuring the cors property sets Access-Control-Allow-Origin, Access-Control-Allow-Headers, Access-Control-Allow-Methods,Access-Control-Allow-Credentials headers in the CORS preflight response. To enable the Access-Control-Max-Age preflight response header, set the maxAge property in the cors object:

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /kinesis
        method: post
        streamName: { Ref: 'YourStream' }
        cors:
          origin: '*'
          maxAge: 86400

If you are using CloudFront or another CDN for your API Gateway, you may want to setup a Cache-Control header to allow for OPTIONS request to be cached to avoid the additional hop.

To enable the Cache-Control header on preflight response, set the cacheControl property in the cors object:

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /kinesis
        method: post
        streamName: { Ref: 'YourStream' }
        cors:
          origin: '*'
          headers:
            - Content-Type
            - X-Amz-Date
            - Authorization
            - X-Api-Key
            - X-Amz-Security-Token
            - X-Amz-User-Agent
          allowCredentials: false
          cacheControl: 'max-age=600, s-maxage=600, proxy-revalidate' # Caches on browser and proxy for 10 minutes and doesnt allow proxy to serve out of date content

Adding Authorization

You can pass in any supported authorization type:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /sqs
        method: post
        queueName: { 'Fn::GetAtt': ['SQSQueue', 'QueueName'] }
        cors: true

        # optional - defaults to 'NONE'
        authorizationType: 'AWS_IAM' # can be one of ['NONE', 'AWS_IAM', 'CUSTOM', 'COGNITO_USER_POOLS']

        # when using 'CUSTOM' authorization type, one should specify authorizerId
        # authorizerId: { Ref: 'AuthorizerLogicalId' }
        # when using 'COGNITO_USER_POOLS' authorization type, one can specify a list of authorization scopes
        # authorizationScopes: ['scope1','scope2']

resources:
  Resources:
    SQSQueue:
      Type: 'AWS::SQS::Queue'

Source: AWS::ApiGateway::Method docs

Enabling API Token Authentication

You can indicate whether the method requires clients to submit a valid API key using private flag:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /sqs
        method: post
        queueName: { 'Fn::GetAtt': ['SQSQueue', 'QueueName'] }
        cors: true
        private: true

resources:
  Resources:
    SQSQueue:
      Type: 'AWS::SQS::Queue'

which is the same syntax used in Serverless framework.

Source: Serverless: Setting API keys for your Rest API

Source: AWS::ApiGateway::Method docs

Using a Custom IAM Role

By default, the plugin will generate a role with the required permissions for each service type that is configured.

You can configure your own role by setting the roleArn attribute:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /sqs
        method: post
        queueName: { 'Fn::GetAtt': ['SQSQueue', 'QueueName'] }
        cors: true
        roleArn: # Optional. A default role is created when not configured
          Fn::GetAtt: [CustomS3Role, Arn]

resources:
  Resources:
    SQSQueue:
      Type: 'AWS::SQS::Queue'
    CustomS3Role:
      # Custom Role definition
      Type: 'AWS::IAM::Role'

Customizing API Gateway parameters

The plugin allows one to specify which parameters the API Gateway method accepts.

A common use case is to pass custom data to the integration request:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /sqs
        method: post
        queueName: { 'Fn::GetAtt': ['SqsQueue', 'QueueName'] }
        cors: true
        acceptParameters:
          'method.request.header.Custom-Header': true
        requestParameters:
          'integration.request.querystring.MessageAttribute.1.Name': "'custom-Header'"
          'integration.request.querystring.MessageAttribute.1.Value.StringValue': 'method.request.header.Custom-Header'
          'integration.request.querystring.MessageAttribute.1.Value.DataType': "'String'"
resources:
  Resources:
    SqsQueue:
      Type: 'AWS::SQS::Queue'

Any published SQS message will have the Custom-Header value added as a message attribute.

Customizing request body mapping templates

Kinesis

If you'd like to add content types or customize the default templates, you can do so by including your custom API Gateway request mapping template in serverless.yml like so:

# Required for using Fn::Sub
plugins:
  - serverless-cloudformation-sub-variables

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /kinesis
        method: post
        streamName: { Ref: 'MyStream' }
        request:
          template:
            text/plain:
              Fn::Sub:
                - |
                  #set($msgBody = $util.parseJson($input.body))
                  #set($msgId = $msgBody.MessageId)
                  {
                      "Data": "$util.base64Encode($input.body)",
                      "PartitionKey": "$msgId",
                      "StreamName": "#{MyStreamArn}"
                  }
                - MyStreamArn:
                    Fn::GetAtt: [MyStream, Arn]

It is important that the mapping template will return a valid application/json string

Source: How to connect SNS to Kinesis for cross-account delivery via API Gateway

SQS

Customizing SQS request templates requires us to force all requests to use an application/x-www-form-urlencoded style body. The plugin sets the Content-Type header to application/x-www-form-urlencoded for you, but API Gateway will still look for the template under the application/json request template type, so that is where you need to configure you request body in serverless.yml:

custom:
  apiGatewayServiceProxies:
    - sqs:
        path: /{version}/event/receiver
        method: post
        queueName: { 'Fn::GetAtt': ['SqsQueue', 'QueueName'] }
        request:
          template:
            application/json: |-
              #set ($body = $util.parseJson($input.body))
              Action=SendMessage##
              &MessageGroupId=$util.urlEncode($body.event_type)##
              &MessageDeduplicationId=$util.urlEncode($body.event_id)##
              &MessageAttribute.1.Name=$util.urlEncode("X-Custom-Signature")##
              &MessageAttribute.1.Value.DataType=String##
              &MessageAttribute.1.Value.StringValue=$util.urlEncode($input.params("X-Custom-Signature"))##
              &MessageBody=$util.urlEncode($input.body)

Note that the ## at the end of each line is an empty comment. In VTL this has the effect of stripping the newline from the end of the line (as it is commented out), which makes API Gateway read all the lines in the template as one line.

Be careful when mixing additional requestParameters into your SQS endpoint as you may overwrite the integration.request.header.Content-Type and stop the request template from being parsed correctly. You may also unintentionally create conflicts between parameters passed using requestParameters and those in your request template. Typically you should only use the request template if you need to manipulate the incoming request body in some way.

Your custom template must also set the Action and MessageBody parameters, as these will not be added for you by the plugin.

When using a custom request body, headers sent by a client will no longer be passed through to the SQS queue (PassthroughBehavior is automatically set to NEVER). You will need to pass through headers sent by the client explicitly in the request body. Also, any custom querystring parameters in the requestParameters array will be ignored. These also need to be added via the custom request body.

SNS

Similar to the Kinesis support, you can customize the default request mapping templates in serverless.yml like so:

# Required for using Fn::Sub
plugins:
  - serverless-cloudformation-sub-variables

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /sns
        method: post
        topicName: { 'Fn::GetAtt': ['SNSTopic', 'TopicName'] }
        request:
          template:
            application/json:
              Fn::Sub:
                - "Action=Publish&Message=$util.urlEncode('This is a fixed message')&TopicArn=$util.urlEncode('#{MyTopicArn}')"
                - MyTopicArn: { Ref: MyTopic }

It is important that the mapping template will return a valid application/x-www-form-urlencoded string

Source: Connect AWS API Gateway directly to SNS using a service integration

Custom response body mapping templates

You can customize the response body by providing mapping templates for success, server errors (5xx) and client errors (4xx).

Templates must be in JSON format. If a template isn't provided, the integration response will be returned as-is to the client.

Kinesis Example

custom:
  apiGatewayServiceProxies:
    - kinesis:
        path: /kinesis
        method: post
        streamName: { Ref: 'MyStream' }
        response:
          template:
            success: |
              {
                "success": true
              }
            serverError: |
              {
                "success": false,
                "errorMessage": "Server Error"
              }
            clientError: |
              {
                "success": false,
                "errorMessage": "Client Error"
              }

Author: Serverless-operations
Source Code: https://github.com/serverless-operations/serverless-apigateway-service-proxy 
License: 

#serverless #api #aws 

Anil  Sakhiya

Anil Sakhiya

1595141479

Apache Spark For Beginners In 3 Hours | Apache Spark Training

In this Apache Spark For Beginners, we will have an overview of Spark in Big Data. We will start with an introduction to Apache Spark Programming. Then we will move to know the Spark History. Moreover, we will learn why Spark is needed and covers everything that an individual needed to master its skill in this field. In this Apache Spark tutorial, you will not only learn Spark from the basics but also through this Apache Spark tutorial, you will get to know the Spark architecture and its components such as Spark Core, Spark Programming, Spark SQL, Spark Streaming, and much more.

This “Spark Tutorial” will help you to comprehensively learn all the concepts of Apache Spark. Apache Spark has a bright future. Many companies have recognized the power of Spark and quickly started worked on it. The primary importance of Apache Spark in the Big data industry is because of its in-memory data processing. Spark can also handle many analytics challenges because of its low-latency in-memory data processing capability.

Spark’s shell provides you a simple way to learn the API, as well as a powerful tool to analyze data interactively. It is available in either Scala (which runs on the Java VM and is thus a good way to use existing Java libraries) or Python

This Spark tutorial will comprise of the following topics:

  • 00:00:00 - Introduction
  • 00:00:52 - Spark Fundamentals
  • 00:23:11 - Spark Architecture
  • 01:01:08 - Spark Demo

#apache-spark #apache #spark #big-data #developer

Gunjan  Khaitan

Gunjan Khaitan

1582649280

Apache Spark Tutorial For Beginners - Apache Spark Full Course

This full course video on Apache Spark will help you learn the basics of Big Data, what Apache Spark is, and the architecture of Apache Spark. Then, you will understand how to install Apache Spark on Windows and Ubuntu. You will look at the important components of Spark, such as Spark Streaming, Spark MLlib, and Spark SQL. Finally, you will get an idea about implement Spark with Python in PySpark tutorial and look at some of the important Apache Spark interview questions. Now, let’s get started and learn Apache Spark in detail.

Below topics are explained in this Apache Spark Full Course:

  1. Animated Video 01:15
  2. History of Spark 06:48
  3. What is Spark 07:28
  4. Hadoop vs spark 08:32
  5. Components of Apache Spark 14:14
  6. Spark Architecture 33:26
  7. Applications of Spark 40:05
  8. Spark Use Case 42:08
  9. Running a Spark Application 44:08
  10. Apache Spark insallation on Windows 01:01:03
  11. Apache Spark insallation on Ubuntu 01:31:54
  12. What is Spark Streaming 01:49:31
  13. Spark Streaming data sources 01:50:39
  14. Features of Spark Streaming 01:52:19
  15. Working of Spark Streaming 01:52:53
  16. Discretized Streams 01:54:03
  17. caching/persistence 02:02:17
  18. checkpointing in spark streaming 02:04:34
  19. Demo on Spark Streaming 02:18:27
  20. What is Spark MLlib 02:47:29
  21. What is Machine Learning 02:49:14
  22. Machine Learning Algorithms 02:51:38
  23. Spark MLlib Tools 02:53:01
  24. Spark MLlib Data Types 02:56:42
  25. Machine Learning Pipelines 03:09:05
  26. Spark MLlib Demo 03:18:38
  27. What is Spark SQL 04:01:40
  28. Spark SQL Features 04:03:52
  29. Spark SQL Architecture 04:07:43
  30. Spark SQL Data Frame 04:09:59
  31. Spark SQL Data Source 04:11:55
  32. Spark SQL Demo 04:23:00
  33. What is PySpark 04:52:03
  34. PySpark Features 04:58:02
  35. PySpark with Python and Scala 04:58:54
  36. PySpark Contents 05:00:35
  37. PySpark Subpackages 05:40:10
  38. Companies using PySpark 05:41:16
  39. PySpark Demo 05:41:49
  40. Spark Interview Questions 05:50:43

#bigdata #apache #spark #apache-spark