Despite having a runtime limit of 15 minutes, AWS Lambda can still be used to process large files. File formats such as CSV or newline-delimited JSON, which can be read iteratively or line by line, lend themselves to this method.

Lambda is a good option if you want a serverless architecture and your files are large but still within reasonable limits. We'll show how to write a Lambda function that processes a large CSV file while handling data that exceeds both its memory and runtime limits.

The main approach is as follows:

  1. Read and process the CSV file row by row until nearing the timeout.
  2. Trigger a new Lambda asynchronously that will pick up from where the previous Lambda stopped processing (a sketch of this control flow follows the list).
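
The control flow might look something like the sketch below. The open_csv_reader and process_row helpers and the 30-second buffer are assumptions for illustration; open_csv_reader is sketched further down, where the event fields are discussed.

import json

import boto3

lambda_client = boto3.client("lambda")

# Hand off when fewer than ~30 seconds remain; the exact buffer is an
# assumption and should be tuned to how long a single row takes.
TIMEOUT_BUFFER_MS = 30_000


def lambda_handler(event, context):
    # open_csv_reader is a hypothetical helper that returns a csv.DictReader
    # positioned at event["offset"] plus a callable reporting the current
    # byte offset (sketched later in the article).
    reader, current_offset = open_csv_reader(event)

    for row in reader:
        process_row(row)  # hypothetical per-row business logic

        if context.get_remaining_time_in_millis() < TIMEOUT_BUFFER_MS:
            # Not enough time left: asynchronously re-invoke this function
            # so the next run resumes where this one stopped.
            next_event = {
                "bucket_name": event["bucket_name"],
                "object_key": event["object_key"],
                "offset": current_offset(),
                "fieldnames": reader.fieldnames,
            }
            lambda_client.invoke(
                FunctionName=context.function_name,
                InvocationType="Event",  # asynchronous invocation
                Payload=json.dumps(next_event),
            )
            return {"status": "continued", "offset": next_event["offset"]}

    return {"status": "complete"}

Using InvocationType="Event" matters here: the new invocation is queued asynchronously, so the current one can return immediately instead of waiting for its successor and burning its remaining time.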

We will define the following event, which will be used to trigger the Lambda function. The bucket_name and object_key identify the S3 object to be processed; the offset and fieldnames fields are covered shortly.

{
    "bucket_name": "YOUR_BUCKET_NAME",
    "object_key": "YOUR_OBJECT_KEY",
    "offset": 0,
    "fieldnames": null
}
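
As a preview, here is one way these fields could be consumed, using the hypothetical open_csv_reader helper from the sketch above: bucket_name and object_key point the S3 read at the file, offset is used as a byte range so already processed data is never downloaded again, and fieldnames lets csv.DictReader parse continuation chunks that no longer start with a header row. The sketch assumes "\n" line endings and rows without embedded newlines.

import csv

import boto3

s3 = boto3.client("s3")


def open_csv_reader(event):
    """Hypothetical helper: return a csv.DictReader positioned at
    event["offset"] and a callable that reports the current byte offset."""
    state = {"offset": event.get("offset", 0)}

    # Ask S3 for the object starting at the saved byte offset, so already
    # processed bytes are skipped entirely.
    response = s3.get_object(
        Bucket=event["bucket_name"],
        Key=event["object_key"],
        Range=f"bytes={state['offset']}-",
    )

    def lines():
        # Assumes "\n" line endings: add 1 byte per line for the delimiter
        # that iter_lines() strips off.
        for raw in response["Body"].iter_lines():
            state["offset"] += len(raw) + 1
            yield raw.decode("utf-8")

    # On the first invocation fieldnames is null/None, so DictReader consumes
    # the header row itself; later invocations reuse the header names passed
    # in the event.
    reader = csv.DictReader(lines(), fieldnames=event.get("fieldnames"))
    return reader, lambda: state["offset"]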

