Unleash serverless speed! Compare Rust with Go, Java, and Python in AWS Lambda functions. Dive into performance analysis for optimal serverless efficiency.
At Scanner, we use serverless Lambda functions to perform fast full-text search over large volumes of logs in data lakes, and our queries need to be lightning fast. We use Rust for this use case, but we wanted to know how Rust compared with Go, Java, and Python in terms of performance. We pitted the four languages against one another to see which was the fastest, and here is what we found.
The experiment
We compared the languages by giving them a classic, “bursty” data lake task:
Given 1 GB of JSON log events in S3, run a Lambda function from a cold start to stream, decompress, and parse the objects on the fly.
There were 4 parameters for each test run: language, CPU architecture, JSON parser, and memory allocation.
We ran 10 test runs from a cold start for each choice of language, CPU architecture, JSON parser, and memory allocation.
For each Lambda function invocation, we recorded the cold start time and the total time to process 1GB of JSON logs in S3.
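The test matrix described above can be sketched as a product over the four parameters. This is only an illustration: the parser names come from later sections of this post, the memory sizes are a hypothetical subset (the exact settings are not listed here), and `iter_test_matrix` and `average_duration_ms` are names made up for this sketch.

```python
from itertools import product
from statistics import mean

# Parser choices per language, as named in this post.
PARSERS = {
    "rust": ["serde_json", "simdjson"],
    "go": ["encoding/json", "fastjson"],
    "java": ["org.json", "jsoniter"],
    "python": ["json"],
}
ARCHITECTURES = ["x86_64", "arm64"]
# Hypothetical subset of memory sizes in MB; not the full list used in the tests.
MEMORY_MB = [128, 640, 1024, 10240]
RUNS_PER_CONFIG = 10


def iter_test_matrix():
    """Yield every (language, architecture, parser, memory_mb) combination."""
    for language, parsers in PARSERS.items():
        for arch, parser, memory_mb in product(ARCHITECTURES, parsers, MEMORY_MB):
            yield language, arch, parser, memory_mb


def average_duration_ms(durations_ms):
    """Average the task durations recorded for one configuration."""
    if len(durations_ms) != RUNS_PER_CONFIG:
        raise ValueError(f"expected {RUNS_PER_CONFIG} runs, got {len(durations_ms)}")
    return mean(durations_ms)


configs = list(iter_test_matrix())
print(len(configs), "configurations,", len(configs) * RUNS_PER_CONFIG, "cold-start invocations")
```

Each configuration is invoked from a cold start ten times, and the averaged duration is what the charts in this post report.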
The high-level takeaways
The chart above shows the performance results for the fastest performing settings for each language (Python, Java, Go, and Rust) at varying memory allocation levels. The lower the task duration, the faster the performance.
Here are our high-level takeaways:
encoding/json was 10x slower than the parser in valyala/fastjson. In fact, our Go-based function using the standard library parser was slower than our Python-based function, but our Go function using the fast parser was insanely fast.
In the rest of the blog post, we will cover what the code looked like for each language, how performance varied as we tried different CPU architectures and JSON parsing libraries for each language, how cold start times differed between languages, and how memory allocation affected S3 download speed.
Rust code and deployment
We used the excellent cargo-lambda tool to generate a Rust project for our Lambda function and compile the release binary.
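Here is the basic structure of the Lambda function: it reads the bucket and key fields from the input event, streams the object from S3, decompresses it with zstd on the fly, and parses each line as JSON.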
We tested out two different JSON parsing libraries: serde_json and simdjson.
The code below is the version of our Rust Lambda function that used the simdjson parser.
// main.rs
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use rusoto_core::Region;
use rusoto_s3::{S3Client, S3};
use serde::{Deserialize, Serialize};
use tokio::io::AsyncBufReadExt;

#[derive(Deserialize)]
struct Request {
    bucket: String,
    key: String,
}

#[derive(Serialize)]
struct Response {
    req_id: String,
    msg: String,
}

async fn handle_request(event: LambdaEvent<Request>) -> Result<Response, Error> {
    let started_at = std::time::Instant::now();
    let client = S3Client::new(Region::UsWest2);
    // Fetch the object named by the bucket and key fields of the input event.
    let output = client
        .get_object(rusoto_s3::GetObjectRequest {
            bucket: event.payload.bucket.clone(),
            key: event.payload.key.clone(),
            ..Default::default()
        })
        .await?;
    let Some(body) = output.body else {
        return Err(anyhow::anyhow!("No body found in S3 response").into());
    };
    // Stream the body through a zstd decoder and read it line by line.
    let body = body.into_async_read();
    let body = tokio::io::BufReader::new(body);
    let decoder = async_compression::tokio::bufread::ZstdDecoder::new(body);
    let reader = tokio::io::BufReader::new(decoder);
    let mut lines = reader.lines();
    let mut num_log_events = 0;
    // Propagate read errors instead of silently ending the loop early.
    while let Some(mut line) = lines.next_line().await? {
        // simd_json parses in place and may mutate the buffer, so the line
        // must not be used as a str after this call.
        let _value = unsafe {
            simd_json::to_borrowed_value(line.as_mut_str().as_bytes_mut())?
        };
        num_log_events += 1;
        if num_log_events % 1000 == 0 {
            println!("num_log_events={}", num_log_events);
        }
    }
    let msg = format!(
        "elapsed={:?} num_log_events={}",
        started_at.elapsed(),
        num_log_events
    );
    Ok(Response {
        req_id: event.context.request_id,
        msg,
    })
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();
    run(service_fn(handle_request)).await
}
Our build script below used cargo-lambda to create a zip file containing the Rust binary. cargo-lambda can create builds for either of the CPU architectures supported by Lambda: arm64 or x86_64.
# build.sh
cargo lambda build --release --arm64
(cd ./target/lambda/lambda_langs_test_rust/ && zip ./bootstrap.zip ./bootstrap)
We used the AWS console to create our Lambda function, but you can also use the AWS CLI from your terminal to create your function. Make sure to use an IAM role that has read access to your S3 bucket and key. Here is an example:
aws lambda create-function \
--function-name lambda_langs_test_rust \
--runtime provided.al2 \
--memory-size 640 \
--architectures arm64 \
--zip-file fileb://./bootstrap.zip \
--handler unused \
--timeout 900 \
--role ${LAMBDA_IAM_ROLE}
We used the AWS CLI to invoke the Lambda function. To get information about total invocation durations and cold start times, we retrieved the REPORT log entry that appears at the end of Lambda invocation logs. You can see these logs if you use --log-type Tail in your AWS CLI invocation, and then use jq and base64 to decode the logs.
aws lambda invoke \
--function-name lambda_langs_test_rust \
--log-type Tail \
--cli-binary-format raw-in-base64-out \
--payload '{"bucket": "<s3_bucket>", "key": "<s3_key>"}' \
./response.json \
| jq -r .LogResult | base64 --decode
Here is an example of the log output. The total invocation duration is given by the Duration field, and the cold start time is given by Init Duration.
...
num_log_events=364000
num_log_events=365000
elapsed=2.050280398s num_log_events=365057
END RequestId: 07f141a7-d7d1-44cc-ba7d-8f7e7757c780
REPORT RequestId: 07f141a7-d7d1-44cc-ba7d-8f7e7757c780 Duration: 2051.89 ms
Billed Duration: 2092 ms Memory Size: 1024 MB Max Memory Used: 27 MB
Init Duration: 39.47 ms
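When collecting these numbers over many runs, it can help to pull the fields out of the REPORT entry programmatically. The following is a minimal sketch rather than the tooling we used: `decode_log_result` and `parse_report` are hypothetical helper names, and the sample text is the REPORT entry shown above.

```python
import base64
import re

# Matches "<Name>: <number> ms" fields such as "Duration: 2051.89 ms".
_MS_FIELD = re.compile(r"(Init Duration|Billed Duration|Duration): ([\d.]+) ms")


def decode_log_result(log_result_b64):
    """Decode the base64 LogResult field returned when --log-type Tail is used."""
    return base64.b64decode(log_result_b64).decode("utf-8")


def parse_report(log_text):
    """Return the millisecond fields of the REPORT entry as a dict of floats."""
    start = log_text.find("REPORT RequestId:")
    if start == -1:
        raise ValueError("no REPORT entry found")
    return {name: float(value) for name, value in _MS_FIELD.findall(log_text[start:])}


sample = (
    "END RequestId: 07f141a7-d7d1-44cc-ba7d-8f7e7757c780\n"
    "REPORT RequestId: 07f141a7-d7d1-44cc-ba7d-8f7e7757c780 Duration: 2051.89 ms "
    "Billed Duration: 2092 ms Memory Size: 1024 MB Max Memory Used: 27 MB "
    "Init Duration: 39.47 ms\n"
)
report = parse_report(sample)
print(report["Duration"], report["Init Duration"])
```

Init Duration only appears on cold starts, so a missing key is a quick way to spot an invocation that reused a warm execution environment.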
Rust performance results
This is not surprising, but Rust is fast. With optimal settings, our Rust lambda function took only 2 seconds to process 1GB of JSON logs in S3.
The chart above shows the average task duration across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB. The lower the task duration, the faster the performance.
Each color shows performance for a specific choice of CPU architecture (x86_64 or arm64) and JSON parsing library (serde_json or simdjson).
Here are some of the interesting takeaways:
[...] simdjson. In general, we have found that SIMD-accelerated libraries tend to have better support for x86_64 than for arm64.
simdjson parsing gave a 2-3x performance improvement over serde_json. If you need to optimize performance, consider trying SIMD-accelerated libraries or other tools that use specialized, highly optimized CPU instructions.
Go code and deployment
The code for the Go version of our Lambda function looks fairly similar to the Rust version. We read, decompress, and parse a stream of JSON objects from S3.
We tested out two different JSON parsing libraries: encoding/json from the standard library and valyala/fastjson.
The code below shows our version that used fastjson.
package main

import (
	"bufio"
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/klauspost/compress/zstd"
	"github.com/valyala/fastjson"
)

type Request struct {
	Bucket string `json:"bucket"`
	Key    string `json:"key"`
}

func HandleRequest(ctx context.Context, request Request) (string, error) {
	start := time.Now()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return "", err
	}
	client := s3.NewFromConfig(cfg)
	output, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: &request.Bucket,
		Key:    &request.Key,
	})
	if err != nil {
		return "", err
	}
	defer output.Body.Close()

	body := bufio.NewReader(output.Body)
	decoder, err := zstd.NewReader(body)
	if err != nil {
		return "", err
	}
	// Defer Close only after the error check, once the decoder is known to be valid.
	defer decoder.Close()

	num_log_events := 0
	scanner := bufio.NewScanner(decoder)
	var parser fastjson.Parser
	for scanner.Scan() {
		bytes := scanner.Bytes()
		_, err := parser.ParseBytes(bytes)
		if err != nil {
			return "", err
		}
		num_log_events += 1
		if num_log_events%1000 == 0 {
			fmt.Printf("num_log_events=%d\n", num_log_events)
		}
	}
	err = scanner.Err()
	if err != nil {
		return "", err
	}
	elapsed := time.Since(start)
	outputMsg := fmt.Sprintf("num_log_events=%d elapsed=%v", num_log_events, elapsed)
	return outputMsg, nil
}

func main() {
	lambda.Start(HandleRequest)
}
We deployed the Go version of our Lambda function as a Docker container. Here is the Dockerfile and a push_container.sh script to build the container and push it to the AWS Elastic Container Registry. We are using docker buildx, the extended BuildKit tool set, to build for a specific CPU architecture, namely arm64.
# Dockerfile
FROM public.ecr.aws/lambda/provided:al2 as build
# install compiler
RUN yum install -y golang
RUN go env -w GOPROXY=direct
# cache dependencies
ADD go.mod go.sum ./
RUN go mod download
# build
ADD . .
RUN GOARCH=arm64 go build -o /main
# copy artifacts to a clean image
FROM public.ecr.aws/lambda/provided:al2
COPY --from=build /main /main
ENTRYPOINT [ "/main" ]
# push_container.sh
docker buildx build --platform linux/arm64/v8 . -t lambda_langs_test_go
docker tag lambda_langs_test_go:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_go:latest
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_go:latest
To create a Lambda function using a Docker container image instead of a zip file, you can run a command like this with the AWS CLI. Note that we use --package-type Image and pass the container’s ECR URI to the --code flag as ImageUri, instead of the --zip-file, --runtime, and --handler flags that we used with Rust.
aws lambda create-function \
--function-name lambda_langs_test_go \
--package-type Image \
--memory-size 640 \
--architectures arm64 \
--code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_go:latest \
--timeout 900 \
--role ${LAMBDA_IAM_ROLE}
We invoked the function with the AWS CLI the same way we showed earlier in the Rust section.
Go performance results
Surprisingly, Go was very slow when we used the standard library JSON parser. Thankfully, it was very fast when we used fastjson.
The chart above shows the average task duration across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB. The lower the task duration, the faster the performance.
Each color shows performance for a specific choice of CPU architecture (x86_64 or arm64) and JSON parsing library (encoding/json or fastjson).
Here are some of the interesting takeaways:
encoding/json was 10x slower than the parser in valyala/fastjson. In fact, our Go-based function using the standard library parser was slower than our Python-based function, but our Go function using the fast parser was insanely fast.
With the fastjson library, Go’s performance matched Rust’s. In fact, Go actually beat Rust slightly at very high memory allocation levels, at 3GB of memory allocation and above.
[...] fastjson parser.
Java code and deployment
The Java version of our Lambda function is more verbose than the Python version, but it has the same simple structure: stream the object from S3, decompress it on the fly, and parse each line as JSON.
We tried out two different JSON parsing libraries to see if there were performance differences: org.json and jsoniter.
The code below shows the version where we used jsoniter.
// Handler.java
package example;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.S3Client;
import com.github.luben.zstd.ZstdInputStream;
import com.jsoniter.any.Any;
import com.jsoniter.JsonIterator;
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.util.Map;

// Handler value: example.Handler
public class Handler implements RequestHandler<Map<String,String>, String> {
    Gson gson = new GsonBuilder().setPrettyPrinting().create();

    @Override
    public String handleRequest(Map<String,String> event, Context context) {
        LambdaLogger logger = context.getLogger();
        String bucket = event.get("bucket");
        String key = event.get("key");
        System.out.println("Event: " + gson.toJson(event));
        S3Client s3Client = S3Client.builder().build();
        GetObjectRequest getObjectRequest = GetObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .build();
        InputStream responseBody = s3Client.getObject(getObjectRequest);
        try {
            ZstdInputStream decompressStream = new ZstdInputStream(
                new BufferedInputStream(responseBody)
            );
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(new BufferedInputStream(decompressStream))
            );
            int count = 0;
            JsonIterator jsonIterator = new JsonIterator();
            String line;
            while ((line = reader.readLine()) != null) {
                ++count;
                Any jsonObject = jsonIterator.deserialize(line);
            }
            System.out.println("num_log_events=" + count);
        } catch (IOException ex) {
            System.err.println("ERROR: " + ex.toString());
            return "500 Internal Server Error";
        }
        return "200 OK";
    }
}
To deploy our Java Lambda function, we used Gradle to build a zip file containing our code. Here is what our build.gradle looked like:
// build.gradle
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.amazonaws:aws-lambda-java-core:1.2.1'
    implementation 'com.google.code.gson:gson:2.8.9'
    implementation platform('software.amazon.awssdk:bom:2.19.33')
    implementation 'software.amazon.awssdk:s3'
    implementation 'com.github.luben:zstd-jni:1.5.2-5'
    implementation 'org.json:json:20220924'
    implementation 'com.jsoniter:jsoniter:0.9.9'
    testImplementation 'org.apache.logging.log4j:log4j-api:[2.17.1,)'
    testImplementation 'org.apache.logging.log4j:log4j-core:[2.17.1,)'
    testImplementation 'org.apache.logging.log4j:log4j-slf4j18-impl:[2.17.1,)'
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.6.0'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.6.0'
}

test {
    useJUnitPlatform()
}

// Package compiled classes and resources at the zip root, with dependencies under lib/.
task buildZip(type: Zip) {
    from compileJava
    from processResources
    into('lib') {
        from configurations.runtimeClasspath
    }
}

java {
    sourceCompatibility = JavaVersion.VERSION_1_8
    targetCompatibility = JavaVersion.VERSION_1_8
}

build.dependsOn buildZip
We built the zip file with gradle build -i.
We used the AWS console to create our Lambda function and upload our zip file, but you can use the AWS CLI in the same way described in the Rust section of this post above. You may need to change a few flags when you use aws lambda create-function, like these:
--runtime java11
--handler example.Handler::handleRequest
Java performance results
Given that Java probably has the strongest AWS SDK of all languages, it is tempting to use Java in this Lambda function to process JSON logs. However, we were disappointed with the slow performance we saw, so we would not recommend using Java for this use case.
We tried running both JSON parsers we found, org.json and jsoniter, under each of the two CPU architectures.
Here are our takeaways:
The org.json parser was ~2x faster in arm64 than in x86_64.
The jsoniter parser was basically equally fast in arm64 and x86_64.
Java cold start improvement with SnapStart
In the world of AWS Lambda functions, Java’s cold start times are notoriously slow. To help address this, AWS recently released a new feature called SnapStart for Java Lambda functions. SnapStart takes a snapshot of a “warmed-up” version of your Java program and restores that snapshot during cold starts.
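For readers who prefer to script this, here is a hedged sketch of turning SnapStart on with boto3 rather than the console. It assumes `client` behaves like `boto3.client("lambda")`; `enable_snapstart` is a name invented for this example, and any function name passed to it is a placeholder.

```python
def enable_snapstart(client, function_name):
    """Turn on SnapStart for a function and publish a version that uses it.

    `client` is expected to behave like boto3.client("lambda").
    """
    client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # Wait for the configuration update to finish before publishing.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    # Snapshots are taken for published versions, so publish one and
    # invoke that version (or an alias pointing at it).
    response = client.publish_version(FunctionName=function_name)
    return response["Version"]
```

The returned version number is the one to invoke when measuring cold starts, since the unpublished $LATEST version does not use the snapshot.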
We saw that cold start times improved dramatically in Java when we used SnapStart. The chart above shows the average cold start times across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB.
With SnapStart enabled, Java’s cold start times were better than Python’s but still slower than Go’s and Rust’s. The chart above compares cold start times across all four languages: Python, Java with SnapStart, Go, and Rust. Again, we are showing average cold start times across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB. Here is roughly what the cold start times looked like for each language:
SnapStart is a good option if you need to use Java and want to minimize cold start times. However, the cold start times for Go and Rust are still 2-3x faster, and there are a few important SnapStart limitations you should consider. In particular, SnapStart does not support:
In general, we were disappointed with Java’s performance in Lambda functions for this bursty data-processing use case, so we recommend trying Rust or Go instead if you can.
Python code and deployment
Of the four languages, Python’s Lambda function code is the simplest. It executes an S3 GET request for the given bucket and key fields from the input event, downloads the response body as a stream, and decompresses the data on the fly. We read the data in 10KB chunks, split on newlines, and parse each line as a JSON object. The newline chunk splitting approach is not optimally efficient, but we use the Python code here as a performance baseline against which we can compare the other languages.
# app.py
import boto3
import json
import os
import zstandard

READ_SIZE = 10240

def handle_request(event, context):
    client = boto3.client('s3')
    response = client.get_object(
        Bucket=event["bucket"],
        Key=event["key"],
    )
    streaming_body = response['Body']
    decompressor = zstandard.ZstdDecompressor()
    buf = bytearray()
    count = 0
    chunk_iter = decompressor.read_to_iter(
        streaming_body._raw_stream,
        read_size=READ_SIZE,
    )
    for chunk in chunk_iter:
        buf += chunk
        lines = buf.split(b'\n')
        for line in lines[:-1]:
            log_event = json.loads(line)
            count += 1
            if count % 1000 == 0:
                print(f"count={count}")
        last_line = lines[-1]
        buf.clear()
        buf += last_line
    print(f"DONE. count={count}")
    return {
        'statusCode': 200,
        'body': f"Num log events scanned: {count}"
    }
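The chunk-splitting step can be isolated and exercised without S3 or zstd. The sketch below is a self-contained variant, not the code we benchmarked: `iter_lines` and `count_events` are hypothetical names, and unlike the loop above it also yields a final line that has no trailing newline.

```python
import json


def iter_lines(chunks):
    """Yield complete lines from an iterable of byte chunks.

    A partial line at the end of one chunk is carried over and joined with
    the start of the next chunk. A final line without a trailing newline is
    yielded once the chunks are exhausted.
    """
    remainder = b""
    for chunk in chunks:
        lines = (remainder + chunk).split(b"\n")
        remainder = lines.pop()  # last element is an incomplete line (or b"")
        yield from lines
    if remainder:
        yield remainder


def count_events(chunks):
    """Parse every non-empty line as JSON and return the number of events."""
    count = 0
    for line in iter_lines(chunks):
        if line.strip():
            json.loads(line)
            count += 1
    return count


# Chunk boundaries deliberately fall in the middle of a JSON object.
chunks = [b'{"a": 1}\n{"b"', b': 2}\n{"c": 3}']
print(count_events(chunks))
```

Because `iter_lines` accepts any iterable of byte chunks, the iterator returned by the decompressor could be passed in directly in place of the in-memory list.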
We chose to deploy the Python code to our Lambda function as a Docker container. Here is the Dockerfile and a push_container.sh script to build the container and push it to the AWS Elastic Container Registry. We are using docker buildx, the extended BuildKit tool set, to build for a specific CPU architecture, namely arm64.
# Dockerfile
FROM public.ecr.aws/lambda/python:3.8
# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD ["app.handle_request"]
# push_container.sh
docker buildx build --platform linux/arm64/v8 . -t lambda_langs_test_python
docker tag lambda_langs_test_python:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest
We created the lambda function manually in the AWS Console, but you can also do it using the AWS CLI. Make sure to use an IAM role that has permission to read from your S3 bucket.
aws lambda create-function \
--function-name lambda_langs_test_python \
--package-type Image \
--memory-size 640 \
--architectures arm64 \
--code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/lambda_langs_test_python:latest \
--timeout 900 \
--role ${LAMBDA_IAM_ROLE}
Python performance results
Unsurprisingly, Python was the slowest of the four languages we tried at processing 1 GB of logs.
The chart above shows the average task duration across 10 cold starts for each of 14 memory allocation settings, ranging from 128MB to 10GB.
Here are our takeaways:
Maxing out S3 read throughput in a Lambda function
At Scanner, we use Lambda functions to scan through S3 data at scale, so we are very interested in the maximum S3 performance we can expect from an individual function invocation.
To test this, we wrote a Rust program to read 1GB of raw data from S3 (no decompression, no parsing), and we measured the S3 throughput.
In the graph above, we show S3 read throughput averaged over 10 runs at various memory allocation levels, 128MB to 10GB with discontinuous jumps.
Here are the interesting takeaways:
It seemed that, as long as our Lambda function used 640MB of memory or more, we got optimal S3 read throughput.
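Our measurement program was written in Rust, but the idea is simple enough to sketch in Python: read the body to the end without decompressing or parsing, then divide bytes by elapsed time. Everything here is illustrative; `measure_throughput` is a hypothetical helper, and an in-memory buffer stands in for the S3 response body.

```python
import io
import time


def measure_throughput(stream, chunk_size=1024 * 1024, clock=time.perf_counter):
    """Read a stream to the end and return (total_bytes, seconds, MB_per_second)."""
    started_at = clock()
    total_bytes = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total_bytes += len(chunk)
    elapsed = clock() - started_at
    mb_per_second = (total_bytes / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total_bytes, elapsed, mb_per_second


# In-memory stand-in for an S3 response body (for example boto3's response["Body"]).
fake_body = io.BytesIO(b"\0" * (8 * 1024 * 1024))
total, seconds, rate = measure_throughput(fake_body)
print(f"read {total} bytes in {seconds:.4f}s ({rate:.1f} MB/s)")
```

Any object with a `read(size)` method works as the stream, so the same function could wrap a real S3 body to compare memory settings.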
There are data formats that are much faster than JSON
Although using Rust with a SIMD accelerated JSON parser can process a lot of data quickly, we can do even better.
Mozilla maintains a library called bincode, which is used for inter-process communication in Firefox. It is specifically very good at parsing binary data into Rust data structures.
We leverage bincode in Scanner’s index file data format, which gives us a 4x performance improvement over SIMD-accelerated JSON parsing.
The chart above shows Lambda function task duration (scanning 2GB of Scanner index file records) using various memory allocation levels.
Scanner index file records are quite a bit more complex than typical JSON log events, so even SIMD-accelerated JSON parsing struggles to be fast. By using bincode in our index file format, we get extremely fast performance. This comes with an important trade-off: the bincode format is language-specific to Rust, which means reading it from other languages is difficult.
If you are interested in learning more about the tradeoffs between the most popular data serialization formats available in the Rust ecosystem, check out this excellent blog post from LogRocket. They cover JSON, bincode, MessagePack, and more, with plenty of data about performance and usability differences.
Conclusion
If you want to process S3 data at scale using Lambda functions, here are our recommendations:
[...] x86_64 than with arm64, especially if your parsing library leverages specialized SIMD instructions.
In Go, use fastjson instead of the standard library’s encoding/json to get 10x better performance.
In Rust, use simdjson instead of serde_json to get 3x better performance.
[...] bincode.
bincode is very fast, but it is Rust-specific and not portable between languages.
If you process massive amounts of data using Lambda functions, and you feel like there are important things we’ve missed, we would love to hear from you.
Source: https://blog.scanner.dev