If you’re anything like me, you think that serverless is great: negligible running costs, no server configuration to take up your time, auto-scaling by default, and so on. Of course, in keeping with the ‘no free lunch’ mantra, this convenience and cost saving has to come at a price. Whilst there are numerous real costs to serverless (memory constraints, package size, runtime restrictions, developer learning curve), this article is going to assume that you’ve already solved those (or don’t care about them) and instead focus on a specific performance cost: cold starts.

Why do cold starts hurt data science applications?

Cold starts are slow. That’s the problem. Exactly how slow depends on various factors, such as your runtime (Python is actually one of the quicker Lambda containers to start) and what you’re doing in the setup phase of the Lambda (source).

As data scientists and developers, we’re used to a gentle pace. Start a download of 200 GB of text data (put the kettle on), load a model (pour the tea), run a clustering algorithm (go for a walk)… sound familiar? But whilst this slow pace is fine for experimentation (and for productionised systems running on persistent servers), it is potentially fatal for the serverless pattern. AWS Lambda allows a maximum of 15 minutes of runtime before the function times out and the container dies, confining all un-persisted work (and running computations) to the graveyard of dead containers, never to be restarted.

This issue is compounded when your Lambda function is in fact an API endpoint that clients can call on an ad-hoc basis to trigger some real-time data-sciencey/analytics process (in our case, NLP on a single phrase of arbitrary length): that 15-minute Lambda runtime suddenly gets slashed to a 30-second window in which to return an HTTP response to the client.

This article isn’t trying to put you off running ad-hoc NLP pipelines in Lambda. On the contrary, it aims to make you aware of these constraints and of some ways to overcome them (or, alternatively, to help you live on the edge by choosing to ignore them from a place of understanding).

What is a cold start?

A cold start occurs when the container has been shut down and has to be spun up again when the Lambda function is next invoked; typically this happens after ~5 minutes of inactivity.
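Because the trigger is simply inactivity, one well-known (if blunt) mitigation is to ping the function on a schedule — for example, a CloudWatch/EventBridge rule every few minutes — so the container never idles long enough to be recycled. Below is a minimal sketch of a handler that short-circuits on such pings; the `warmup` payload key is a convention we invent here, not an AWS field:

```python
import json

def handler(event, context):
    # Scheduled 'keep-warm' invocations send a custom payload such as
    # {"warmup": true} (our own convention, not an AWS feature).
    # Returning early keeps the container alive without running the
    # full pipeline.
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}

    # ...normal request handling would go here...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```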

A lot has already been written about cold starts, so this article won’t provide a detailed guide (I recommend you check out this article for that). But in a nutshell…

When a container starts from a cold state, the function needs to:

  1. Get and load the package containing the lambda code from external persistent storage (e.g. S3);
  2. Spin up the container;
  3. Load the package code in memory;
  4. Run the function’s handler method/function.

(https://dashbird.io/blog/can-we-solve-serverless-cold-starts/)

N.B. step 4 always happens whenever you invoke a Lambda function (it’s your code!), but steps 1–3 only occur on cold starts. As the setup stages take place entirely within AWS, they are outside of our control. We can optimise to our heart’s content in step 4, but steps 1–3 can still arbitrarily hit us with latencies of ~10 seconds or more. This is obviously a problem for synchronous APIs.
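One consequence of this split worth internalising: any code at module scope runs during the cold-start initialisation, while the handler body runs on every invocation. You can therefore observe which invocations were cold with a module-level flag. A quick sketch (the log line is illustrative):

```python
import time

# Module scope: executed once per container, i.e. only during a cold start.
_INIT_TIME = time.time()
_COLD = True

def handler(event, context):
    global _COLD
    is_cold = _COLD  # True only for the first invocation in this container
    _COLD = False
    print(f"cold_start={is_cold}, container_age={time.time() - _INIT_TIME:.1f}s")
    return {"cold_start": is_cold}
```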

Our specific problem

Getting onto our specific problem now. We had a synchronous API (sketched in code after the list below), which:

  1. Took an arbitrary text input from an HTTP request
  2. Downloaded an NLP model from S3 (~220 MB);
  3. Performed NLP on the input using the model
  4. Returned the serialised result to the caller.
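In handler form, that flow looked roughly like the sketch below. This is not our production code: spaCy stands in for whichever NLP library you use, the bucket/key names are hypothetical, and we assume the archive unpacks to a loadable pipeline directory.

```python
import json
import tarfile

import boto3
import spacy  # stand-in for whatever NLP library you use

s3 = boto3.client("s3")
BUCKET, KEY = "my-models-bucket", "nlp-model.tar.gz"  # hypothetical names

def handler(event, context):
    text = event["body"]  # 1. arbitrary text from the HTTP request

    # 2. fetch and unpack the ~220 MB model on *every* invocation
    #    -- this is the slow step
    s3.download_file(BUCKET, KEY, "/tmp/model.tar.gz")
    with tarfile.open("/tmp/model.tar.gz") as tar:
        tar.extractall("/tmp/model")
    nlp = spacy.load("/tmp/model")

    # 3. run NLP on the input
    doc = nlp(text)

    # 4. return the serialised result
    return {"statusCode": 200, "body": json.dumps(doc.to_json())}
```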

The issue here was step 2: downloading the model from S3 on each invocation could take 15–20 seconds. Most of the time this was fine for our use case; although we were providing a long-running synchronous endpoint, we didn’t expect it to be quick (we’re talking about NLP on the fly, not a simple GET request).

However, during cold starts we were seeing requests frequently time out. This makes total sense: if the Lambda takes ~10 seconds to start up, on top of the ~20 seconds to download the model, we’re not left much time to run the NLP and return the results within a 30-second HTTP window!
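The first win here isn’t fighting the cold start itself but making sure only cold starts pay for the model download: hoist the load to module scope and cache the unpacked files in /tmp (which persists for the container’s lifetime), so warm invocations go straight to the NLP. A sketch under the same assumptions and hypothetical names as above:

```python
import json
import os
import tarfile

import boto3
import spacy  # stand-in, as above

BUCKET, KEY = "my-models-bucket", "nlp-model.tar.gz"  # hypothetical names
MODEL_DIR = "/tmp/model"

def _load_model():
    # /tmp survives for the lifetime of the container, so only the
    # first (cold) invocation pays the 15-20 second download.
    if not os.path.isdir(MODEL_DIR):
        boto3.client("s3").download_file(BUCKET, KEY, "/tmp/model.tar.gz")
        with tarfile.open("/tmp/model.tar.gz") as tar:
            tar.extractall(MODEL_DIR)
    return spacy.load(MODEL_DIR)

# Module scope: runs once per container, during the cold-start init,
# not on every invocation.
nlp = _load_model()

def handler(event, context):
    doc = nlp(event["body"])  # warm invocations start here, with no S3 I/O
    return {"statusCode": 200, "body": json.dumps(doc.to_json())}
```

Note the trade-off: warm invocations become fast, but the download now happens during container initialisation, making the cold start itself even longer — which is exactly why cold starts are the remaining problem.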
