AWS Lambda instances have a local file system you can write to, mounted at the system’s temporary path (/tmp). Anything stored there is accessible only to that particular container instance, and it is lost once the instance is stopped. This makes it useful for temporarily caching results, but not for persistent storage. For long-term persistence, you’ll need to move the data outside the Lambda container.
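
As a quick illustration, here is a minimal Python sketch of using the temporary path as a per-container cache. The cache file name and the expensive_computation helper are hypothetical, and the standard lambda_handler entry point is assumed:

    import json
    import os

    # Hypothetical cache location inside the container's temporary path
    CACHE_FILE = "/tmp/results-cache.json"

    def expensive_computation(event):
        # Hypothetical placeholder for the real work whose result is worth caching
        return {"size": len(json.dumps(event))}

    def lambda_handler(event, context):
        # Reuse the cached result if this particular container already computed it
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE) as cache:
                return json.load(cache)

        result = expensive_computation(event)

        # The cache survives only while this container instance stays warm
        with open(CACHE_FILE, "w") as cache:
            json.dump(result, cache)
        return result

Each container instance keeps its own copy of the cache, so two concurrent invocations may end up computing the same result independently.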

Cloud storage options

There are three main choices for persistent storage in the cloud:

  • Network file systems
  • Relational databases
  • Key-value stores

Network file systems are generally not a good choice for Lambda functions, for three reasons. The first is that there is currently no easy way to attach one to a Lambda function out of the box. The second is that, even if there were an easy way to attach network volumes, mounting an external file system takes a significant amount of time. Anything that slows down initialisation is a big issue with automatic scaling, because it can amplify problems with cold starts and request latency. The third is that very few network storage systems can cope with potentially thousands of concurrent users, so we’d have to severely limit concurrency for Lambda functions to avoid overloading them. The most popular external file storage on AWS is the Elastic Block Store (EBS), which can’t even be attached to two containers at once.
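
If you really had to put such a storage system behind Lambda, capping concurrency is the usual workaround. Here is a hedged sketch using boto3; the function name and the limit are hypothetical:

    import boto3

    # Reserved concurrency caps how many copies of the function can run at once,
    # protecting a fragile downstream file system at the cost of throttling
    # excess requests. The function name and the cap below are hypothetical.
    lambda_client = boto3.client("lambda")
    lambda_client.put_function_concurrency(
        FunctionName="file-share-writer",
        ReservedConcurrentExecutions=10,
    )

Throttled requests fail or get retried rather than scaling out, which defeats much of the point of using Lambda in the first place.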

Relational databases are good when you need to store data for flexible queries, but you pay for that flexibility with higher operational costs. Most relational databases are designed for persistent connections and require an initial handshake between the database service and user code to establish a connection. That initialisation can create problems with latency and cold starts, similar to what happens with network file systems. AWS now offers some relational databases on a pay-per-use basis (for example, AWS Aurora Serverless), but in general with relational databases you have to plan for capacity and reserve it up front, which is the complete opposite of what we’re trying to do with Lambda. Supporting a very high number of concurrent requests usually requires a lot of processing power, which gets quite expensive. Running relational databases on AWS often means setting up a virtual private cloud (VPC), and attaching a VPC to Lambda still takes a few seconds, making the cold start issue even worse.
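
To illustrate the handshake cost, here is a minimal sketch (assuming the PyMySQL client and hypothetical environment variables for the connection details) that opens the connection outside the handler, so a warm container reuses it instead of paying the handshake on every invocation:

    import os
    import pymysql

    # Opening the connection here, outside the handler, means the handshake is
    # paid once per container instance (on cold start) rather than per request.
    # Host, credentials and database name come from hypothetical environment variables.
    connection = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )

    def lambda_handler(event, context):
        # Each invocation reuses the already-established connection
        with connection.cursor() as cursor:
            cursor.execute("SELECT COUNT(*) FROM orders")  # hypothetical table
            (count,) = cursor.fetchone()
        return {"orders": count}

Even with this pattern, every container instance still holds its own connection, so a burst of cold starts under heavy traffic can exhaust the database’s connection limit.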

S3 or DynamoDB?