Performance Monitoring for AWS Lambda

Monitoring the performance of Lambda functions might seem like a trivial task, but as the dataset grows it becomes increasingly hard to understand how your users experience the system. As a developer, you usually care about the latency and cost of your system. A good observability tool should surface both, while also letting you ask arbitrary questions about your system to figure out the scope and causes of problems. Let’s go into detail on how to approach performance monitoring and find the root causes of performance problems in Lambda functions.

Performance monitoring for Lambda functions

Let’s start with what you should monitor in Lambda functions. In general there are two areas: user experience and the cost of the system. User experience usually comes down to the availability, latency and feature set of a service, while the cost of operating a service matters for the profitability of the business. In distributed architectures, the surface area of what to monitor becomes larger, and changes in performance and cost can often slip through unnoticed.

One of the contributing factors that make serverless applications harder to monitor is the setup overhead of analytics services. In most cases with serverless, there are many more units to monitor, the lifecycles are short, and configuring agents directly contributes to latency and cost.

The good thing about such services is that by default, they make themselves observable. Observability does not mean that you have visibility, it means that the systems emit data that makes it possible to understand what is happening from the outside. This is the core principle we built Dashbird on.

Observing the cost of Lambda functions

Depending on the metric, it might make sense to observe it across all functions or individually per resource. For example, the cost of the system is best watched at the account level; only when that metric changes significantly does it make sense to drill down to the function level.

Dashbird Lambda profile

Cross-account access to invoke AWS lambda using AWS CDK

If you are here, you probably have a pretty good knowledge of how to use AWS CDK for defining cloud infrastructure in code and provisioning it through AWS. So let’s get started on how to grant your Lambda function permission to access resources in another AWS account.

Let’s say you have two accounts called Account A and Account B, and you need to give a Lambda function in Account A (ex: 11111111) permission to access resources in Account B (22222222). You can do this by having the function assume an IAM Role in Account B and then use the returned credentials to invoke AWS resources in Account B.
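The assume-role step described above can be sketched at runtime with boto3. This is a minimal illustration, not the CDK setup itself: the role name `CrossAccountRole` and the helper names are hypothetical, the account ID is the placeholder used in the text, and the role in Account B is assumed to already trust the function's execution role in Account A.

```python
def role_arn(account_id: str, role_name: str) -> str:
    """Build the ARN of an IAM role in the target account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def assume_cross_account_role(sts_client, account_id: str, role_name: str,
                              session_name: str = "cross-account-session") -> dict:
    """Assume a role in Account B and return credential kwargs for boto3.client()."""
    response = sts_client.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]  # temporary credentials issued by STS
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def client_in_account_b(service: str, account_id: str, role_name: str):
    """Create a client for `service` that acts inside Account B."""
    import boto3  # imported lazily so the helpers above can be tested offline

    credentials = assume_cross_account_role(boto3.client("sts"), account_id, role_name)
    return boto3.client(service, **credentials)
```

Inside the Lambda handler in Account A, a call such as `client_in_account_b("s3", "22222222", "CrossAccountRole")` would then return a client whose requests run with Account B's role. Real AWS account IDs are 12 digits; the short value here only mirrors the example in the text.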

Cache secrets using AWS Lambda extensions

What is the AWS Lambda extension?

A month back AWS announced a preview of Lambda Extensions, a new way to easily integrate Lambda with your favorite monitoring, observability, security, and governance tools. Extensions can be published as Lambda layers, and there are two types of extension:

  • Internal extensions → Run as part of the runtime process, in-process with your code. Internal extensions enable use cases such as automatically instrumenting code.
  • External extensions → Allow you to run separate processes from the runtime but still within the same execution environment as the Lambda function. External extensions can start before the runtime process and can continue after the runtime shuts down. These extensions run as companion processes to Lambda functions.

AWS Lambda Performance: Main Issues and How To Overcome Them

In this article, learn about the major issues affecting AWS lambda performance to improve your project’s speed and efficiency and cut unnecessary costs!


AWS Lambda gives users powerhouse capabilities out-of-the-box. It enables web designers and creators to handle the broadest range of tasks. And yet, if you are looking to build a sturdy, smooth, and fast-running server infrastructure, the service’s standard functionality may not be enough. The good thing is that you can optimize AWS Lambda performance issues and tailor the platform to your particular needs.

Let’s find out how to actually avoid or eliminate the most relevant AWS Lambda performance issues.

Boosting AWS Efficiency via Lambda Performance Monitoring

How do you pinpoint the major issues hindering the performance of AWS Lambda? It’s all about thorough monitoring of the underlying functions. Understanding how everything works and behaves allows fine-tuning configurations to achieve the best operational results.

How to Monitor Third Party API Integrations

Many enterprises and SaaS companies depend on a variety of external API integrations in order to build an awesome customer experience. Some integrations may outsource certain business functionality such as handling payments or search to companies like Stripe and Algolia. You may have integrated other partners which expand the functionality of your product offering. For example, if you want to add real-time alerts to an analytics tool, you might want to integrate the PagerDuty and Slack APIs into your application.

If you’re like most companies though, you’ll soon realize you’re integrating hundreds of different vendors and partners into your app. Any one of them could have performance or functional issues impacting your customer experience. Worse yet, the reliability of an integration may be less visible than that of your own APIs and backend. If the login functionality is broken, you’ll have many customers complaining they cannot log into your website. However, if your Slack integration is broken, only the customers who added Slack to their account will be impacted. On top of that, since the integration is asynchronous, your customers may not realize the integration is broken until a few days later, when they haven’t received any alerts for some time.

How do you ensure your API integrations are reliable and high performing? After all, if you’re selling a feature like real-time alerting, your alerts had better be real-time and have at-least-once guaranteed delivery. Dropping alerts because your Slack or PagerDuty integration is failing is unacceptable from a customer experience perspective.

What to monitor


Specific API integrations that have an exceedingly high latency could be a signal that your integration is about to fail. Maybe your pagination scheme is incorrect or the vendor has not indexed your data in the best way for you to efficiently query.

Latency best practices

Average latency only tells you half the story. An API that consistently takes one second to complete is usually better than an API with high variance. For example, if most calls to an API take only 30 milliseconds, but 1 out of 10 calls takes up to five seconds, then you have high variance in your customer experience. This makes bugs much harder to track down and the experience harder to design around. This is why the 90th and 95th percentiles are important to look at.
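To make the point about variance concrete, here is a small sketch that compares the mean with a few percentiles. The sample data is invented to match the example in the text (nine fast calls for every slow one), and the nearest-rank percentile used here is one of several common definitions, chosen for simplicity.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 90 fast calls and 10 slow ones.
latencies_ms = [30] * 90 + [5000] * 10

print("mean:", mean(latencies_ms))
for pct in (50, 90, 95, 99):
    print(f"p{pct}:", percentile(latencies_ms, pct))
```

With this data the median and the 90th percentile both sit at the fast value while the 95th percentile lands on the slow one, and the mean matches no real call at all. That is why it helps to look at several percentiles rather than a single number.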


Reliability is a key metric to monitor, especially since you’re integrating APIs that you don’t have control over. What percent of API calls are failing? In order to track reliability, you should have a rigid definition of what constitutes a failure.

Reliability best practices

While any API call that has a response status code in the 4xx or 5xx family may be considered an error, you might have specific business cases where the API appears to complete successfully yet the call should still be considered a failure. For example, a data API integration that consistently returns no matches or no content could be considered failing even though the status code is always 200 OK. Another API could be returning bogus or incomplete data. Data validation is critical for measuring whether the data returned is correct and up to date.
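A rigid failure definition of this kind could be sketched as follows. The `ApiResult` type and the `matches` field are hypothetical stand-ins for whatever your integration returns; the idea is only that a 200 response with missing or empty required data still counts as a failure.

```python
from dataclasses import dataclass, field


@dataclass
class ApiResult:
    status_code: int
    body: dict = field(default_factory=dict)


def is_failure(result, required_fields=("matches",)):
    """Treat 4xx/5xx responses and 'successful' responses with empty required data as failures."""
    if result.status_code >= 400:
        return True
    # A 200 OK that carries no usable data is a business-level failure.
    return any(not result.body.get(name) for name in required_fields)


def failure_rate(results, required_fields=("matches",)):
    """Fraction of calls that failed under the definition above."""
    if not results:
        return 0.0
    failures = sum(is_failure(r, required_fields) for r in results)
    return failures / len(results)
```

Tracking `failure_rate` per integration, rather than only counting non-2xx responses, surfaces the "always 200 but never any data" case described above.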

Not every API provider and integration partner follows suggested status code mapping


While reliability is specific to errors and functional correctness, availability and uptime is a pure infrastructure metric that measures how often a service has an outage, even if temporary. Availability is usually measured as a percentage of uptime per year or number of 9’s.

AVAILABILITY %              | DOWNTIME PER YEAR   | DOWNTIME PER MONTH  | DOWNTIME PER WEEK   | DOWNTIME PER DAY
90% (“one nine”)            | 36.53 days          | 73.05 hours         | 16.80 hours         | 2.40 hours
99% (“two nines”)           | 3.65 days           | 7.31 hours          | 1.68 hours          | 14.40 minutes
99.9% (“three nines”)       | 8.77 hours          | 43.83 minutes       | 10.08 minutes       | 1.44 minutes
99.99% (“four nines”)       | 52.60 minutes       | 4.38 minutes        | 1.01 minutes        | 8.64 seconds
99.999% (“five nines”)      | 5.26 minutes        | 26.30 seconds       | 6.05 seconds        | 864.00 milliseconds
99.9999% (“six nines”)      | 31.56 seconds       | 2.63 seconds        | 604.80 milliseconds | 86.40 milliseconds
99.99999% (“seven nines”)   | 3.16 seconds        | 262.98 milliseconds | 60.48 milliseconds  | 8.64 milliseconds
99.999999% (“eight nines”)  | 315.58 milliseconds | 26.30 milliseconds  | 6.05 milliseconds   | 864.00 microseconds
99.9999999% (“nine nines”)  | 31.56 milliseconds  | 2.63 milliseconds   | 604.80 microseconds | 86.40 microseconds
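The downtime figures above follow from simple arithmetic: allowed downtime is the unavailable fraction multiplied by the length of the period. The sketch below assumes a year of 365.25 days and a month of one twelfth of that, which is what the table's numbers appear to use; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Period lengths in days (assumed: 365.25-day year, month = year / 12).
PERIOD_DAYS = {"year": 365.25, "month": 365.25 / 12, "week": 7.0, "day": 1.0}


def allowed_downtime_seconds(availability_pct, period):
    """Seconds of downtime permitted per period at the given availability percentage."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * PERIOD_DAYS[period] * SECONDS_PER_DAY


for nines in (99.0, 99.9, 99.99):
    hours_per_year = allowed_downtime_seconds(nines, "year") / 3600
    print(f"{nines}% -> {hours_per_year:.2f} hours of downtime per year")
```

The same function can turn a vendor's published SLA into a concrete downtime budget that you can compare against what your own monitoring observed.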


Many API providers are priced on API usage. Even if the API is free, they most likely have some sort of rate limiting implemented on the API to ensure bad actors are not starving out good clients. This means tracking your API usage with each integration partner is critical to understand when your current usage is close to the plan limits or their rate limits.

Usage best practices

It’s recommended to tie usage back to your end-users even if the API integration is quite far downstream from your customer experience. This enables measuring the direct ROI of specific integrations and finding trends. For example, let’s say your product is a CRM, and you are paying Clearbit $199 a month to enrich up to 2,500 companies. That is a direct cost tied to your customers’ usage. If you have a free tier and its users are consuming most of your Clearbit quota, you may want to reconsider your pricing strategy. Potentially, Clearbit enrichment should be on the paid tiers only to reduce your own cost.
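Tying usage back to end-users can start as a simple tally. The sketch below uses the plan figures from the example ($199 a month for 2,500 enrichments); the event shape, the tier names and the function name are assumptions made for illustration.

```python
from collections import Counter

MONTHLY_PLAN_COST_USD = 199.0  # plan price from the example above
MONTHLY_QUOTA = 2500           # enrichments included in the plan


def usage_report(events):
    """Summarise quota usage and attributed cost.

    `events` is an iterable of (customer_id, tier) tuples, one per enrichment call.
    """
    per_tier = Counter(tier for _customer, tier in events)
    used = sum(per_tier.values())
    unit_cost = MONTHLY_PLAN_COST_USD / MONTHLY_QUOTA
    return {
        "quota_used_pct": 100 * used / MONTHLY_QUOTA,
        "cost_by_tier_usd": {tier: round(count * unit_cost, 2)
                             for tier, count in per_tier.items()},
    }
```

If the report shows the free tier carrying most of the attributed cost, that is the signal to revisit which plans include the enrichment feature. The same tally also tells you how close you are to the plan limit before the vendor starts rejecting calls.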

How to monitor API integrations

Monitoring API integrations seems like the correct remedy to stay on top of these issues. However, traditional Application Performance Monitoring (APM) tools like New Relic and AppDynamics focus more on monitoring the health of your own websites and infrastructure. This includes infrastructure metrics like memory usage and requests per minute along with application-level health such as Apdex scores and latency. Of course, if you’re consuming an API that’s running in someone else’s infrastructure, you can’t just ask your third-party providers to install an APM agent that you have access to. This means you need a way to monitor the third-party APIs indirectly or via some other instrumentation methodology.
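One indirect approach is to instrument your own outgoing calls. The sketch below wraps each call in a context manager that records latency and outcome; the in-memory `call_log` list stands in for whatever metrics backend you actually use, and all names are hypothetical.

```python
import time
from contextlib import contextmanager

call_log = []  # stand-in for a real metrics sink


@contextmanager
def monitored(integration):
    """Record latency and success/failure for one outgoing call to `integration`."""
    record = {"integration": integration, "ok": True, "error": None}
    start = time.perf_counter()
    try:
        yield record  # the caller may attach extra fields, e.g. a status code
    except Exception as exc:
        record["ok"] = False
        record["error"] = type(exc).__name__
        raise  # monitoring must not swallow the original error
    finally:
        record["latency_ms"] = (time.perf_counter() - start) * 1000
        call_log.append(record)


# Usage, with a hypothetical send_slack_alert() client function:
#
#     with monitored("slack") as rec:
#         response = send_slack_alert(message)
#         rec["status_code"] = response.status_code
```

Because the measurement happens on your side of the connection, it needs nothing from the vendor, and the recorded entries can feed latency percentiles, failure rates and usage counts per integration.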

What AWS Lambda Metrics Should You Definitely Be Monitoring?

This article outlines the crucial AWS Lambda metrics you should definitely be monitoring. From cost to performance and responsiveness, here’s what to focus on.

Your application does not need to be “huge” for it to have enough functions and abstraction to get lost in. As a DevOps engineer, you can’t cover every single factor. Surfacing the relevant facts and asking the right questions is crucial, so that when there’s a fire, you can troubleshoot in no time.

Every organization is unique, and every workload has its own utility. That said, we can still have a generalized approach, and we can start by listing a few desirable qualities that you may want from your AWS application:

  • Performance
  • Responsiveness
  • Cost-effectiveness

AWS Lambda Pricing

Lambda pricing is very straightforward, and the billable factors include:

  • Number of requests
  • Compute time
  • Amount of memory provisioned

Compute time and memory provision are coupled. We’ll mention this in more detail further below. Let’s start with the number of requests. The first one million requests are free every month. After that, you will be charged $0.20 per million requests for the remainder of that month. That’s stupid cheap.
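The request component of the bill can be worked out directly from the numbers above. This sketch covers only the per-request charge as quoted in the text; compute time and memory are billed separately and are not modelled here, and the function name is illustrative.

```python
FREE_REQUESTS_PER_MONTH = 1_000_000  # first million requests are free each month
PRICE_PER_MILLION_USD = 0.20         # price quoted in the text for further requests


def monthly_request_charge(requests):
    """Request charge in USD for one month, excluding compute time and memory."""
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * PRICE_PER_MILLION_USD


for count in (500_000, 6_000_000, 101_000_000):
    print(f"{count:>11,} requests -> ${monthly_request_charge(count):.2f}")
```

Six million requests in a month leave five million billable, which comes to about one dollar for the request component.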

