AWS Lambda Service Event in Northern Virginia (US-EAST-1) Region on June 13th, 2023
Amazon · AWS Lambda
On June 13th, 2023, starting at 11:49 AM PDT, customers in the Northern Virginia (US-EAST-1) Region experienced increased error rates and latencies for AWS Lambda function invocations. Synchronous Lambda invocations began to recover by 1:45 PM PDT, and all affected services had fully recovered by 3:37 PM PDT.
The incident was triggered when the Lambda Frontend fleet, scaling in response to normal daily traffic growth, crossed a previously unreached capacity threshold within a single cell. This activated a latent software defect, causing Lambda Execution Environments to be successfully allocated but never fully utilized by the Frontend.
This degradation in Lambda function invocations led to increased error rates and latencies in several other AWS services. These included Amazon STS, AWS Management Console, Amazon EKS, Amazon Connect, Amazon EventBridge, and AWS Support Center. Customers experienced issues such as failed calls, chat/task initiation failures, sign-in problems, and console unavailability.
As an immediate mitigation, engineers identified the defect and scaled down the Lambda Frontend fleet to a level that no longer triggered the issue. To prevent recurrence, the scaling activities that caused the event were disabled, and a fix for the latent bug was deployed across all regions. Additionally, a gap in Lambda’s cellular architecture for Frontend scaling was identified, with immediate actions taken and a larger effort planned to ensure cells are bounded to well-tested sizes.
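The idea of bounding cells to well-tested sizes can be illustrated with a minimal sketch. This is not AWS's implementation; the names `Cell`, `BoundedFleet`, `max_tested_hosts_per_cell`, and `scale_to` are hypothetical, and the model assumes capacity can be expressed as a simple host count per cell. The sketch shows one possible policy: when demand grows, add new cells rather than letting any single cell exceed the largest size that has been exercised in testing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    """Hypothetical unit of isolation holding some number of frontend hosts."""
    name: str
    hosts: int = 0


@dataclass
class BoundedFleet:
    """Hypothetical fleet that never grows a cell past a tested size."""
    max_tested_hosts_per_cell: int
    cells: List[Cell] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.max_tested_hosts_per_cell <= 0:
            raise ValueError("max_tested_hosts_per_cell must be positive")

    def scale_to(self, total_hosts: int) -> None:
        """Spread total_hosts over cells, adding cells instead of oversizing one."""
        if total_hosts < 0:
            raise ValueError("total_hosts must be non-negative")
        cap = self.max_tested_hosts_per_cell
        # Ceiling division: number of cells needed to stay within the bound.
        needed = (total_hosts + cap - 1) // cap
        while len(self.cells) < needed:
            self.cells.append(Cell(name=f"cell-{len(self.cells)}"))
        # Fill cells in order; any cell beyond the demand is left at zero hosts.
        remaining = total_hosts
        for cell in self.cells:
            cell.hosts = min(cap, remaining)
            remaining -= cell.hosts


fleet = BoundedFleet(max_tested_hosts_per_cell=100)
fleet.scale_to(250)
sizes_after_growth = [cell.hosts for cell in fleet.cells]
fleet.scale_to(80)
sizes_after_shrink = [cell.hosts for cell in fleet.cells]
```

Under this assumed policy, growth past the tested bound shows up as an additional cell of a known size rather than as a single cell entering an untested capacity range, which is the condition the event description identifies as the trigger.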