{"UUID":"c3fab524-b7a3-42e6-b623-47921035df27","URL":"https://discuss.circleci.com/t/post-incident-report-april-4-2025-circleci-ui-loading-build-triggering-issues/53208","ArchiveURL":"","Title":"CircleCI UI and build capabilities disruption on April 4, 2025","StartTime":"2025-04-04T00:16:00Z","EndTime":"2025-04-04T01:49:00Z","Categories":["automation","config-change"],"Keywords":["circleci","waf","iam","terraform","cloudfront","ui","builds","outage"],"Company":"CircleCI","Product":"CircleCI UI, build capabilities","SourcePublishedAt":"2025-05-02T14:20:47Z","SourceFetchedAt":"2026-05-04T19:52:28.92408Z","Summary":"An IAM-role gap permitted out-of-band changes to AWS WAF outside of CircleCI's Terraform pipeline; an operator performing what they believed were read-only investigation actions modified WAF in a way that began blocking legitimate traffic to the `api.circleci.com` and `circleci.com` CloudFront distributions. Because the change wasn't recorded in Terraform, responders deprioritized WAF as a suspect and chased CORS errors and recent deploys until automated drift detection surfaced the discrepancy.","Description":"On April 4, 2025, from 00:16 to 01:49 UTC, CircleCI experienced a service disruption affecting its user interface and build capabilities. During this period, customers were unable to access the CircleCI UI or initiate new builds. The incident began when an inadvertently applied Web Application Firewall (WAF) rule started blocking legitimate traffic.\n\nThe issue manifested as degraded performance across multiple services, a drop in GitHub webhooks, and widespread connectivity issues between the frontend and backend services, including CORS errors. Initial investigations explored various causes like recent deployments, but the root cause remained unclear for some time.\n\nThe root cause was identified as an inadvertently applied WAF rule. 
A misconfiguration in IAM controls allowed an operator, who believed they were performing read-only actions, to manually modify WAF settings outside of the standard Terraform deployment process. This change blocked legitimate traffic to the `api.circleci.com` and `circleci.com` CloudFront distributions.\n\nIncident response was complicated by the assumption that all WAF changes would go through Terraform, leading responders to initially deprioritize WAF configuration as a suspect. The diverse symptoms and the incident's proximity to another unrelated issue also led to fruitless paths of inquiry. The problem was eventually identified when automated Terraform drift detection surfaced the discrepancy.\n\nTo prevent recurrence, CircleCI implemented stricter IAM policies to prevent direct infrastructure modification outside of the infrastructure-as-code pipeline. Enhancements are being made to Terraform drift detection for faster alerts, and technical guardrails are being added for configuration management. Additionally, WAF-specific monitoring and service control policies (SCPs) are being introduced to further reduce the risk of accidental misconfigurations."}