GitHub availability incidents May 9-11, 2023
GitHub · Git databases, GitHub App authentication
On May 9, 2023, GitHub experienced a Git database degradation, triggered by a configuration change intended to prevent connection saturation. Shortly after the change rolled out, the cluster failed over, and a subsequent rollback attempt failed due to an internal infrastructure error. The result was widespread failures that impacted 8 of 10 main services for over an hour, followed by an extended period of inconsistent pull request and push data.
On May 10, 2023, GitHub App authentication token issuance degraded. A new caller invoked an inefficient API for managing GitHub App permissions and retried on timeouts, which caused a 7x increase in write latency on the database cluster. As a result, 8-15% of auth token requests failed, with a peak failure rate of 76%, impacting 6 of 10 main services.
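Retrying immediately on every timeout multiplies load on a backend that is already slow. One common client-side mitigation is a bounded number of attempts with capped, jittered exponential backoff. The following Python sketch is illustrative only and is not GitHub's implementation; `call_with_backoff`, `backoff_delay`, `RetryBudgetExceeded`, and all default values are hypothetical names and numbers chosen for the example.

```python
import random
import time


class RetryBudgetExceeded(Exception):
    """Raised when every allowed attempt has timed out (hypothetical name)."""


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(fn, max_attempts=4, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying only on TimeoutError, with a hard attempt limit."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            # Out of attempts: stop instead of adding more load.
            if attempt == max_attempts - 1:
                break
            # Wait longer after each failure so a slow backend can recover.
            sleep(backoff_delay(attempt, base, cap, rng))
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts")
```

The `sleep` and `rng` parameters are injectable so the behavior can be checked without real waiting. The attempt limit bounds the worst-case load a single caller can add, while the jitter keeps many callers from retrying in lockstep.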
On May 11, 2023, a Git database cluster crashed, leading to an automated failover. Although the primary failover was successful, read replicas were not reattached. This left the primary unable to handle the full read load, causing an average of 15% of Git data requests to fail or be slow, with a peak impact of 26%. This affected 8 of 10 main services.
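A failover that promotes a new primary but leaves read replicas detached can look successful while read capacity is still missing. A post-failover check might therefore count healthy replicas and not only confirm that a primary exists. The sketch below assumes a hypothetical topology snapshot; `Replica`, `failover_recovered`, and the lag threshold are invented for illustration and do not describe GitHub's tooling.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    attached: bool       # replicating from the new primary
    lag_seconds: float   # replication lag behind the primary


def failover_recovered(replicas, expected_count, max_lag_seconds=5.0):
    """Return (recovered, missing): recovered is True only when at least
    expected_count replicas are attached and within the lag threshold."""
    healthy = [
        r for r in replicas
        if r.attached and r.lag_seconds <= max_lag_seconds
    ]
    missing = max(expected_count - len(healthy), 0)
    return missing == 0, missing
```

A check like this could gate the "failover complete" signal, so that a cluster with a healthy primary but no attached replicas is still reported as degraded.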
These incidents collectively caused widespread degradation across GitHub services, affecting critical functions like GitHub Actions workflows, Codespaces, and GitHub Pages. GitHub is addressing these issues by reviewing internal processes, improving observability for high-cost/low-volume query patterns, resolving the underlying Git database crash, and ensuring failovers recover fully without intervention.
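Query patterns that run rarely but cost a lot per call are easy to miss on dashboards sorted by request volume. As a rough illustration of what observability for such patterns could look like, the sketch below flags patterns with a low call count and a high mean latency, then ranks them by total time consumed. `QueryStats`, `high_cost_low_volume`, and both thresholds are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    pattern: str      # normalized query text or API route
    calls: int        # number of executions in the window
    total_ms: float   # total time spent across all executions

    @property
    def mean_ms(self):
        return self.total_ms / self.calls if self.calls else 0.0


def high_cost_low_volume(stats, max_calls=100, min_mean_ms=500.0):
    """Return rarely-called but expensive patterns, costliest first."""
    flagged = [
        s for s in stats
        if 0 < s.calls <= max_calls and s.mean_ms >= min_mean_ms
    ]
    return sorted(flagged, key=lambda s: s.total_ms, reverse=True)
```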