{"UUID":"aa0709c9-90b6-4b4b-9d6e-1a70fcc27ce3","URL":"https://github.blog/news-insights/company-news/addressing-githubs-recent-availability-issues/","ArchiveURL":"","Title":"GitHub availability incidents May 9-11, 2023","StartTime":"2023-05-09T00:00:00Z","EndTime":"0001-01-01T00:00:00Z","Categories":["automation","config-change","security"],"Keywords":["availability","git","database","github apps","authentication","failover","configuration","degradation"],"Company":"GitHub","Product":"Git databases, GitHub App authentication","SourcePublishedAt":"2023-05-16T18:33:22Z","SourceFetchedAt":"2026-05-04T19:52:22.680943Z","Summary":"May 9: a connection-saturation config rollout to the Git database triggered a failover; the rollback then failed due to an internal infrastructure error, causing \u003e10h of degraded pull-request/push consistency. May 10: an inefficient GitHub App permissions API endpoint with a retry-on-timeout caller produced a 7× write-latency spike on the auth-token cluster, peaking at 76% token-issuance failure. May 11: a Git database crash auto-failed-over but the read replicas weren't reattached, leaving the primary unable to serve full read load.","Description":"On May 9, 2023, GitHub experienced a Git database degradation. This was triggered by a configuration change intended to prevent connection saturation. Shortly after rollout, the cluster failed over, and a subsequent rollback attempt failed due to an internal infrastructure error. This caused widespread failures, impacting 8 of 10 main services for over an hour, and led to an extended period of inconsistent pull request and push data.\n\nOn May 10, 2023, GitHub App authentication token issuance degraded. An inefficient API for managing GitHub App permissions, when invoked by a new caller that retried on timeouts, caused a 7x increase in write latency on the database cluster. This resulted in an 8-15% failure rate for auth token requests, peaking at 76%, impacting 6 of 10 main services.\n\nOn May 11, 2023, a Git database cluster crashed, leading to an automated failover. Although the primary failover was successful, read replicas were not reattached. This left the primary unable to handle the full read load, causing an average of 15% of Git data requests to fail or be slow, with a peak impact of 26%. This affected 8 of 10 main services.\n\nThese incidents collectively caused widespread degradation across GitHub services, affecting critical functions like GitHub Actions workflows, Codespaces, and GitHub Pages. GitHub is addressing these issues by reviewing internal processes, improving observability for high-cost/low-volume query patterns, resolving the underlying Git database crash, and ensuring failovers recover fully without intervention."}