Postmortem Index

GitHub Copilot degradation on July 13, 2024

GitHub · github copilot

On July 13, 2024, between 00:01 UTC and 19:27 UTC, the GitHub Copilot service experienced significant degradation. The incident lasted 19 hours and 26 minutes and affected Copilot code completions, Copilot Chat, and code scanning autofix.

During the incident, the error rate for Copilot code completions reached 1.16%, while GitHub Copilot Chat experienced a peak error rate of 63%. Customers using Copilot may have encountered delays, errors, or timeouts. Additionally, GitHub code scanning autofix dropped suggested fixes between 00:01 UTC and 12:38 UTC, and subsequently delayed suggested fixes until 21:38 UTC.
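The time windows quoted above can be checked with a few lines of date arithmetic. This is only a sketch: the helper names `utc` and `describe` are made up for illustration, and the code assumes the autofix delay window began at 12:38 UTC, when the dropped-fix window ended.

```python
from datetime import datetime, timezone


def utc(hour, minute):
    """Build a timestamp on the incident date, July 13, 2024 (UTC)."""
    return datetime(2024, 7, 13, hour, minute, tzinfo=timezone.utc)


def describe(start, end):
    """Format the elapsed time between two timestamps as hours and minutes."""
    total_minutes = int((end - start).total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# Windows taken from the report; the labels are informal.
windows = {
    "Copilot degradation": (utc(0, 1), utc(19, 27)),
    "autofix dropped fixes": (utc(0, 1), utc(12, 38)),
    "autofix delayed fixes": (utc(12, 38), utc(21, 38)),
}

for label, (start, end) in windows.items():
    print(f"{label}: {describe(start, end)}")
```

The first window reproduces the 19 hours and 26 minutes stated in the report.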

The root cause was identified as a scheduled resource cleanup job executed by a partner service. This job mistakenly targeted a resource group containing essential Copilot infrastructure, leading to the removal of critical resources.

To mitigate the impact, GitHub rerouted Copilot Chat traffic between 01:00 and 02:00 UTC, which successfully reduced Copilot Chat error rates to below 6%. The erroneous cleanup job was stopped in time to preserve some resources, allowing GitHub to progressively restore services.
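The report does not describe how the rerouting was done. As a rough illustration of the general idea, the sketch below shifts traffic away from backends whose observed error rate exceeds a threshold. The `Backend` class, the `routing_weights` function, the 5% threshold, and the region names are all hypothetical and are not taken from GitHub's systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical serving backend with recent request counters."""
    name: str
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def routing_weights(backends, max_error_rate=0.05):
    """Split traffic evenly across backends under the error threshold.

    If no backend is under the threshold, fall back to an even split
    across all of them rather than dropping all traffic.
    """
    healthy = {b.name for b in backends if b.error_rate < max_error_rate}
    if not healthy:
        healthy = {b.name for b in backends}
    share = 1.0 / len(healthy)
    return {b.name: (share if b.name in healthy else 0.0) for b in backends}


backends = [
    Backend("region-a", requests=1000, errors=630),  # badly degraded
    Backend("region-b", requests=1000, errors=10),
    Backend("region-c", requests=1000, errors=20),
]
print(routing_weights(backends))
```

In this example the degraded backend receives no traffic and the remaining two split it evenly.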

Moving forward, GitHub is collaborating with partner services to implement safeguards against similar incidents. GitHub is also improving its traffic rerouting processes to speed up future mitigation and minimize customer impact.
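The report does not say what form these safeguards will take. One common pattern, sketched here under stated assumptions, is for a cleanup job to build a deletion plan first and skip anything marked as protected. The `do-not-delete` tag, the `plan_cleanup` function, and the resource group names are hypothetical.

```python
PROTECTED_TAG = "do-not-delete"  # hypothetical tag name


def plan_cleanup(resource_groups, protected_names=frozenset()):
    """Split candidate resource groups into those to delete and those to skip.

    A group is skipped if it appears in an explicit protected list or
    carries the protection tag. Nothing is deleted here; the function
    only returns a plan that can be reviewed before it is executed.
    """
    to_delete, skipped = [], []
    for group in resource_groups:
        tags = group.get("tags", {})
        is_protected = (
            group["name"] in protected_names
            or tags.get(PROTECTED_TAG) == "true"
        )
        (skipped if is_protected else to_delete).append(group["name"])
    return to_delete, skipped


candidates = [
    {"name": "tmp-load-test", "tags": {}},
    {"name": "chat-serving", "tags": {PROTECTED_TAG: "true"}},
    {"name": "completions-serving"},
]
to_delete, skipped = plan_cleanup(
    candidates, protected_names={"completions-serving"}
)
print("delete:", to_delete)
print("skip:", skipped)
```

Separating planning from deletion also gives an operator the chance to stop a bad run before any resources are removed.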

Keywords

github, copilot, copilot chat, code scanning, autofix, partner service, degradation, resource cleanup