Postmortem Index

Explore incident reports from various companies

Google Cloud Networking packet loss May 2022

Google · Google Cloud Networking

On Friday, May 20, 2022, Google Cloud Networking experienced intermittent packet loss for traffic between multiple cloud regions for approximately 20 minutes, from 13:47 to 14:07 US/Pacific. This incident affected various Google Cloud services, including Cloud VPN, Cloud Router, Cloud Interconnect, Google Cloud Storage, Cloud SQL, and Messages, leading to issues such as elevated latency, errors, and service availability problems.

The root cause was identified as a failure of a component on a fiber path originating from one of the central US gateway campuses within Google’s production backbone. This failure resulted in a decrease in available network bandwidth between the gateway and multiple edge locations. The network topology in this specific region was undergoing augmentation, and the second-best path was not yet fully augmented, forcing some traffic to reroute onto a less optimal third-best path, which extended the traffic migration period.

Google’s automated repair mechanisms detected the bandwidth decrease and automatically routed traffic through alternate links. The traffic rerouting completed by 14:06 US/Pacific, mitigating the issue without manual intervention. While automated systems worked as intended, the scope of impact was more severe than anticipated due to the network’s state.

Remediation efforts include completing the network augmentation in the affected region by July 11, 2022, to ensure that any single fiber path failure can be quickly rerouted onto the second-best path. Google is also building automatic analysis to ensure that network topology during augmentation always supports fast convergence and continues to optimize its global network to minimize convergence time during failures.

Keywords

google cloud networkingpacket lossnetwork backbonefiber pathus centralcloud vpncloud storagecloud sql