Cloud Filestore ListInstances API failed with error code 429 globally
Google · Cloud Filestore
On Tuesday, September 13, 2022, Google Cloud Filestore experienced a global incident in which read-only API requests, including the ListInstances API, failed with error code 429. The issue began at 07:38 US/Pacific and was fully mitigated by 10:42 US/Pacific, a duration of 3 hours and 4 minutes.
The root cause was a malfunctioning internal Google service that manages a large number of GCP projects and overloaded the Filestore API with requests. This triggered Filestore’s global API request limit, which is intended to prevent overload, and the API was throttled globally.
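The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Filestore's actual throttling logic: the function name `simulate_global_limit`, the caller names, and the limit of 100 requests per window are all hypothetical. It shows how a single shared limit lets one noisy caller exhaust the budget so that unrelated callers are rejected as well.

```python
from collections import Counter

def simulate_global_limit(requests, global_limit):
    """Admit requests in arrival order until one shared limit is exhausted.

    `requests` lists one caller name per request within a single window.
    Returns a Counter of rejected (HTTP 429) requests per caller.
    Hypothetical model for illustration; not the real Filestore limiter.
    """
    admitted = 0
    rejected = Counter()
    for caller in requests:
        if admitted < global_limit:
            admitted += 1
        else:
            # The shared budget is gone, so every later caller is throttled.
            rejected[caller] += 1
    return rejected

# One misbehaving caller floods the window before two ordinary customers
# each send a single request.
window = ["internal-service"] * 1000 + ["customer-a", "customer-b"]
rejected = simulate_global_limit(window, global_limit=100)
```

In this model the two customers are rejected even though each sent only one request, because the shared limit does not distinguish who consumed the budget.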
As a result of this throttling, customers worldwide were unable to use read-only API functions for Cloud Filestore. This affected Console, gcloud, and direct API access for operations such as List and GetOperation. Mutate operations (e.g., CreateInstance, UpdateInstance) still succeeded, but customers could not check their progress.
Google engineers were alerted at 07:47 US/Pacific, nine minutes after the incident began. Once the nature and scope of the issue were understood, engineers began shutting down the overloading internal service at 10:33 US/Pacific, and the issue was fully resolved by 10:42 US/Pacific.
To prevent recurrence, Google plans to remove the global quotas that allowed a local overload to have a global impact. Google will also improve alerting and monitoring so that issues are detected and mitigated more promptly when internal limits are approached.
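One way to picture these two remediation ideas is a limiter scoped per caller, combined with a report of callers approaching their limit. The sketch below is an assumption-laden illustration rather than Google's planned design: `simulate_per_caller_limit`, the 100-request limit, and the 0.8 alert ratio are hypothetical names and values chosen for the example.

```python
from collections import Counter

def simulate_per_caller_limit(requests, per_caller_limit, alert_ratio=0.8):
    """Apply an independent limit to each caller within one window.

    Returns (rejected, near_limit): a Counter of rejected requests per
    caller, and a sorted list of callers whose usage reached
    `alert_ratio` of their limit. Hypothetical model for illustration.
    """
    used = Counter()
    rejected = Counter()
    for caller in requests:
        if used[caller] < per_caller_limit:
            used[caller] += 1
        else:
            # Only the caller that exceeded its own budget is throttled.
            rejected[caller] += 1
    # Callers at or above the alert threshold would be surfaced to operators.
    near_limit = sorted(
        c for c, n in used.items() if n >= alert_ratio * per_caller_limit
    )
    return rejected, near_limit

# The same kind of traffic: one flooding caller, two ordinary customers.
window = ["internal-service"] * 1000 + ["customer-a", "customer-b"]
rejected, near_limit = simulate_per_caller_limit(window, per_caller_limit=100)
```

Under these assumptions the overload stays local: only the flooding caller is rejected, and it is also the only one flagged as approaching its limit.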