Postmortem Index

Explore incident reports from various companies

Reddit

Outage for over 5 hours when a critical Kubernetes cluster upgrade failed. The failure was caused by node metadata that changed between versions which brought down workload networking.

Source | Edit Metadata | JSON file