Google Code Jam 2014 Repeated Email Incident
Google · Code Jam email system
The automated email system for Google Code Jam repeatedly sent the “Registration Now Open for Google Code Jam 2014!” email; the date of the incident is not specified. A large number of 2013 registrants received more than 20 copies. The issue was identified, and the system was stopped manually, after a contestant alerted Google.
The incident stemmed from a bug introduced during a refactoring of the mail system. The system used an App Engine datastore with “notification” objects having a “status” property (“Waiting”, “Sending”, “Sent”). A cron job, MailCheckWorker, was designed to find “Waiting” notifications and initiate email sending, marking them “Sent” only after all emails were dispatched.
The core problem was a non-atomic operation: the MailCheckWorker did not move the notification to “Sending” before starting the email dispatch, so finding a “Waiting” notification and acting on it were two separate steps. While the first batch of emails was still being sent, the next run of the MailCheckWorker found the notification still in “Waiting” status and started another full round of sending. This cycle repeated multiple times.
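The missing step can be illustrated with a minimal Python sketch. It is not the Code Jam code: the in-memory NotificationStore, the claim_waiting and mark_sent helpers, and the example addresses are all hypothetical, and a lock stands in for whatever transactional update the real datastore would provide. The point it shows is that the worker claims a notification by moving it from “Waiting” to “Sending” before any email goes out, so an overlapping run finds nothing left to claim.

```python
import threading
from dataclasses import dataclass

WAITING, SENDING, SENT = "Waiting", "Sending", "Sent"


@dataclass
class Notification:
    subject: str
    recipients: list
    status: str = WAITING


class NotificationStore:
    """Hypothetical in-memory stand-in for the datastore.

    The lock plays the role of a transaction: the read of the status
    and the write of the new status happen as one step.
    """

    def __init__(self, notifications):
        self._lock = threading.Lock()
        self._items = list(notifications)

    def claim_waiting(self):
        """Atomically move every Waiting notification to Sending and return them."""
        with self._lock:
            claimed = [n for n in self._items if n.status == WAITING]
            for n in claimed:
                n.status = SENDING
            return claimed

    def mark_sent(self, notification):
        with self._lock:
            notification.status = SENT


def mail_check_worker(store, outbox):
    """One cron run: claim first, then dispatch, then mark Sent."""
    for n in store.claim_waiting():
        for recipient in n.recipients:
            outbox.append((recipient, n.subject))  # stand-in for sending an email
        store.mark_sent(n)


if __name__ == "__main__":
    store = NotificationStore(
        [Notification("Registration Now Open", ["a@example.com", "b@example.com"])]
    )
    first = store.claim_waiting()   # run 1 claims the notification
    second = store.claim_waiting()  # run 2 starts while run 1 is still sending
    print(len(first), len(second))
```

In this sketch the second call to claim_waiting returns an empty list, because the notification is already in “Sending”. In the behaviour described above, the status stayed “Waiting” for the whole dispatch, so every overlapping run would have picked the notification up again.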
The customer impact was significant: many users received numerous identical emails. Smaller-scale tests of the mail system had not detected the bug because they completed within a minute, before the MailCheckWorker could run again and re-process the still-“Waiting” notification. The Code Jam team issued an apology for the inconvenience caused.
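The timing argument can be made concrete with a small, simplified model. It assumes, hypothetically, that the cron job fires every 60 seconds, that each run starts a full round of sending whenever the notification is still “Waiting”, and that the notification becomes “Sent” once the first round finishes. The function name and the durations used below are illustrative only and are not taken from the incident.

```python
def dispatch_rounds(send_duration_s, cron_interval_s=60):
    """Count cron runs that start while the notification is still Waiting.

    Simplified model: runs fire at t = 0, interval, 2 * interval, ...
    and the status only changes when the first round finishes at
    t = send_duration_s.
    """
    rounds = 0
    t = 0
    while t < send_duration_s:
        rounds += 1
        t += cron_interval_s
    return rounds


if __name__ == "__main__":
    print(dispatch_rounds(30))    # a small test that finishes in 30 seconds
    print(dispatch_rounds(1230))  # a large mailing that takes 20.5 minutes
```

Under these assumptions, a test that finishes in 30 seconds sees exactly one round, while a mailing that takes 20.5 minutes is picked up by 21 runs. Any test whose dispatch fits inside a single cron interval behaves correctly even with the bug present, which is consistent with the small-scale tests passing.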