Knight Capital SMARS deployment incident
On August 1, 2012, Knight Capital’s SMARS order router accumulated a trading loss of roughly $460 million in about 45 minutes following the market open, effectively bankrupting the firm.
In the weeks prior, Knight had been preparing for the launch of NYSE’s Retail Liquidity Program (RLP). The required code went into SMARS, Knight’s high-speed algorithmic order router, and reused a configuration flag that had previously activated a separate, retired feature called Power Peg. Knight had stopped using Power Peg in 2003, but the code remained present and callable in SMARS. A 2005 refactor moved the cumulative-shares tracking out of Power Peg’s hot path. From that point on, Power Peg, if ever invoked, could no longer tell when a parent order had been filled, so its termination condition would never be satisfied.
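The termination failure described above can be illustrated with a minimal sketch. This is not Knight's code: the names ParentOrder and route_children, the record_fills switch, and the assumption that every child order fills completely are all hypothetical, chosen only to show why a loop that waits on a cumulative fill count never ends once that count stops being updated.

```python
from dataclasses import dataclass


@dataclass
class ParentOrder:
    symbol: str
    quantity: int      # total shares the parent order wants filled
    filled: int = 0    # cumulative shares executed so far


def route_children(parent, child_size, record_fills, max_children=1_000):
    """Send child orders until the parent is filled; return how many were sent.

    record_fills=False models the situation after the tracking was moved
    elsewhere: executions never reach parent.filled, so the loop condition
    never becomes false. max_children is a safety cap for this demo only
    (hypothetical; the source describes no such cap).
    """
    sent = 0
    while parent.filled < parent.quantity:
        if sent >= max_children:
            break  # demo-only guard so the example terminates
        sent += 1  # one child order goes to market
        if record_fills:
            # Assumption: each child order fills in full.
            parent.filled += min(child_size, parent.quantity - parent.filled)
    return sent
```

With fills recorded, a 1,000-share parent order sent in 100-share children stops after 10 child orders. With record_fills=False, the same call keeps sending until it hits the demo cap, and parent.filled remains 0.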
The deployment to eight SMARS servers between July 27 and August 1 was performed manually by a single technician with no peer review. Seven servers received the new RLP code; the eighth did not. When the market opened on August 1 and parent orders arrived carrying the repurposed flag, the eighth server interpreted the flag as the legacy Power Peg signal and invoked the dead code path. Without the cumulative-shares tracker that had been moved seven years earlier, the Power Peg routine never terminated and continued issuing child orders without bound.
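A post-deployment consistency check of the kind that was missing could look roughly like the sketch below. It is illustrative only: the host names, the find_mismatched_hosts helper, and the idea of comparing SHA-256 digests of the deployed artifact are assumptions, not a description of Knight's tooling.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """Return a SHA-256 hex digest identifying a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def find_mismatched_hosts(deployed, expected):
    """Return the sorted names of hosts whose artifact differs from `expected`.

    `deployed` maps a host name to the bytes of the artifact found on that
    host; `expected` is the bytes of the release that should be everywhere.
    """
    want = fingerprint(expected)
    return sorted(
        host for host, blob in deployed.items() if fingerprint(blob) != want
    )


# Hypothetical cluster: seven hosts carry the new build, one was missed.
new_build = b"smars-with-rlp"
old_build = b"smars-without-rlp"
cluster = {f"smars-{i}": new_build for i in range(1, 8)}
cluster["smars-8"] = old_build

stale = find_mismatched_hosts(cluster, new_build)
```

Here stale is ['smars-8']. A release gate would refuse to proceed to market open while that list is non-empty.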
At 08:01 ET, an internal Knight system began emitting “BNET reject” emails that referenced SMARS and reported a “Power Peg disabled” error. Ninety-seven such messages went out to a Knight distribution list before the 09:30 ET market open. They were not designed as formal alerts, and nobody reviewed them in time.
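One way to turn a stream of repeated warning messages into an actionable alert is to escalate once enough of them arrive within a short window. The sketch below assumes a hypothetical should_page helper, a threshold of 5 messages, and a 10-minute window; none of these values come from the incident record, and the timestamps in the usage lines are synthetic.

```python
from datetime import datetime, timedelta


def should_page(timestamps, threshold=5, window=timedelta(minutes=10)):
    """Return True if any span of length `window` holds >= `threshold` messages."""
    ts = sorted(timestamps)
    start = 0
    for end, t in enumerate(ts):
        # Slide the left edge forward until the span fits inside the window.
        while t - ts[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Synthetic example: 97 messages, evenly spaced from 08:01 (spacing is assumed).
first = datetime(2012, 8, 1, 8, 1)
burst = [first + timedelta(seconds=55 * i) for i in range(97)]
page_on_burst = should_page(burst)

# Two isolated messages an hour apart stay below the threshold.
page_on_pair = should_page([first, first + timedelta(hours=1)])
```

Under these assumed settings, page_on_burst is True and page_on_pair is False, so the burst would have reached an on-call person well before the open instead of sitting in a mailbox.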
Knight had no documented incident-response procedure. While SMARS continued to send millions of orphaned child orders into the market, technologists triaging the problem uninstalled the new RLP code from the seven correctly deployed servers. That made things worse, because orders carrying the repurposed flag now activated Power Peg on those servers too. It took roughly 45 minutes to halt the runaway position. Knight agreed to a merger with Getco in December 2012, which closed in July 2013, and the SEC fined Knight $12 million in October 2013.
Contributing factors included a deployment process that did not verify cluster-wide version consistency, no monitoring to catch the release mismatch, warning emails that were not treated as actionable alerts, long-dead code paths left reachable in production, and the reuse of a flag previously bound to that dead path.