The Worst Computer Bugs in History is a mini series to commemorate the discovery of the first computer bug seventy years ago. Although these stories are more extreme than most software bugs engineers will encounter during their careers, they are worth studying for the insights they can offer into software development and deployment. These computer bugs left a significant impact on the people who experienced them, and we hope they’ll offer valuable lessons we can all apply to our own work and projects. Read about other computer bugs in the series: The Ariane 5 Disaster, the Mars Climate Orbiter, and Therac-25.

Knight Capital Group

On August 1st, 2012, Knight Capital deployed a new software update to their production servers. At around 08:01AM, staff in the firm received 97 email notifications stating that Power Peg, a defunct internal system that was last used in 2003, was configured incorrectly.

This was the first warning sign.

The damage

At 09:00AM, the New York Stock Exchange opened for trading, and Knight Capital’s first retail investor of the day placed an instruction to buy or sell their investment holdings.

Just 45 minutes later, Knight Capital’s servers had executed 4 million trades, losing the company $460 million and placing it on the verge of bankruptcy. Some shares on the NYSE shot up by over 300%, as High Frequency Trading algorithms from other firms exploited the bug.

Ultimately, Knight Capital was fined an additional $12 million by the Securities Exchange Commission, due to various violations of financial risk management regulations.

What went wrong

Stock Exchanges 101

A stock exchange works by pairing an individual who wants to buy a stock, with another individual who wants to sell that stock. The seller will state an ask price, which is how much they are willing to sell it for, and the buyer will state a bid price, which is how much they’re willing to pay. The difference between these two values is known as the bid/ask spread, and typically hovers around a few cents.

Conventional financial wisdom also states that an individual should buy shares when the price is low, and sell when the price is high.

Unfortunately, an algorithm deployed on one of Knight’s production servers was designed to do exactly the opposite of that, as fast as possible. The Power Peg program was designed to buy a stock at its ask price, and then immediately sell it again at the bid price, losing the value of the spread. Although a few cents may not seem like much, when a computer is executing thousands of trades a second, it quickly adds up to catastrophic losses.

Buy high, sell low

In a test environment, Power Peg would drive up the price of stocks, allowing QA to verify that other features of the software was working correctly.

In a production environment, Power Peg would result in C-Level employees spending their weekends frantically searching for Wall Street Investors to cover a multi-million dollar black hole that had just appeared in the budget.

Cause of failure

The cause of the failure was due to multiple factors. However, one of the most important was that a flag which had previously been used to enable Power Peg was repurposed for use in new functionality. This meant the program believed it was in a test environment, and executed trades as quickly as possible, without caring about losing the spread value.

Secondly, the dev-ops team failed to deploy the updated program to one of the eight production servers. This server now had an outdated version of the software, and a flag stating that Power Peg should be enabled.

Finally, Power Peg had been been obsolete since 2003, yet still remained in the codebase some eight years later. In 2005, an alteration was made to the Power Peg code which inadvertantly disabled safety-checks which would have prevented such a scenario. However, this update was deployed to a production system at the time, despite no effort having been made to verify that the Power Peg functionality still worked.

Dead code

Many other factors were highlighted by the Security Exchange Commission’s report. The most damning factors are that there was no formal code review or QA process, and that processes were not in place to check that software had deployed correctly. Additionally, no pre-set capital thresholds were in place to stop automated trading after a certain amount of losses.


Our series on the Worst Software Bugs in History is in honor of Bug Day 2017. Seventy years ago, Grace Hopper discovered the first computer bug — a moth was stuck between relays in the Harvard Mark II computer she was working on. The notion of bugs was described in other fields previously, but the moth discovery was the first use of the term “debugging” in the field of computers.

Bugsnag automatically monitors your applications for harmful errors and alerts you to them, giving you visibility into the stability of your software. You can think of us as mission control for software quality.


Sources:
http://www.theregister.co.uk/2012/08/03/bad_algorithm_lost_440_million_dollars/
http://www.zerohedge.com/news/what-happens-when-hft-algo-goes-totally-berserk-and-serves-knight-capital-bill
https://www.sec.gov/litigation/admin/2013/34-70694.pdf