Thursday, May 22, 2008

Could the Credit Crunch be a software fault?


The Financial Times is reporting that the credit rating agency Moody’s has uncovered a bug in the software they use to value credit derivatives. As a result some instruments rated 4 notches higher than they should have been. And Moody’s shares are down 15% as a result.

Think about this. If these instruments had the correct rating people would have put a lower value on them to reflect the risk. And banks would have had a different value at risk so maybe they would have seen problems coming sooner.

How come Moody’s didn’t notice this sooner? Well it seems they did but they didn’t want to face up to it - management not wanting to hear what engineers say? - Sounds like the Challenger disaster again.

What happened to code reviews? What about Unit testing?

Let me guess, ‘this is the real world, we don’t have time for that’. The financial industry is build on software but banks haven’t really internalised that yet. The Moody’s bug is not an isolated instance.

Story 1, I was working at an investment bank in the late 1990’s. I was working on a risk system with the quants. They had branched the code base from the main developers months before - when I say branched, I mean it was put on another server so there was no way back.

The bank was silly enough to loose its development team - five of them left within a few weeks of one another. I was asked to take over the main code base and support the main overnight risk assessment system. There was a list of bugs they wanted fixing, most of which I’d fixed in the other code base, but before I could do so I needed to build the live system.

The two code bases on two different servers were different, not just diverged but contradictory. The code left on the developers machines was different again. And being C++ there were a lot of #ifdef’s and I couldn’t tell which ones were on when the live system was built.

I had no way of building the system to produce the same numbers as the live one.

I asked for help but nobody really wanted this news. I could build a system, several versions of it in fact, but none of them produced the same figures as the live system. And nobody had any time to help me work out what the figures should be, or even tell me how I could work them out.

The bank’s risk analysis software was effectively a random number generator.

Story 2, I heard this at SPA this year from “Bill” - a pseudonym. I think it was a derivatives or risk system but I’m not sure. Bill had been examining some figures one evening and they didn’t look right. The more he looked the more wrong they looked.

Bill found a developer and together they looked at the code. And in the code there was a sign error. Numbers which should be positive were negative and vice versa. They made a fix, and reported up. It went all the way up the chain and fast. The fix was signed off and in production very quickly.

I didn’t find out if the bank ever reviewed the numbers, or how long the error was live but, it was quite possible the bank was insolvent on several occasions without knowing it.

Moral? Increasingly our financial markets are built on systems which are not only difficult too understand but which we known contain errors. So what happens now?

Most likely Moody’s will survive - even if their shares fall. The moment the word “code” is used management will tune out, nobody wants this news so lets move on.

But if it is just possible the SEC, FSA and other regulators will get involved. This could be serious and they might demand something is done about it. Not just at Moody’s but elsewhere - clean up financial systems!

Now this is a more interesting scenario because two things might happen.

Firstly this could make Sarbanes Oxley regulation look like a walk in the park. What if the SEC introduce their own coding standards? What if all software needs to be formally approved?

This route would be a field day for the CMMI and ISO guys, the high priests of formal methodology and for document management systems. They would have so much work... and banks software would grind to a halt.

It would also be good news for vendors of shrink wrapped software. If developing your own costs so much more, and gets you tied in regulation then it is no longer cost effective so buy, don’t build.

There is another option, far less likely I’ll admit but...

What if you had an audit trail? You could build the system for any date at the push of a button.

What if code reviews did happen? Or what if code was pair programmed?

What if all source code was unit tested? And the tests kept? And the tests re-run regularly?

What if writing code became so difficult, expensive and risky we had to reduce our use of code? Well then we’d have to really code the right thing - improve our requirements.

Do you see were I’m going? Improving our code base, even regulating it, doesn’t have to be SarBox all over again. Many of the Agile methods are surprisingly well suited to this.

We will see. In the meantime, next time someone tells you you are wasting time writing a unit test remind them that Moody’s lost 15% of its value because of a bug and may have endangered a fair few banks.