Wednesday, January 09, 2019

Error handling omitted for brevity

Q: What is the difference between programming in college and programming in the real world? A: Error handling
Do you remember when you were learning to program? Do you remember those textbooks you had back in college? And do you remember what they said about error handling?
As I remember it, most of what they said about error handling was:
        /* error handling omitted for brevity */
Or perhaps:
        (* error handling omitted for brevity *)
Back in college error handling hardly got a mention, and if it did it was to abort the program. Yet in the real world 80% of what you program is error handling, or rather exceptions, the corner cases, what happens when things go wrong. I’ve been saying this for years but this week I realised how shocking this was.
A couple of years ago a paper entitled “Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems” (2014; you know it’s an academic paper because it has 8 authors) was momentarily famous on Twitter. I grabbed it and had a quick read, but this week I had reason to go back and look at it again. In the process I found a 20-minute video presentation by one of the authors.
To cut a long story short, the authors looked at the source code for large open source applications (Cassandra, MapReduce, etc.) and at their software failures. Among various findings they reported:
  • Finding 1: “A majority (77%) of the failures require more than one input event to manifest, but most of the failures (90%) require no more than 3” - so even if they didn’t happen very often, they were difficult to simulate in system testing
  • Finding 9: “A majority of the production failures (77%) can be reproduced by a unit test.” (Yes, the reoccurrence of 77% is suspicious, but I think it is an improbable but genuine coincidence; please read the paper or watch the video before you fault the paper on this.)
  • Finding 10: “Almost all catastrophic failures (92%) are the result of incorrect handling of non-fatal errors explicitly signalled in software.”
  • Finding 11: “35% of the catastrophic failures are caused by trivial mistakes in error handling logic — ones that simply violate best programming practices; and that can be detected without system specific knowledge.”
The authors even created a tool to scan code for some of these problems. In many cases they found code like:
        catch (…) {
                // TODO
        }
        catch (Exception e) {
                /* will never happen */
        }
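A minimal Java sketch of that pattern, not taken from the paper or from the authors' tool. Every name here (`ErrorHandlingSketch`, `Source`, `readSwallowing`, `readOrDefault`, `errorLog`) is hypothetical and exists only for illustration. It contrasts an empty catch block with one that at least records the error and returns an explicit fallback, and it shows how a failing stub can inject the error in a way a unit test could use.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ErrorHandlingSketch {

    /** Hypothetical stand-in for any operation that can signal a non-fatal error. */
    interface Source {
        String read() throws IOException;
    }

    /** Illustrative in-memory log so that a failure leaves a visible trace. */
    static final List<String> errorLog = new ArrayList<>();

    /** Anti-pattern: the error is caught and silently dropped. */
    static String readSwallowing(Source source) {
        String value = null;
        try {
            value = source.read();
        } catch (IOException e) {
            // TODO
        }
        return value; // null leaks out to the caller when read() failed
    }

    /** One possible alternative: record the error and return an explicit fallback. */
    static String readOrDefault(Source source, String fallback) {
        try {
            return source.read();
        } catch (IOException e) {
            errorLog.add("read failed: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        // A stub that always fails, injecting the error path on demand.
        Source failing = () -> { throw new IOException("disk unavailable"); };

        System.out.println(readSwallowing(failing));           // prints "null"
        System.out.println(readOrDefault(failing, "default")); // prints "default"
        System.out.println(errorLog);                          // prints "[read failed: disk unavailable]"
    }
}
```

Logging and falling back is only one choice. Depending on the system, rethrowing or aborting may be the right response. The point of the sketch is that the decision is written down and can be exercised by a test, instead of being left as a TODO.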
My old jibe about error handling looked very real. This morning I pulled some old books off my shelves and was shocked by what I found.
First, the book I was prescribed on not one but two university programming courses: “Problem Solving and Structured Programming in Modula-2” by Elliot B. Koffman (1988). I can’t find “Error handling omitted” in this book; my memory was wrong, but the book is worse. I can’t find any error handling to speak of! I found one example which returns a boolean success/fail flag but there is no discussion of what to do with it. “Error handling” is not even in the index, let alone the table of contents - actually “Error” isn’t even there. Each chapter ends with a “Common Programming Errors” section but this section is mostly about compile time errors.
Next I looked at the silver book, Wirth’s “Pascal User Manual and Report” (1991). I can only find two references to “errors” (and nothing on exceptions). Both these references are in the report section and don’t say anything about how to program error handling.
As I looked at more old books I noticed how they just assumed everything worked well. K&R is slightly better - “The C Programming Language” by Kernighan and Ritchie (1988) that is. Most of the examples here do check for errors, then printf. Sometimes that is it, sometimes they return 0 or break. On page 164 they say:
“We have generally not worried about exit status in our small illustrative programs, but any serious program should take care to return sensible, useful status values.”
In other words: Error handling omitted for brevity.
Moving away from the introductory books I turned to what might be the longest single volume technical book I ever read. A book I quoted as a bible, a book whose author I still put on a pedestal: “Large Scale C++ Software Design”, John Lakos (1996). While John does say a bit more about error handling it does not feature in the index and there is no dedicated section on it. Looking at it now I am in disbelief: how could a book on large scale C++ not have at least one chapter on error handling?
Of the books I looked at this morning only Kernighan and Pike’s “Practice of Programming” (1999) gave any coverage to error handling. And that isn’t saying much.
OK, these are all ancient books. Have things changed? You tell me. I hope more recent books, in more modern languages, have got better - and my old (1999) copy of “Learning Python” (Ascher) contains a whole chapter on exceptions, as does Stroustrup’s “C++ Programming Language” (2000).
But I am sure error and exception handling hasn’t got any simpler. I can’t believe that JavaScript, PHP, Swift and similar have somehow made the problem go away. “Throw exception(blah, blah, blah)” might be a great improvement over “return -1” but I can’t imagine handling these cases has got easier.
Based on the “Simple Testing” paper, efforts to improve the training of programmers in error handling need to be redoubled.
