Saturday, November 11, 2023

Exception handling claims and counterpoints

Unsupported claim: The distinction between (unrecoverable) bugs/panics and recoverable errors is incredibly useful
Counterpoint: Most languages make no such distinction. The distinction is subjective and depends on the context - one person's "unrecoverable error" is another person's "must recover". For example, say we have a service like Repl.it - user code running in threads on a server. For expressiveness, the user code should be able to fail by throwing an exception. But that shouldn't take down the entire thread (unless it needs to), and it should never take down the entire server (running thousands of different users' threads). Most exception systems here would only give a false sense of safety: they allow some construct for recoverability (like try-catch or the ? operator), but then they allow exit() or some kinds of errors (OOM, divide by zero, ...) to take down the entire system. This is unacceptable. For example, H2O handled OOM in Java and generally recovered nicely - if OOM were unrecoverable, this couldn't be done. As soon as you allow any sort of unrecoverable bug/panic, one unvetted library or untrusted user code snippet can take down the entire system. The conclusion: unrecoverable panics shouldn't exist. A true exception system needs to take recoverability seriously and allow every single error in the entire software stack to be recoverable at some level of containment.
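
A minimal Java sketch of that kind of containment (the class and method names are invented for illustration): each user task runs behind a boundary that catches Throwable, so even an OOM or assertion failure in one user's code is logged and contained rather than taking down the server.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class UserTaskHost {
        private final ExecutorService pool = Executors.newFixedThreadPool(16);

        public void submitUserCode(String userId, Runnable userCode) {
            pool.submit(() -> {
                try {
                    userCode.run();
                } catch (Throwable t) {
                    // Catching Throwable (not just Exception) is what makes OOM,
                    // assertion failures, etc. recoverable at this containment level.
                    System.err.println("user " + userId + " failed: " + t);
                }
            });
        }
    }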

Unsupported claim: Rust users over-estimate how often they need code that never panics
Counterpoint: A lot of software is mission-critical and absolutely has to keep going no matter what, like the server example mentioned previously. A server that constantly crashes is a useless server compared to one that logs the error and keeps going. In contrast, there has never been a time when a Rust user said "Gee, my program kept running. I really wish it had crashed with a panic instead of catching the exception and printing a stack trace" - such a user would just add an exit() to their exception handler.
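
A sketch of that last point in Java: someone who prefers crash-on-error can trivially opt into it from inside a recoverable system, while the reverse is impossible (runServer is a placeholder for the real application entry point).

    public class CrashOnError {
        static void runServer() throws Exception {
            // placeholder for the real application entry point
        }

        public static void main(String[] args) {
            try {
                runServer();
            } catch (Exception e) {
                e.printStackTrace();   // log the error...
                System.exit(1);        // ...then explicitly opt into crashing anyway
            }
        }
    }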

Unsupported claim: Every error in a high-assurance system (think pacemakers/medical devices, critical automotive systems like brake control, flight control engine, nuclear device release control software) has to be a hard compile-time error. You absolutely should not be able to make "mistakes" here.
Counterpoint: It is very useful during development to be able to see all the errors in a program at once. If a compiler stops on the first error, this is impossible. Similarly, some errors in programs are only visible by actually running them. If programs under development cannot be run due to not passing one or more quality checks, this means a developer will have to waste time fixing compiler errors in code that may end up being deleted or completely rewritten anyway. If there is a valid semantics to the program, the compiler should compile the program with this semantics and not get in the developer's way. Certainly for the final production-ready binary, it is worth addressing all the quality checks, but not necessarily before then. It is unwise to encode quality control processes at the technical level when organizational policies such as commit reviews, continuous integration testing (which will include quality checks), and QA teams are so much more effective. Similarly to how one person's "must-recover" exception may be another person's "unrecoverable error", one person's "hard compile-time error" may be another person's "useless noise", and policies on such errors can only be dictated by the organization.

Unsupported claim: Most projects don't configure their compiler warning flags properly. Warnings are not enough for mission-critical code - it has to be a hard compile-time error.
Counterpoint: Per https://devblogs.microsoft.com/cppblog/broken-warnings-theory/, in Microsoft's Visual Studio Diagnostics Improvements Survey, 15% of 270 respondents indicated they build their code with /Wall /WX, indicating zero tolerance for any warnings. Another 12% indicated they build with /Wall. Another 30% build with /W4. These were disjoint groups that together make up 57% of users with stricter requirements on their code than the Visual Studio IDE default (/W3). Thus a majority of users definitely configure their warning flags, and likely more simply left the flags at the default after determining that the default satisfied their needs. If this is not the case, it is easy enough to make it mandatory to specify at least one compiler warning level flag. The NASA/JPL rules specify compiling with all warnings enabled.

Unsupported claim: Guaranteeing that all possibilities are handled is not something a general-purpose language can do. We'd need some sort of type system capable of expressing everything. This in turn would require some form of advanced dependent type checking, but such a type system would likely be far too complex to use in practice. You'll only get 5 users, 4 of whom work on proof assistants at INRIA.
Counterpoint: Ada is/was a general-purpose language, and SPARK is a conservative extension of Ada that adds high-assurance capabilities without changing the underlying language. Many practical projects have been implemented in Ada and in Ada/SPARK. And adding a basic analysis that determines which exceptions a function can throw and whether they are all handled is easy - it is just Java's checked exception system.
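
For reference, here is the shape of that analysis as Java already implements it: the compiler rejects any call whose declared exceptions are neither caught nor re-declared (readConfig is an invented example name).

    import java.io.IOException;

    public class CheckedDemo {
        // The signature declares exactly which exceptions may escape.
        static void readConfig() throws IOException {
            throw new IOException("config missing");
        }

        public static void main(String[] args) {
            // readConfig();         // would be a compile-time error: unhandled IOException
            try {
                readConfig();        // OK: every declared exception is handled here
            } catch (IOException e) {
                System.err.println("recovered: " + e.getMessage());
            }
        }
    }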

Unsupported claim: "Yorick's shitty law of error handling": Mixing guaranteed handling and abort-on-error into the same language is likely to result in users picking the wrong approach every time, and a generally confusing mess. You can't make an error handling mechanism that fits 100% of the use cases, rather you can either target the first 90%, or the remaining 10%, with the latter requiring an increasingly difficult approach the more of that chunk you want to cover
Counterpoint: This is exactly what Java did with its checked and unchecked exceptions, and nobody was confused. Annoyed, yes (at the dichotomy), but not confused. And in my approach there is actually no checked/unchecked distinction - only the function signature controls whether an exception is checked, so it is even less likely to result in confusion or annoyance. Don't want to bother with checked exceptions? Don't mention them in the function signature.
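
For contrast, in Java the checked/unchecked split is a property of the exception's class rather than of the signature that throws it (the exception names below are invented for illustration):

    class PaymentDeclined extends Exception { }      // checked: callers must handle or declare it
    class InternalBug extends RuntimeException { }   // unchecked: no declaration required

    public class Dichotomy {
        static void charge(long cents) throws PaymentDeclined {
            if (cents < 0) throw new InternalBug();           // bug path: propagates undeclared
            if (cents > 100_000) throw new PaymentDeclined(); // expected failure: part of the signature
            // ... perform the charge ...
        }
    }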

Unsupported claim: From the point of view of the code that says throw, everything is a panic.
Counterpoint: There are actually several ways to handle a throw, whatever its source (a traditional exception, an assertion failure, a sys.exit call, etc.); a rough sketch follows the list:

  • warn: compile time error/warning
  • error: crash the program (exit, infinite loop, throw exception)
  • lint: log failure and keep going (banned at Bloomberg)
  • allow: (in release mode) keep going as though the throw call never existed
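
A rough Java sketch of the three runtime-facing modes (warn is a compile-time diagnostic and has no runtime analogue here; the OnThrow enum and run helper are invented names):

    import java.util.function.Supplier;

    enum OnThrow { ERROR, LINT, ALLOW }

    public class ThrowPolicy {
        static <T> T run(Supplier<T> body, T fallback, OnThrow mode) {
            try {
                return body.get();
            } catch (RuntimeException e) {
                switch (mode) {
                    case LINT:               // log the failure and keep going
                        e.printStackTrace();
                        return fallback;
                    case ALLOW:              // keep going as though the throw never happened
                        return fallback;
                    default:                 // ERROR: crash / propagate
                        throw e;
                }
            }
        }

        public static void main(String[] args) {
            int n = run(() -> Integer.parseInt("not a number"), 0, OnThrow.LINT);
            System.out.println(n);           // prints 0 after logging the failure
        }
    }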

Supported claim: Java checked exceptions have issues https://testing.googleblog.com/2009/09/checked-exceptions-i-love-you-but-you.html
Counterpoint: The issues are (1) long throws clauses, (2) exception-swallowing traps that rethrow as an unchecked exception, and (3) unreachable exceptions. Exception sets solve the issue of long throws clauses - Java was trying to do that with the inheritance hierarchy, but I think being able to have overlapping, customizable sets of exceptions will make a huge difference in usability. Allowing type signatures to skip over exceptions (similar to unchecked exceptions) avoids the need for exception swallowing. For unreachable exceptions, there is a specific pragma telling the compiler that an exception is unreachable so that it does not emit any warnings about it.
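
For concreteness, issue (2), the exception-swallowing trap, looks like this in Java (a minimal sketch; the file name is made up):

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class SwallowTrap {
        // The checked IOException is "laundered" into an unchecked exception
        // so that callers no longer see it in the signature.
        static String readSetting() {
            try {
                return Files.readString(Path.of("settings.txt"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);   // swallow-and-rethrow trap
            }
        }
    }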