06 December 2010

On Exception Management

This is a gathering of thoughts on exception philosophy and general management, including handling, propagation, and throwing.

The biggest problem with the management of checked and unchecked exceptions is the programmer. Libraries are written by different people with different preferences and idioms, and our own "perfect" code is blighted by poorly written libraries we are forced to use for one reason or another. Get over it: make your own code as correct as you can, and handle the rough edges of poorly written dependencies as well as you can.

Above all, the job of a software architect, developer, or designer is to think. So don't be mindless about your work, whether it is a script for a one-time job or something that lives depend on. Programming well with checked exceptions requires some forethought and restrictions, as do Object#notify and Object#wait and any other module you use.

To help programmers think about how they raise and handle exceptions, this post presents a philosophy of exception management that a team can use to reach consensus on the usefulness of, and eventual need for, checked exceptions. Keep in mind that checked exceptions are not for every case; they have their own place, and they should be used for the need they were designed to solve.

The core of this discussion revolves around two categories of exceptions: Faults and Contingencies.

try/catch and Exception Handling
The point of throwing and catching exceptions is to separate error handling code from the main business logic (see "The One True Path" section). Sometimes an exception is handled immediately, close to where it is thrown, and sometimes it propagates farther up the call stack before it is handled. It should not be assumed that all exceptions are handled in central locations; the real answer to where an exception should be handled is "it depends".

It is certainly true that the closer the handling code is in scope to a raised exception, the more context is available. The lowest level (i.e. the block throwing the exception) is unlikely to know all of the context, but it probably knows some context that will be lost if the exception is simply propagated. The lowest level may also be able to adapt and choose a secondary course of action before notifying the caller of failure, so always simply propagating is not the answer.

Furthermore, exception handling need not be done "far" from the source. The real intent of try/catch is to separate exception handling from business logic instead of littering business logic with error handling. This is the same principle that advocates declaring variables close to where they are used instead of only at the start of blocks. Being able to handle an exception at another place in the call stack is a convenience, not a mandate.
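As a rough illustration of the lowest level choosing a secondary course of action before notifying its caller, here is a minimal sketch assuming a hypothetical ConfigLoader whose two sources are simulated with in-memory maps (a missing key stands in for an I/O failure). Only when both sources fail does it throw, and the thrown exception records which sources were tried, which is context the caller could not reconstruct on its own.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical loader: the lowest level tries a secondary course of action
// before reporting failure, and adds context its caller would not have.
class ConfigLoader {
    private final Map<String, String> primary;   // stands in for, e.g., a remote config service
    private final Map<String, String> secondary; // stands in for, e.g., a local cached copy

    ConfigLoader(Map<String, String> primary, Map<String, String> secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    String load(String key) throws IOException {
        try {
            return read(primary, "primary", key);
        } catch (IOException primaryFailure) {
            try {
                // Local recovery: fall back to the secondary source.
                return read(secondary, "secondary", key);
            } catch (IOException secondaryFailure) {
                // Both failed: notify the caller, keeping the context known only here.
                IOException failure = new IOException(
                        "could not load '" + key + "' from the primary or secondary source",
                        secondaryFailure);
                failure.addSuppressed(primaryFailure);
                throw failure;
            }
        }
    }

    // Simulated read: a missing key plays the role of an I/O failure.
    private static String read(Map<String, String> source, String name, String key)
            throws IOException {
        String value = source.get(key);
        if (value == null) {
            throw new IOException(name + " source has no value for '" + key + "'");
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        ConfigLoader loader = new ConfigLoader(Map.of(), Map.of("timeout", "30"));
        System.out.println(loader.load("timeout")); // primary misses, secondary answers
    }
}
```

Attaching the primary failure with addSuppressed (Java 7 and later) keeps both causes visible in the stack trace without inventing a custom exception type.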

Declare and throw a checked exception if you intend the caller either to recover or to propagate the error by meaningfully reclassifying it, with added context, for its own caller.
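A minimal sketch of the two legitimate responses to a declared checked exception, using hypothetical RateService, RateUnavailableException, and InvoiceException types invented for illustration: one caller recovers with a fallback, and the other reclassifies the error in its own terms and adds the invoice id, context the lower level never had.

```java
// Hypothetical caller showing the two choices for a declared checked exception.
class InvoiceCalculator {
    private final RateService rates = new RateService();

    // Choice 1: recover, because this caller has a strategy (a fallback rate).
    double convertOrDefault(double amount, String currency, double fallbackRate) {
        try {
            return amount * rates.fetchRate(currency);
        } catch (RateUnavailableException e) {
            return amount * fallbackRate;
        }
    }

    // Choice 2: propagate, reclassifying the error in this method's own terms and
    // adding context (the invoice id) that the rate service never had.
    double convert(String invoiceId, double amount, String currency) throws InvoiceException {
        try {
            return amount * rates.fetchRate(currency);
        } catch (RateUnavailableException e) {
            throw new InvoiceException("cannot price invoice " + invoiceId, e);
        }
    }

    public static void main(String[] args) {
        InvoiceCalculator calc = new InvoiceCalculator();
        System.out.println(calc.convertOrDefault(10, "GBP", 0.25)); // uses the fallback rate
        try {
            calc.convert("INV-1", 10, "GBP");
        } catch (InvoiceException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}

// Hypothetical callee: declares the contingency instead of returning an error code.
class RateService {
    double fetchRate(String currency) throws RateUnavailableException {
        if (!"EUR".equals(currency)) {
            throw new RateUnavailableException("no rate for " + currency);
        }
        return 0.5; // placeholder rate for the sketch
    }
}

class RateUnavailableException extends Exception {
    RateUnavailableException(String message) {
        super(message);
    }
}

class InvoiceException extends Exception {
    InvoiceException(String message, Throwable cause) {
        super(message, cause);
    }
}
```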

Fault vs Contingency
First, some definitions from another article upon which I am building.

Contingency
  An expected condition demanding an alternative response from a method that can be expressed in terms of the method's intended purpose. The caller of the method expects these kinds of conditions and has a strategy for coping with them. Maps to a checked exception.

Fault
  An unplanned condition that prevents a method from achieving its intended purpose that cannot be described without reference to the method's internal implementation. Maps to a runtime exception.

(From Effective Java Exceptions, by Barry Ruzek, 10 Jan 2007)

Another related definition:
Fault Barrier
  A try/catch block at a strategic point in a call hierarchy with a single catch clause for a root exception type which deals with the exception in a uniform way, such as opening an error message dialog or logging a message for a developer or system maintainer.
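A fault barrier might look roughly like the following sketch, assuming a hypothetical batch runner in which each task is one "activity". The barrier is the single catch clause for RuntimeException around each task; it logs the fault uniformly (here with java.util.logging) and lets the remaining tasks continue.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of a fault barrier: one try/catch at a strategic point (the loop that
// runs each unit of work) with a single catch clause for the root fault type.
class FaultBarrierDemo {
    private static final Logger LOG = Logger.getLogger(FaultBarrierDemo.class.getName());

    // Runs each task; a fault in one task is logged uniformly and does not stop
    // the others. Returns the number of tasks that failed.
    static int runAll(List<Runnable> tasks) {
        int failures = 0;
        for (Runnable task : tasks) {
            try {
                task.run();
            } catch (RuntimeException fault) {
                // Uniform handling: record diagnostics for a developer or maintainer.
                LOG.log(Level.SEVERE, "task failed", fault);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Runnable> tasks = List.of(
                () -> System.out.println("task 1 ok"),
                () -> { throw new IllegalStateException("task 2 hit a fault"); },
                () -> System.out.println("task 3 ok"));
        System.out.println(runAll(tasks) + " task(s) failed");
    }
}
```

In a UI application the same catch clause would more likely open an error dialog; what matters is its placement at the boundary of one unit of work.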

Here are some implications of these definitions, which also clarify the uses of checked and unchecked exceptions:

  • Faults exist - deal with them instead of ignoring them
    • This is the core reason why checked exceptions exist - so a programmer must deal with or explicitly ignore known problems (identified by the library designer)
  • Faults are unrecoverable, but only up to the boundary of the activity that encountered the fault; that boundary is where the fault barrier should be placed.
  • Faults carry diagnostic information for post-mortem analysis, describing what happened so that someone (e.g. a developer or system maintainer) can figure out why and fix it. Faults do not carry state information to help with recovery; that would make them contingencies.
  • Faults occur as implementation details and are typically abstracted within class methods (e.g. an "Account" object's user does not know it is making database calls or file I/O). Therefore, a checked exception thrown by implementation-specific libraries may (should?) be re-thrown as a fault (runtime exception) if it is unrecoverable by the caller according to the contract of the current method. 
  • Installing fault barriers improves clarity and maintainability and helps prevent littering code with one-off fault handling; contingencies, by contrast, should be handled one-off or propagated, since they are checked, "known issues".
  • You must account for exceptions thrown during a resource acquisition-release cycle, so guard every resource with a try/finally block.
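To make the Account example and the try/finally rule concrete, here is a sketch assuming a hypothetical Account whose balance happens to be stored as text (a StringReader subclass, TrackingReader, stands in for a file or database connection and records whether it was closed). The try/finally guarantees release of the resource, and the checked IOException, unrecoverable under this method's contract, is rethrown as a fault using java.io.UncheckedIOException (Java 8 and later).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class AccountDemo {
    public static void main(String[] args) {
        TrackingReader source = new TrackingReader("1250\n");
        System.out.println(new Account(source).balanceInCents()); // 1250
        System.out.println("released: " + source.closed);          // released: true
    }
}

// Hypothetical Account: its users see a balance, not the I/O behind it.
class Account {
    private final TrackingReader source; // stands in for a file or database connection

    Account(TrackingReader source) {
        this.source = source;
    }

    long balanceInCents() {
        try {
            BufferedReader reader = new BufferedReader(source);
            try {
                String line = reader.readLine();
                if (line == null) {
                    throw new IOException("stored balance is empty");
                }
                return Long.parseLong(line.trim());
            } finally {
                reader.close(); // released whether or not an exception was thrown
            }
        } catch (IOException e) {
            // Unrecoverable under this method's contract: rethrow as a fault.
            throw new UncheckedIOException("could not read stored balance", e);
        }
    }
}

// Hypothetical helper that records whether close() was called.
class TrackingReader extends StringReader {
    boolean closed;

    TrackingReader(String text) {
        super(text);
    }

    @Override
    public void close() {
        closed = true;
        super.close();
    }
}
```

On Java 7 and later, try-with-resources gives the same guarantee more compactly, and if close() itself fails it records that failure as a suppressed exception instead of letting it mask the original one.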
System vs User Interface
C# has no checked exceptions, but Java does. Why? It is a question not only of language philosophy but also of the language's intended use. Java is intended to be a general purpose systems language, and C# is (realistically) a VB replacement mainly used to write event-driven UI applications. It boils down to this:

  • Typically in systems development, more of the code is based on "deterministic" functionality.
    • Almost all of the developer's intent is in the code, so the developer knows what to do when a problem occurs, whether it is a fault or a contingency, and has put that response into the code.
    • Therefore, they are more able to identify and handle specific contingencies because the intent and contingency are both explicit in the code. 
  • In event-driven UI development, more of the code is based on user interaction ("non-deterministic").
    • The developer must infer intent during development, well before the user supplies it, and must handle problems "before" they occur. 
    • A (probably) higher percentage of problems become faults and propagate to a fault barrier which alerts the user - the only one who can really decide on a contingency plan. 
    • Some problems can be handled without propagating to a fault barrier, such as validating input, but typically are still handled in such a way that they notify the user to determine the contingency.
With these points of philosophy in mind, it makes sense that UI code has more cases of try/finally with a fault barrier, which lets the call stack clean up resources and notify the user, while system and library code has more declared checked exceptions and more localized handling of contingencies.

Caller/Callee Contract
Everything in exception management revolves around the contract between the caller and the callee, which boils down to the callee's ability to execute a single method successfully and as intended with the given arguments. This paradigm scales up to one (general purpose) system calling another (utility/library) system.
  • The interface between library-quality code and calling code should mainly use checked exceptions 
  • A function's return value should not be used to return an error code, such as null, or a negative value when only positive values apply (see String#indexOf, which returns -1 when the substring is not found, for an example of what not to do)
  • Each module should define a single base checked exception type and extend all others from it to simplify declaration of checked exceptions and handling.
    • Allows specific cases defined by exception sub-types to be handled in special cases 
    • Allows the base type to be used as a definite catch-all for the entire library (excluding any RuntimeExceptions it may throw)
    • Avoids the mess of needing to catch java.lang.Exception but exclude RuntimeException 
    • Only a single checked exception need be declared (the subtypes could be declared as well), but the javadoc may reflect the subtypes used 
    • The method's interface need not change for newly encountered special cases in future versions 
    • Only a single catch block is required for all checked exceptions if there are no specific contingencies to handle, e.g. a fault barrier around calling the module.
    • This pattern counters one argument against checked exceptions: that the number of exceptions declared on a method interface explodes as implementation-specific exceptions propagate to the caller. Such an explosion reflects poor abstraction in the library's design.
  • Modules should wrap implementation details and implementation-specific checked exceptions with the module's checked exception
    • A "save data" method on a persistence module should not throw SQLException, but instead should throw a module-defined PersistenceException wrapping the implementation detail. The user of the persistence module need not know it was backed by SQL, but does need to know that exceptions may occur whatever implementation is used.
    • A stack trace whose detail messages (possibly across nested exceptions) carry useful context aids problem forensics, and particular subclasses of the module exception type can be used for cases expected to have contingencies.

  • Unchecked exceptions should be thrown for illegal state, such as calling Iterator#next() when no elements remain, which throws NoSuchElementException. Callers should first test hasNext() and then need no try/catch block at all. If the iterator is nevertheless called in that illegal state, it is a fault and should be trapped by the fault barrier.
  • As a corollary, one should not depend on exceptions to define expected behavior, whether as a library designer or as a library user. Good libraries allow state to be tested before making an invocation that may result in an illegal state exception, and excellent libraries also avoid the computational penalties and thread-safety issues of check-then-act (for example, by offering an operation that performs the check and the action atomically). Whether the exception is checked is a separate matter, because the invocation is probably made from a block where several different exceptions from the same module may be thrown and handled together.
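A sketch of the single-base-exception pattern, assuming a hypothetical persistence module: PersistenceException is the module's one checked base type, RecordNotFoundException is a subtype for a case callers are expected to plan for, and a simulated JDBC call's java.sql.SQLException is wrapped so that save declares only the module's type. The module types and CustomerStore are invented for illustration and come from no real library.

```java
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

class PersistenceDemo {
    public static void main(String[] args) {
        CustomerStore store = new CustomerStore();
        try {
            store.save("42", "Ada");
            System.out.println(store.load("42"));
            store.load("7");
        } catch (RecordNotFoundException e) {
            // A specific contingency, handled specially.
            System.out.println("contingency: create record " + e.id());
        } catch (PersistenceException e) {
            // One catch block covers every other checked exception from the module.
            System.out.println("module failure: " + e.getMessage());
        }
    }
}

// The module's single base checked exception type.
class PersistenceException extends Exception {
    PersistenceException(String message) { super(message); }
    PersistenceException(String message, Throwable cause) { super(message, cause); }
}

// Subtype for a case callers are expected to have a contingency for.
class RecordNotFoundException extends PersistenceException {
    private final String id;

    RecordNotFoundException(String id) {
        super("no record with id " + id);
        this.id = id;
    }

    String id() { return id; }
}

class CustomerStore {
    private final Map<String, String> rows = new HashMap<>();
    private boolean connectionDown; // simulates an implementation-level failure

    void setConnectionDown(boolean down) { connectionDown = down; }

    // Declares only the module's base type; the SQL detail stays hidden.
    void save(String id, String name) throws PersistenceException {
        try {
            executeUpdate(id, name);
        } catch (SQLException e) {
            throw new PersistenceException("could not save customer " + id, e);
        }
    }

    String load(String id) throws PersistenceException {
        String name = rows.get(id);
        if (name == null) {
            throw new RecordNotFoundException(id);
        }
        return name;
    }

    // Stand-in for JDBC code; a real implementation would use a Connection.
    private void executeUpdate(String id, String name) throws SQLException {
        if (connectionDown) {
            throw new SQLException("connection refused");
        }
        rows.put(id, name);
    }
}
```

The caller in main handles the specific contingency first and lets everything else fall to a single catch of the base type, while the SQL cause remains available through getCause() for forensics.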
What to Throw
  • Checked exceptions thrown should encapsulate necessary state (not just a string message) to help calling code solve the problem, reconstruct state, or reapproach the problem
    • They are intended to define contingencies, so they should offer the caller assistance to that end
    • They are specific types, so may implement useful methods, encapsulate complicated objects, or perform specific behavior. 
  • Unchecked exceptions thrown should encapsulate necessary state in a developer-friendly and developer-useful message, e.g. error codes, to help developers or system maintainers detect and correct problems after-the-fact
    • They are intended to encapsulate (aggregate) state information both at the point of failure and at every level from the point of failure to the fault barrier that could be of use to the developers or system maintainers. 
    • As a developer writing exception propagation code, a good general rule is to add to, or wrap into, the message all the (useful and pertinent) information you could gain by setting a breakpoint and inspecting the variable values on the call stack.
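The two halves of this section might be sketched as follows, with hypothetical Wallet and InsufficientFundsException types. The checked exception carries the balance and the requested amount, so the caller can compute a shortfall instead of parsing a message; the fault path wraps a NumberFormatException in a RuntimeException whose message records the wallet id, index, and raw text a developer would otherwise look up with a debugger.

```java
class LedgerDemo {
    // A caller with a strategy: it uses the exception's state, not its message text.
    static String tryWithdraw(Wallet wallet, long amount) {
        try {
            wallet.withdraw(amount);
            return "withdrew " + amount;
        } catch (InsufficientFundsException e) {
            return "please top up at least " + e.shortfall();
        }
    }

    // Fault path: add what a developer would otherwise dig out with a debugger
    // (wallet id, index, raw text) and chain the original cause.
    static long totalOf(String walletId, String[] ledgerLines) {
        long total = 0;
        for (int i = 0; i < ledgerLines.length; i++) {
            try {
                total += Long.parseLong(ledgerLines[i]);
            } catch (NumberFormatException e) {
                throw new RuntimeException("corrupt ledger for wallet " + walletId
                        + " at index " + i + " of " + ledgerLines.length
                        + ": '" + ledgerLines[i] + "'", e);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Wallet wallet = new Wallet(300);
        System.out.println(tryWithdraw(wallet, 100)); // withdrew 100
        System.out.println(tryWithdraw(wallet, 500)); // please top up at least 300
    }
}

// Hypothetical contingency: carries the state a caller needs to re-approach the problem.
class InsufficientFundsException extends Exception {
    private final long balance;
    private final long requested;

    InsufficientFundsException(long balance, long requested) {
        super("requested " + requested + " but balance is " + balance);
        this.balance = balance;
        this.requested = requested;
    }

    long shortfall() {
        return requested - balance;
    }
}

class Wallet {
    private long balance;

    Wallet(long balance) {
        this.balance = balance;
    }

    void withdraw(long amount) throws InsufficientFundsException {
        if (amount > balance) {
            throw new InsufficientFundsException(balance, amount);
        }
        balance -= amount;
    }

    long balance() {
        return balance;
    }
}
```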
Use of Java RuntimeException Subtypes
Basically, use a checked exception for everything except when a problem will probably propagate all the way back to a runtime exception trap (fault barrier).
  • Rule #1 - runtime exceptions are only for faults
    • Programmer errors: check for null arguments, illegal or invalid arguments, illegal state, unsupported operations 
    • Unrecoverable errors: database is dead and not coming back, file does not exist and won't anytime soon
      • Some of these look like they are unrecoverable, but can be solved by waiting 
    • In general, you need to know what situation you are in 
  • Rule #2 - don't be a lazy programmer
    • Stop trying to avoid "work". Handling exceptions properly, or at least more than catching them and writing a comment, is work. It takes effort. It's part of why humans write code instead of monkeys or code generators.
    • If you feel lazy, at least wrap checked exceptions in a RuntimeException - it will keep you from having to manage it right now in your thought process, but at least it will propagate and get accounted for if it occurs during testing. 
  • When validating arguments, prefer IllegalArgumentException over NullPointerException; leave NullPointerException for the runtime to throw
    • See also the many subclasses of Java's RuntimeException hierarchy
    • A precise exception type can reduce the time needed to diagnose real problems
    • There is really no (or very little) need for application-defined subtypes of runtime exceptions because:
      • A runtime exception indicates an unrecoverable fault caught by a fault barrier, which only needs to catch the base RuntimeException and log the message
      • Runtime exceptions support exception chaining, so they can wrap any message or checked exception.
  • Document runtime exceptions in throws clauses of public library methods 
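A small sketch of argument checking in the spirit of these rules; Validation.percentOf is a hypothetical method. It rejects bad arguments, including null, with IllegalArgumentException, relies on the standard ArithmeticException thrown by Math.multiplyExact (Java 8 and later) for overflow, and documents both runtime exceptions in its throws clause and javadoc.

```java
class Validation {
    /**
     * Returns {@code percent} percent of {@code amount}, rounded down.
     *
     * @param label   non-empty description of the amount, used in messages
     * @param amount  non-negative amount in cents
     * @param percent value from 0 to 100
     * @throws IllegalArgumentException if an argument is null or out of range
     * @throws ArithmeticException if {@code amount * percent} overflows a long
     */
    public static long percentOf(String label, long amount, int percent)
            throws IllegalArgumentException, ArithmeticException {
        // Programmer errors: reject bad arguments explicitly, including null,
        // rather than letting a NullPointerException surface somewhere later.
        if (label == null || label.trim().isEmpty()) {
            throw new IllegalArgumentException("label must be non-empty, was " + label);
        }
        if (amount < 0) {
            throw new IllegalArgumentException(label + ": amount must be >= 0, was " + amount);
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException(label + ": percent must be in 0..100, was " + percent);
        }
        return Math.multiplyExact(amount, (long) percent) / 100;
    }

    public static void main(String[] args) {
        System.out.println(percentOf("fee", 2000, 15)); // 300
        try {
            percentOf("fee", 2000, 150);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // fee: percent must be in 0..100, was 150
        }
    }
}
```

Listing unchecked exceptions in the throws clause is legal but not enforced by the compiler; its value is that it shows up in the method signature and generated javadoc.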
The One True Path
The One True Path is the execution sequence of code that produces no errors and achieves the expected correct result. Deviations from this path include exception handling blocks.

One problem with Java's compile-time checking of exception handling is that programmers tend to:
  1. write their code 
  2. notice a checked exception that must be handled 
  3. add a try/catch with a TODO or empty catch block and intend to handle it later 
  4. never handle it later 
One solution is to have the autogenerated code, instead of adding a try/catch with a TODO, add a try/catch that wraps the exception in a RuntimeException. That way the exception can still propagate to a fault barrier if it occurs, and the programmer can continue their thinking without being interrupted by the annoyance of writing handling code while their mind is on the "true path" code.
Here is what you should not see in code:

try {
   // something that throws a checked exception
} catch (Exception e) {
   // TODO handle this exception
}

If you are catching an exception to get the compiler to be quiet, instead use this idiom:

try {
   // something that throws a checked exception
} catch (Exception e) {
   // TODO handle this exception (but for now, at least know it happened)
   throw new RuntimeException(e);
}


    1. I've been a professional Java programmer for over three years. This is one of the best "exception handling" articles I have ever read. Good work! And so true.

    2. Good stuff for sure. I particularly like the tip to wrap a checked exception in a RuntimeException while you get around to handling it correctly.

    3. This holds my attention, from the definition of Contingency, above: "The caller of the method expects these kinds of conditions and has a strategy for coping with them."

      I think it overly presumptuous for me, as a method coder, to presume to know what is in the mind of the caller's coder. If I were writing a method that opens a file, how can I know the caller has a strategy for handling 'file not exist'? Or whether the caller pre-tested for existence (thereby eliminating the expectation of this error)? On the other hand, something obvious to me, say an I/O error when opening a file, is not so obvious to the caller.

      The only way I can be sure the caller has the expectation is if I, as library writer, DEMANDED it. So a definition of 'Contingency' should have stopped with "An expected condition demanding an alternative response from a method that can be expressed in terms of the method's intended purpose." and added that the caller must have an expectation of these errors and strategies for coping with them.

      Gosling, in the interview with Venner (linked above), mentions the faults that could occur in avionics due to unpredictable flight conditions. In these kinds of situations, because they are unexpected, these would not be Contingencies, right? However, in avionics the motto should be 'Expect the unexpected', and so these kinds of faults would have to be Contingencies, right? But then, what strategy could exist for the 'GremlinsAttacKException'? So in the extreme case of avionics, the definitions that separate Fault from Contingency fail.

      I think the definitions also fail in the mundane world of database application development. Compared to avionics, failing to manage an unexpected or expected error is not as dire, but managing it is still necessary. I still have to expect the unexpected.