15 June 2016

Java Lists

I was again called into code-review mode upon encountering uses of java.util.Vector throughout a piece of code in method arguments, return values, and fields, and took it upon myself to instruct the junior developer on the error of his ways. These kinds of conversations are sometimes a little too one-sided: I say "don't do that" or "do it this way", they say "okay", and they don't do that anymore. This time was one of the other cases, where they have an opinion, disagree, and want to learn more from my sagacity, and we have a short discussion about the pros and cons of Vector in particular and, more generally, how to refer to and use types in OO languages.

As a result of the discussion, some particular points arose for me to capture in my infrequently used blog that I hope others also find of use.

There are many articles and blog posts that can be found comparing Java "List vs array vs ArrayList vs LinkedList vs Vector vs ...". Some are more recent and some are older, and some have benchmarks on particular operations. The results of the benchmarks have changed over time as improvements have been made in various areas of the Java Runtime, such as in garbage collection, JIT and HotSpot compilation, and reduced overhead of unnecessary synchronization, which are more relevant to this discussion than some other improvements. One of the biggest relevant changes was the unification of many similar "list" types under the Java Collections Framework, which allowed various list types to be used interchangeably in an improved OO way.

My intent is not to again benchmark which list implementation is the best, but to outline how to choose and how to structure your code so you can both make a choice and change that choice as it is appropriate to do so. Therefore, when trying to decide "which List type to use?", here are some guidelines.

Use an abstraction

My first tip: "always use an abstraction". One rule of good OOP is to use the highest-level abstraction you can get away with. In this case, that means generally using the java.util.List type for references everywhere except where you must not, which is when instantiating. This then allows you to change the implementation without affecting most of your code. Why would you want to change the implementation? See below.
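
As a minimal sketch (the Roster class and its methods are invented for illustration), only the initializer names a concrete type; every other reference sticks to List:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: fields, parameters, and return types all use the
// List abstraction; the concrete class appears in exactly one place.
class Roster {
    private final List<String> names = new ArrayList<>(); // the only mention of ArrayList

    public void add(String name) {
        names.add(name);
    }

    // Callers see only the List abstraction
    public List<String> names() {
        return names;
    }
}
```

Changing the implementation, say to a LinkedList or a synchronized wrapper, then touches exactly one line.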

List instead of Array

The second tip is: "prefer List over array". The fuller version is: "use an array only if you really really need it for performance and have run profiling to determine this is really really what you need". Otherwise, use a List, i.e.: "always" prefer List over array for safety, improved API, and interoperability.

Arrays are certainly useful, just like any of the other lowest-level building blocks of data structures. But an array is not a substitute for an abstract data type. An array should be wrapped in the semantics of a class, encapsulated, and have its invariants protected. Transforming the array to do anything but the simplest of operations can be far more pain and overhead in future maintenance than the performance savings of a raw array over a List in your APIs. You may end up with an entire library of static methods to manage arrays rather than simply combining state and behavior into a class and leveraging the good parts of OO.
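
To illustrate, here is a hypothetical fixed-capacity ring buffer (the class and its behavior are invented for the example); the array is wrapped behind class semantics rather than passed around raw:

```java
// A fixed-capacity ring buffer that encapsulates its backing array
// instead of exposing int[] through an API.
class IntRingBuffer {
    private final int[] data;
    private int head;  // index of the oldest element
    private int size;  // invariant: 0 <= size <= data.length

    public IntRingBuffer(int capacity) {
        data = new int[capacity];
    }

    public void add(int value) {
        int tail = (head + size) % data.length;
        data[tail] = value;
        if (size < data.length) {
            size++;                          // room left: grow
        } else {
            head = (head + 1) % data.length; // full: overwrite the oldest
        }
    }

    public int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return data[(head + i) % data.length];
    }

    public int size() {
        return size;
    }
}
```

Callers never see the int[], the head/size bookkeeping, or the wrap-around arithmetic; they see only the semantics.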

Certainly there are times when use of arrays is a must. One example is in large-scale processing of low-level data, such as image manipulation. You don't want to deal with a List<Byte>, and the overhead of wrapper types would be unreasonable. However, that doesn't mean you should resort to only passing arrays and primitive types through your APIs to manage that image data.

Start with ArrayList

The third tip is: "use java.util.ArrayList as your 'go-to' list implementation". If it turns out you need a different one, it should be a simple case of changing the code that instantiates it. ArrayList gives good performance for typical use, such as adding a bunch of elements and iterating, with little overhead.

Vector is Synchronized

Since the conversation started with Vector, the tip is that "Vector is rarely if ever faster than ArrayList". Vector is like ArrayList, but adds synchronization. This means it acquires a mutual-exclusion lock on every method call into your list (depending on the JVM's optimizations, of course).

In versions of Java somewhere before 8, this was a significant performance hit, but more recently it is much less so. The performance hit is still more than zero, but if you have used List throughout your code, you are free to switch implementations anyway. In particular, if you only use the instance in the context of a single thread, there is no reason to pay for something that is not being used.

You can also achieve a similar thread-safe result by wrapping the ArrayList (or any List) with Collections.synchronizedList, giving you another option not only at instantiation time but also when profiling.
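
A quick sketch of that option (the class and method names are invented); note that compound operations such as iteration still need manual locking even with the wrapper:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncListDemo {
    // Only the instantiation changes; callers still see a plain List.
    public static List<String> makeSharedList() {
        return Collections.synchronizedList(new ArrayList<>());
    }

    // Two threads each add n elements; with the synchronized wrapper no adds are lost.
    public static int concurrentAdds(int n) {
        List<String> shared = makeSharedList();
        Runnable task = () -> {
            for (int i = 0; i < n; i++) {
                shared.add("x"); // each individual call locks the wrapper
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Compound operations, like iterating, still require manual locking:
        synchronized (shared) {
            return shared.size();
        }
    }
}
```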

LinkedList is essentially useless

The next tip is "java.util.LinkedList is rarely the best option for the situation". LinkedList is like ArrayList, but stores its elements in a linked data structure instead of contiguously. In computer-science theory, an add or remove in the middle of the linked structure is technically fewer operations at O(1) once the position is reached, since the subsequent list elements don't have to be shifted, and a get operation is slower at O(n) because it is not random access.

In practice, array manipulation at lower levels is crazy optimized, and benchmarks have consistently shown it is actually faster to shift contiguous memory than to perform the linked insert. Of course, indexes, other management overhead, and lazy operations could be added to the internal implementation of LinkedList to improve performance, but that would only increase its memory footprint. In my experience, then, there is little benefit in using java.util.LinkedList.
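
If you want to validate this on your own JVM, here is a rough sketch of such a micro-benchmark (the class name is invented, and this is unscientific: no warm-up, no JMH-style rigor). Note that through the List API, LinkedList#add(int, E) must itself walk to the index, which is part of why the theoretical O(1) insert rarely materializes:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Crude timing sketch, not a rigorous benchmark.
class MidInsertBench {

    // Inserts n elements at the middle of the given list, returning elapsed nanoseconds.
    public static long midInserts(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // For LinkedList, add(int, E) first walks to the index,
            // so the linked insert is not free through the List API.
            list.add(list.size() / 2, i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("ArrayList:  " + midInserts(new ArrayList<>(), n) + " ns");
        System.out.println("LinkedList: " + midInserts(new LinkedList<>(), n) + " ns");
    }
}
```

Run it with sizes realistic for your application before drawing conclusions.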

The correct answer is always "it depends"

The final and most important tip: "it depends". Anyone with a graduate degree has likely uttered and heard this innumerable times. Many times a problem's scope (such as a benchmark's) is too narrow to cover a practical case, and other information must be included to characterize the situation well enough to make the best, informed decision. One of the best ways to get that situational awareness is to have benchmarks available, but then to profile your case and see where it aligns with the benchmarks.

Benchmarking "add" can be misleading between Vector and ArrayList because they grow their internal storage at different rates. If all you ever do is "add", then Vector may well be better, but why are you only ever calling add? Either you have profiled your code, or it is not significant enough to profile and validate your expectations with science. Benchmarking any operation in isolation (concurrent access, random element access, head/tail insert, head/tail remove, inner element removal, iteration) can be misleading, which is why it is useful to be able to change the implementation when your code changes or when you discover where performance gains will make the most impact.

I reminded the developer during the conversation that raw running time is not always the ultimate end. Many times perceived performance outweighs actual performance. Also, when dealing with user interfaces, as is common in our shop, seconds or hundreds of milliseconds are more our concern and then we move on to the next task in the never-shrinking queue.

So to best be resilient to change, you must first know the task you are solving and what choices will likely make it work best, then design to the proper level of abstraction for your case, then profile and change where it will make a difference.

08 May 2012

Want to get a job after college?

The question I aim to help answer is this: how do I get the skills I need to get an entry-level job that already requires the skills I need?

I have noticed recently from several of my in-town colleagues in the software business that there seems to be a shortage of qualified, interested people seeking open positions. The key word here is "qualified", with "interested" not far behind. There is no shortage of applicants, but it seems like well more than half are so unqualified that they are job-seeking for the sake of job-seeking, not actually looking for a place to work in return for a paycheck. I know in the minds of many people there is an apparently unsolvable problem where you cannot find a job because you do not have the skills, but cannot get those skills because you can only learn them on that sort of job. My aim here is to explain what skills those are and how you might go about acquiring them without having true, full-time, paid job experience. Please note that these comments are my own opinion and not those of guidance counselors, hiring managers, or others who are far more in-the-know than I am.

Even before addressing skills, however, there is something to be said about the non-technical parts. Number one is to do a little research. Find out something about the organization you are joining. What sorts of projects do they do? Who are the managers and team leaders, and what are their areas of expertise or research? What is the working environment like? Do people wear jeans and shorts to work, or neckties? You are not going to get away with looking unkempt and wearing flip-flops and still land that contested entry-level position.

Number two is to be enthusiastic. An employer is going to invest in you. They don't want to hire someone only to see them leave after a year or two. Organizations often take six months to a year to really bring someone up to speed to start working independently, and until then it's a drain on the productivity of another developer or two. If you don't care about the work you are doing, you are less likely to want to do it well. Employers can see that when you talk to them, and are more likely to hire someone who wants to be there than someone who does not, even if that person has slightly less experience. Not wanting to be where you are and not caring whether you produce quality output is a quick way to kill motivation in a programming team. Prove to your future employer that you will be a benefit.

Now on to the technical skills. The main point to remember is that when starting out, familiarity can be much more helpful than expertise, and broad familiarity even more so than specific expertise. As a new hire, you are likely not to have much exposure to the useful and important day-to-day tools of a professional development team, but your learning curve will be much easier if you expose yourself to some of what you may see. Every shop is a little different, but most have the same basic pieces in place. Get to know some etiquette and get familiar with the types of tools you will be using. My point here is that each organization has its own practices and tools, and you will need to learn how they do things. It is helpful to ask questions and contribute, but don't expect, as a new employee, that your preferred practices will become theirs. You will need to learn their way of working, and the faster you do, the more productive you will be, and the team that got stuck with you as a time sink will like and respect you more and treat you as an asset sooner.

Your number one technical skill to build is using a development environment. Many Java developers use Eclipse, perhaps even most developers do. Some use it for everything, fully embracing the integrated nature of the environment, and some use external tools out of preference. Developers for other languages use other environments, such as Microsoft Visual Studio. Some use a system-default text editor and command-line. You don’t need to know about all IDEs, or even be an expert at one. What you must do is know what they are capable of and what benefits they provide. Learn how to navigate around code, outlines, and project explorers; searching all files for a function name by string is a thing of the past. Get used to using shortcut keys. Learn some of the paradigms of the popular IDEs, such as where to find menus and toolbars, whether most things are done via shortcut key or context menu, where to find help documentation, and where to tap in to the user community.

Number two on technical skills is a source code management system. Git is becoming increasingly popular in the open source world due to the workflows it can handle. Subversion is still quite popular. CVS is still used in many environments and being phased out in others. Learn the benefits of source code management and versioning. Learn how to check out, check in, properly annotate a commit message, branch and merge, deal with other developers on the same few files. Learn about patch files. Learn the pros and cons of the most popular systems, and look up why groups are migrating from one to another. Learn some of the ways these systems integrate with IDEs. Learn about the various workflows that are commonly used, such as tiered approaches or having a single central repository.

Number three on technical skills is issue tracking. Bugzilla has been around a while and most commercial and open source developers have at least heard of it. Trac is another popular option that also brings in connections to a repository, code browsing, and Wiki. Again, don’t try to learn them all, or even the names of all the ones out there. Find out what some popular ones are, look at the features they offer, look at how organizations use them, and get some practice. Some open source projects use Bugzilla for a discussion forum as well as defect investigations and enhancement requests. Others strictly use their forums for discussion and then open a ticket when they have good reason to believe it is a new issue. Learn some of the ways organizations use this kind of tool.

For number four I will group the rest of the tools that may be useful. These start to branch out wildly depending on the industry you are in, but the key here is to not be taken completely unaware that something exists. Some examples in no particular order: ANT, word processing, spreadsheets, database systems and SQL, XML, JSON, image editing (e.g. Gimp, Photoshop), email and instant messaging, Skype, javascript, python, differences among web browser engines. Also try to be familiar with some of the pros and cons of operating systems for different needs. Linux, Windows, Mac OS, various smartphone systems, tablets, and how they are tuned for servers, gaming, software development, portability, battery life, photo and video editing, customization, malware control, etc.

Now how do you go about learning all of this useful stuff? If you are at a college or university, find or start a student organization or club. I recommend a game developers organization, which brings together designers, programmers, artists, and musicians. If you are not at a college or want another path, jump on an open source project that interests you. Look on SourceForge, Google Code, github, or see if any software you use is open source and how you might be able to help. Get into a situation where you are working with other developers on a project. Practice using a source code repository and issue tracking system. Practice working with other people on a common task, collaborating and sometimes conflicting. Learn how to resolve conflict. Dive into a project being managed and learn something about project management, deadlines, bug assignment, and just teamwork in general using the tools of the trade to coordinate and communicate.

07 September 2011

Java Inline Class Definitions

Here's something in Java I came across the other day that I didn't know about: non-anonymous inline class definitions.

interface ListenerRegistration {
   void unregister();
}

List<Listener> listeners = ...;

ListenerRegistration addListener(final Listener ears) {
   class ListenerRegImpl implements ListenerRegistration {
      public void unregister() {
         listeners.remove(ears);
      }

      void register() {
         listeners.add(ears);
      }
   }

   ListenerRegImpl reg = new ListenerRegImpl();
   reg.register(); // the concrete type exposes register(), which the interface does not
   return reg;
}
Just as with an anonymous inline class, this class can access final variables in outside scopes, etc.
But more, this class:
  • can be declared anywhere in the block it is used, not necessarily at the top, just as with class definitions declared inside other classes
  • retains its type information allowing access to added fields and methods

This is especially useful when you need to return an instance implementing an interface, but want to use a concrete class. If the class is not used anywhere outside the scope of instantiation, the class can be defined inline, and added methods and fields are now known instead of having to use only a reference to the implemented interface.

Obviously, this example is to demonstrate a point and is not the simplest way to implement the above 'addListener'. It's also not "always a good thing"; as with most idioms and design patterns, use an appropriate tool for the job you have to accomplish.

Just another affirmation that "you don't know a language unless you program in it for 10 years".

12 January 2011

Mostly-Declarative Eclipse Perspective Folders

I came across something not long ago that I did not realize could be done with perspective layouts. In the documentation for an org.eclipse.ui.perspectiveExtensions view element, the relative field says this:
the unique identifier of a view which already exists in the perspective. This will be used as a reference point for placement of the view. The relationship between these two views is defined by relationship. Ignored if relationship is "fast".
Previously, I had thought that meant strictly that only a "view id" (including the ID for the editor area) could be used here - a thought that is reinforced by the tooling which lets you select any declared view ID in scope, including the org.eclipse.ui.editorss ID.

The issue in my application was that I needed a folder in the perspective - which could only be declared by the perspective factory - to be populated with view instances (same ID, various secondary IDs) from a down-stream bundle. The perspective factory can't have a dependency on the contributing bundle, and copying the view ID string literal into the up-stream bundle was not a tolerable solution.

What I wanted was to be able to declare a page layout folder as a perspective extension, or otherwise solely in the plugin.xml files for the bundles but was unable to find a way to do this.

However, as a solution, I discovered that a new folder can be created in the perspective factory code and then the down-stream perspective extensions can reference the folder's ID as a view ID. Ah-ha!

Now the solution:

public class PerspectiveFactoryWithFolder implements IPerspectiveFactory {

   /** A layout folder for the right side of the perspective */
   public static final String RIGHT_FOLDER = "net.bilnoski.module.ui.perspective.folder.right";

   public void createInitialLayout(IPageLayout layout) {
      layout.createPlaceholderFolder(RIGHT_FOLDER, IPageLayout.RIGHT, 0.7f, layout.getEditorArea());
   }
}

Only the folders must be declared in code, and the rest I was still able to declare statically in the plugin.xml files. An acceptable compromise, especially since I needed the perspective factories to exist anyway to hide the editor area.
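
For reference, a down-stream contribution can then look something like this (the targetID and view id below are hypothetical; the relative value is the folder constant from the factory):

```xml
<extension point="org.eclipse.ui.perspectiveExtensions">
   <perspectiveExtension targetID="net.bilnoski.module.ui.perspective">
      <!-- The folder ID is referenced exactly as if it were a view ID -->
      <view id="net.bilnoski.downstream.views.dataView"
            relative="net.bilnoski.module.ui.perspective.folder.right"
            relationship="stack"/>
   </perspectiveExtension>
</extension>
```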

22 December 2010

Notes for Starting Out with Eclipse GEF

I finally dove into Eclipse GEF. At first, I dug around a bit to find the documentation, which mostly points to examples; I found them somewhat helpful, though they required a bit of massaging to handle the cases I needed for my viewer. I already had a model and other view parts and workbench elements that would manipulate the model, and just wanted to visualize it with a GEF viewer and allow some manipulation.

I didn't need an editor part, so I started from scratch adding a graphical viewer to a view part, figuring out what the built-in editor part subclass did along the way. I didn't need the palette, the ability to create visually within the viewer, or the ability for a user to have a free-form XY layout - they would simply drag items around and the layout would auto-update. The model would be used as-is, and would store no visual information - including "position" or "bounds" - because the layout would handle that.

So in looking through the examples, I started to realize what was there, exactly what the framework did in particular cases, and came up with a set of notes and self-proclaimed "best practices" for myself and my development team to use and contribute to. Without any more "ado", here are some helpful links and lessons learned that would be helpful to first-time users of GEF.

If you are just starting into the world of GEF, these are must-reads:
  • The Draw2D Developers Guide - on help.eclipse.org/helios
    • GEF is built on top of Draw2D, so to understand figures, layout, and painting you will need to know this. It is a very high-level and short document on how Draw2D works, but at least lets you start to know what you do not know.
  • The GEF developers guide - can be found on help.eclipse.org/helios.
    • This is a similarly high-level and short document describing how GEF works in general.
  • A very good baby-step tutorial for starting from scratch from EclipseCon 2008 (by Koen Aers) http://www.eclipsecon.org/2008/?page=sub/&id=102
    • Covers from-scratch creation, including model, viewer, edit part factory, move/resize, palette, create/delete, undo, create/delete connections
Here are some not as useful links, but included for completeness to help in your search for other information
Notes and Lessons Learned

  • Commands
    • They are often constructed without being invoked, so make them inexpensive both to construct and to test for the ability to execute
    • They should encapsulate two states:
      • Store the state necessary to execute, e.g. ability to look up parent/child to be manipulated
      • Store the state modified on execute only if the command canUndo(); i.e., save the child bounds within the execute() method only if they have not already been saved (they will have been if the sequence is execute-undo-redo), and do not save modified state upon command construction/initialization, since most of the time it will be thrown away.
    • Do not hold onto figures, edit parts, or other GEF elements. For example, a delete command will have its edit part removed. If the command is undone, a new edit part should be created and re-populated with state stored by the command.
    • Commands are closely related to Edit Policies, so it is helpful to understand those in depth to complete the picture on commands.
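
To make the lazy state-capture guideline concrete, here is a plain-Java sketch. The Command base below is a simplified stand-in for org.eclipse.gef.commands.Command so the example has no Eclipse dependency, and the map-backed model and RenameCommand are invented:

```java
import java.util.Map;

// Stand-in for org.eclipse.gef.commands.Command so the sketch is self-contained
abstract class Command {
    public boolean canExecute() { return true; }
    public boolean canUndo() { return true; }
    public abstract void execute();
    public abstract void undo();
    public void redo() { execute(); }
}

// Hypothetical model: named nodes stored in a map by id
class RenameCommand extends Command {
    private final Map<String, String> model; // id -> name
    private final String id;
    private final String newName;
    private String oldName;   // captured lazily, not at construction
    private boolean captured; // guards against re-capturing on redo

    RenameCommand(Map<String, String> model, String id, String newName) {
        // Cheap to construct: commands are often built just to test canExecute()
        this.model = model;
        this.id = id;
        this.newName = newName;
    }

    @Override public boolean canExecute() {
        return model.containsKey(id);
    }

    @Override public void execute() {
        if (!captured) {        // only capture once, even across execute-undo-redo
            oldName = model.get(id);
            captured = true;
        }
        model.put(id, newName);
    }

    @Override public void undo() {
        model.put(id, oldName);
    }
}
```

Note that the command holds only model state (the map and ids), never a figure or edit part.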
  • GraphicalViewer
    • A graphical viewer can be created in a view part just as well as through the GEF built-in editor part wrapper, which adds a lot of boilerplate and may not be necessary for all uses.
  • Model Elements
    • When determining whether an EditPart exists or must be created for a model element's model children, identity equality (==) is used on the model element. Therefore, when returning results in EditPart#getModelChildren(), ensure the same model instances are returned every time for the same model children.
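
A hypothetical illustration of why this matters (Container and ChildModel are invented): building fresh wrapper objects on every call defeats the identity check, while caching the wrappers preserves it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a container whose children wrap raw data
class Container {
    private final List<String> rawData = new ArrayList<>();
    private final List<ChildModel> children = new ArrayList<>(); // cached wrappers

    public void addData(String datum) {
        rawData.add(datum);
        children.add(new ChildModel(datum)); // wrap once, at mutation time
    }

    // Safe to return from getModelChildren(): the same instances every call,
    // so an identity (==) comparison can match up existing EditParts.
    public List<ChildModel> getChildren() {
        return children;
    }

    // BAD: a fresh wrapper per call is never == to the previous one,
    // so every refresh would discard and recreate every EditPart.
    public List<ChildModel> getChildrenBroken() {
        List<ChildModel> result = new ArrayList<>();
        for (String datum : rawData) {
            result.add(new ChildModel(datum));
        }
        return result;
    }
}

class ChildModel {
    final String datum;
    ChildModel(String datum) { this.datum = datum; }
}
```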
  • EditPartFactory
    • It need not be state-less, so the factory itself can store/maintain the visual information required of EditPart instances that cannot be stored in the data model elements, such as a cached icon, color, or position/bounds.
  • Figures
    • Updates
      • After updating an edit part due to a model change that must be reflected in the figure, make sure to call IFigure#revalidate() and IFigure#repaint()
      • The GEF UpdateManager handles painting and layout events, and processes the pending operations in batches for efficiency. Notify the UpdateManager that a layout has changed by marking a figure invalid via IFigure#revalidate() or notify that it needs to be re-drawn via IFigure#repaint().
    • When creating a figure, just set the things that are "constant". Use EditPart#refreshVisuals() to set render properties that vary depending on the model element or figure state, such as label text and background color.
  • Layouts
    • A layout "constraint" is analogous to "layout data" in SWT. Constraints are the specific data used to lay out a single entity in its parent.
    • Just as with SWT layouts, if things are pretty simple it can all be done in one place (i.e. one figure class or edit part class). Once it starts getting more complicated and decentralized, follow these principles to keep layout code properly decoupled
      • An element is in charge of only setting its own size and setting child element locations.
      • A child element should not try to set its own location because it should have no knowledge of its parent's layout type or constraints. Computing constraints on a child element should be done through a policy the same way the layout was set on the parent.
      • Note that "bounds constraints" contain two parts: size and location. This makes it difficult to distribute the work since they are usually combined into a single Rectangle
  • Edit Policies
    • An edit policy determines what a user can do to a figure (via an edit part) by responding to each user Request with a Command
    • An edit policy installed in an edit part is really a delegate for some responsibilities of the part.
      • It is possible for an edit part to not install any edit policies, and instead override and reimplement directly the methods used to generate commands. This is not recommended, but is noted because commands are requested from an edit part which delegates work to its edit policies.
      • Not all requests will go to edit policies, and some must be handled within the edit part or manually passed to its policies. See the notes regarding mouse click handling.
    • There are a few built-in edit policy hierarchies, each of which is specialized to handle a set of certain related request types, such as layout, container, component, feedback, and connections.
      • It is possible for an edit part to install a single edit policy to cover all the cases, but the specialized pre-built policies are intended to cover most cases, including nuances of special cases, and can be reused in pieces instead of having a new policy for each new edit part type for a different composition of cases.
      • Some edit policies are not installed by the host edit part. For example, the LayoutEditPolicy decorates a host's child edit parts added to the layout with "satellite" edit policies which augment the edit part added with additional functionality to perform during a drag, such as feedback.
    • In general, the built-in edit policy subclasses divide command creation into two parts; the first part to "get" a command, and the second part to "create" a command. The "get" version of a method is the entry point, and typically iterates over the request input calling the associated "create" for each one.
    • GEF treats all edit policies uniformly through getCommand(Request), so the hierarchies are only for implementation convenience and do not have special hooks back in to GEF's command evaluation service.
    • Edit policies are installed by multiple locations, including EditPart#createEditPolicies() and LayoutEditPolicy#decorateChild(EditPart).
    • When installed, GEF iterates over policies in the same order they were added (but I would not count on this always being the case)
    • Edit policies are considered using three different paradigms by their host edit part, depending on the type of request
      • Pool of responsibility: Each policy is considered for a result and the results are collected. If the collective result is valid, then a further action may be performed (by the caller). One "nay vote" will halt progress. See also the notes on Commands for why this is important. This pattern is only used by getCommand(Request)
      • Chain of responsibility: Each policy is considered in sequence, and the first one to respond with a necessary value trumps the remaining policies and its value is used. This pattern is used by delegate methods of AbstractEditPolicy via AbstractEditPart, such as getTargetEditPart(Request) and understandsRequest(Request)
      • Broadcast: Each policy is notified in sequence of an event and no result is processed. This pattern is used to show/hide source/target feedback, and to activate/deactivate edit policies
      • It is important to know this detail regarding notification paradigms because the built-in edit policies override certain behavior differently to effect their changes. For example, simply moving the handling for an "add child" command from a LayoutEditPolicy to a ContainerEditPolicy will not work. The container policy does not have an implementation for getTargetEditPart(Request) which is used to determine the new parent for the drag operation.
    • There are three valid results for a command request from an edit policy: a valid command, null, or UnexecutableCommand.INSTANCE (which is also a valid command, but a concrete case).
      • A return of null means that the policy has no interest in the request. If all policies return null, the request is denied
      • A return value of UnexecutableCommand.INSTANCE (or any valid command with canExecute() of false) will veto any other request results from the pool. This is important to note so you do not implement policies that have no interest in a request by returning an unexecutable command as that will always cause the request to be disabled.
      • A pool that is all-null or that contains an unexecutable command will result in a "no-can-do" cursor.
    • If no feedback at all is provided, then no edit policy is handling the request at all, not even to return null or an unexecutable command. This is an important corollary to the notes above on what to return for a request.
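
The pool-of-responsibility rules above can be sketched in plain Java (these are simplified stand-ins, not the real GEF types): null contributions are skipped, an all-null pool denies the request, and a single unexecutable command vetoes the whole pool:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the GEF interfaces, just to show the pooling rule
interface Command { boolean canExecute(); }
interface EditPolicy { Command getCommand(String request); }

class PooledCommand implements Command {
    private final List<Command> parts = new ArrayList<>();
    public void add(Command c) { parts.add(c); }
    public boolean isEmpty() { return parts.isEmpty(); }

    // One "nay vote" halts progress: every contributed command must be executable
    public boolean canExecute() {
        for (Command c : parts) {
            if (!c.canExecute()) return false;
        }
        return true;
    }
}

class HostEditPart {
    private final List<EditPolicy> policies = new ArrayList<>();

    public void installEditPolicy(EditPolicy p) { policies.add(p); }

    // Sketch of the pool of responsibility used for getCommand(Request)
    public Command getCommand(String request) {
        PooledCommand pooled = new PooledCommand();
        for (EditPolicy p : policies) {
            Command c = p.getCommand(request);
            if (c != null) {   // null means "no interest" and is simply skipped
                pooled.add(c);
            }
        }
        return pooled.isEmpty() ? null : pooled; // all-null pool: request denied
    }
}
```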
  • Clicking and mouse events
    • Mouse event dispatching is generally handled in one of two places: a figure or tool
    • This is the sequence for handling a mouse event
      • First, the mouse event is dispatched in the DomainEventDispatcher to the Draw2D SWTEventDispatcher, which routes to the figure
      • If the figure handles the event, it will call InputEvent#consume() to invalidate it from further processing.
        • This is done, for example, in ClickableEventHandler#mousePressed(MouseEvent)
      • If Draw2D does not consume the event, it is sent to the active tool via the EditDomain
      • The tool will attempt to create a DragTracker (for example, DragEditPartsTracker which "is-a" SelectEditPartTracker) to handle this and subsequent mouse events until any eventual drag operation is finished
        • If the tracker consumes an event, it returns true from the handle* handler method
      • The tracker is notified about the event, which is managed internally (e.g. DragEditPartsTracker#handleDragInProgress()) or causes a perform* method to be called, such as DragEditPartsTracker#performDrag() or SelectEditPartTracker#performOpen()
        • handleDragInProgress() builds a new Command from calling EditPart#getCommand(Request) on each EditPart of interest, and invokes the command if the drag completes normally
        • performOpen() passes a request to EditPart#performRequest(Request). Note that it does not pass to EditPart#getCommand(Request), which would route to an edit policy.
    • It is important to realize that when using EditPart.performRequest(Request), it is up to the implementor to use a Command and update the CommandStack (obtainable from the EditDomain). If the Request is routed through the edit policy, the command stack is managed automatically.
      • With EditPart: override AbstractEditPart#performRequest(Request) to be notified of IRequestConstants.REQ_OPEN requests, which are fired on double-click
      • With IFigure: add various listener types directly to the figure (focus, key, layout, mouse, etc)
        • Do not override the figure's handle* method, just use its internal listener notification mechanisms.
        • They should probably be marked final to reduce temptation, but overrides may be necessary in certain cases.

08 December 2010

Zest Type Hierarchy View

Sometimes the built-in JDT "Type Hierarchy" view is just not sufficient. Most of the time it answers the questions I have, but certain hierarchies do not display well, and some things I would like to see are not shown. For example, "What interfaces does my type/hierarchy implement?" Also, a type may appear multiple times in the tree viewer due to its hierarchy of interfaces, and it is not immediately clear where it "belongs".

So I used Zest and reworked the demo PDE bundle hierarchy visualization view to be a type hierarchy visualization view.

In particular, the project I'm working on now has a split hierarchy of interfaces with a base class that implements none of them but subclasses that implement one or more. Here's a screen shot of the hierarchy in both views.
Type Hierarchy View + Graph Type Hierarchy View

This view gives me not only the ancestor and descendant types for the type focused, but also all ancestor types and implemented interfaces for anything displayed. To get the same information about interfaces implemented using the Type Hierarchy view, I would need to focus on multiple types and look at both the normal and inverted hierarchies.

I've also been doing some work learning GEF lately, so here are some more hierarchies visualized. I had the shape and logic examples in my workspace at the time, so the extensions from those also show up.
org.eclipse.draw2d.Border hierarchy

org.eclipse.gef.NodeEditPart hierarchy
Notice in the hierarchy for NodeEditPart that PropertyChangeListener and LayoutConstants also show up, which would be a difficult thing to find in the Type Hierarchy view.

I find the built-in Type Hierarchy view sufficient for most purposes, but sometimes it just falls short. The graph view fills in the gaps for me.

As for hacking on it, I can post the code if anyone is interested in it for use or further development/refinement, which it certainly could use. It does not tie in to the IDE, so you currently have to manually focus on an element and cannot open the editor from a type displayed. I planned to also have the sash with the list of fields and methods for the selected type within the view, but that is not there yet. It doesn't remember any history, either. It still has a few bugs I'm sure, too, but it's in a working-enough state to be helpful.

06 December 2010

On Exception Management

This is a gathering of thoughts on exception philosophy and general management, including handling, propagation, and throwing.

The biggest problem with the management of checked and unchecked exceptions is the programmer. Libraries are written by different people with different preferences and idioms, and our own "perfect" code is blighted by poorly written libraries we are forced to use for one reason or another. Get over it, and make your code as correct as you are able. Handle cases in poorly written dependencies as well as you are able. 

Above all, the job of a software architect/developer/designer is to think. So don't be mindless about your work, be it a script to do a one-time job or something that has lives depending on it. Programming well with checked exceptions requires some forethought and restrictions, just as with Object#notify and Object#wait, and any other modules you use.

Towards helping programmers think about how they raise and handle exceptions, this post presents a philosophy for exception management that a team can use to reach a consensus on when checked exceptions are useful and needed. Keep in mind that checked exceptions are not for all cases - they have their own place - and they should be used to solve the need for which they were designed as a tool.

The core of this discussion revolves around the notion of two categories of exceptions: Faults and Contingencies.

try/catch and Exception Handling
The point of throwing and catching exceptions is to separate the error handling code from the main business logic (See "The One True Path" section). Sometimes these exceptions are handled one-off from where they are thrown, and sometimes they propagate farther to be handled. However, it should not be assumed that all exceptions are handled in central locations. The real answer for exception handling is "it depends".

It is certainly true that the closer in scope the code is to a raised exception, the more context information is available. It is not likely that all context is known at the lowest level (i.e. the block throwing the exception), but it is likely that the lowest level knows of some context that will be lost if the exception is simply propagated. There is also a chance that the lowest level can adapt and choose a secondary course of action before notifying the caller of failure, so always simply propagating is not the answer.

Furthermore, it is not the case that exception handling must be done "far" from the source. The real intent of the try/catch is to separate exception handling from business logic instead of littering business logic with error handling. This is the same principle that advocates declaring variables closer to where they are used instead of only at the start of blocks. It is a convenience that exceptions can be handled at another place in the call stack and not a mandate.

Declare and throw a checked exception if you intend the caller to either recover or propagate by meaningfully reclassifying the error with increased context for its caller.
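A minimal sketch of this rule, using invented names (ConfigLoader, ConfigException, and the methods below are hypothetical, not from any library): the caller of a failing operation either recovers with an alternative course of action, or reclassifies the low-level error with context the caller could not reconstruct itself.

```java
// Hypothetical illustration: recover vs. reclassify.

class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) { super(message, cause); }
}

class ConfigLoader {
    // Recover: a missing optional file is an expected contingency
    // with a meaningful fallback, so no exception escapes.
    static String loadOrDefault(java.nio.file.Path path, String fallback) {
        try {
            return new String(java.nio.file.Files.readAllBytes(path));
        } catch (java.io.IOException e) {
            return fallback; // secondary course of action
        }
    }

    // Reclassify: wrap the low-level error with context (the raw text)
    // and rethrow it as the module's own checked exception.
    static int parsePort(String text) throws ConfigException {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new ConfigException("invalid port value: '" + text + "'", e);
        }
    }
}
```

The first method handles the contingency entirely at the lowest level; the second adds what the lowest level alone knows before notifying the caller of failure.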

Fault vs Contingency
First, some definitions from another article upon which I am building.

Contingency
  An expected condition demanding an alternative response from a method that can be expressed in terms of the method's intended purpose. The caller of the method expects these kinds of conditions and has a strategy for coping with them. Maps to a checked exception.

Fault
  An unplanned condition that prevents a method from achieving its intended purpose that cannot be described without reference to the method's internal implementation. Maps to a runtime exception.

(From Effective Java Exceptions, by Barry Ruzek, 10 Jan 2007)

Another related definition,
Fault Barrier
  A try/catch block at a strategic point in a call hierarchy with a single catch clause for a root exception type which deals with the exception in a uniform way, such as opening an error message dialog or logging a message for a developer or system maintainer.

Now some implications of these definitions, or clarifications of uses of checked and unchecked exceptions:

  • Faults exist - deal with them instead of ignoring them
    • This is the core reason why checked exceptions exist - so a programmer must deal with or explicitly ignore known problems (identified by the library designer)
  • Faults are unrecoverable, but only to the point of the activity encountering the fault, which is where the fault barrier should be placed. 
  • Faults contain diagnostic information to help post-mortem analysis and describe what happened to help someone (i.e. a developer or system maintainer) figure out why and fix it. Faults do not contain state information to help with recovery (that would be a contingency) 
  • Faults occur as implementation details and are typically abstracted within class methods (e.g. an "Account" object's user does not know it is making database calls or file I/O). Therefore, a checked exception thrown by implementation-specific libraries may (should?) be re-thrown as a fault (runtime exception) if it is unrecoverable by the caller according to the contract of the current method. 
  • Installing fault barriers improves clarity and maintainability, and helps prevent littering code with 1-off fault handling; obviously handling of contingencies should be done 1-off or propagated since it is a checked and "known issue" 
  • You must account for exceptions to be thrown as part of a resource acquisition-release cycle, so ensure all resources are properly guarded with try/finally blocks.
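The resource-guarding point above can be sketched with the classic acquire/try/finally shape (writeReport is a hypothetical example method; FileWriter stands in for any acquired resource):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

class ResourceGuard {
    static void writeReport(String path, String contents) throws IOException {
        Writer out = new FileWriter(path); // acquire outside the try
        try {
            out.write(contents);           // may throw
        } finally {
            out.close();                   // always released, success or failure
        }
    }
}
```

Note that the acquisition sits outside the try block: if the constructor itself throws, there is nothing to release. (In Java 7 and later, the try-with-resources statement expresses the same guarantee more concisely.)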
System vs User Interface
C# has no checked exceptions, but Java does. Why? It is a question not only of language philosophy, but the intent of the language itself. Java is intended to be a general purpose systems language, and C# is (realistically) a VB replacement mainly used to write event-driven UI applications. It boils down to this:

  • Typically in systems development, more of the code is based on "deterministic" functionality.
    • Almost all of the intent of the developer is in the code and so they know what to do when a problem occurs whether it is a fault or a contingency and they have put it into the code.
    • Therefore, they are more able to identify and handle specific contingencies because the intent and contingency are both explicit in the code. 
  • In event-driven UI development, more of the code is based on user interaction ("non-deterministic").
    • The developer must infer intent during development, well before the user supplies it, and must handle problems "before" they occur. 
    • A (probably) higher percentage of problems become faults and propagate to a fault barrier which alerts the user - the only one who can really decide on a contingency plan. 
    • Some problems can be handled without propagating to a fault barrier, such as validating input, but typically are still handled in such a way that they notify the user to determine the contingency.
Keeping these points of philosophy in mind, it makes sense that in UI code there are more cases of try/finally with a fault barrier to allow the call stack to clean up resources and notify the user while in system and library code there are more declared checked exceptions and localized handling of contingencies.

Caller/Callee Contract
Everything with exception management revolves around the contract between the caller and the callee, which boils down to the callee's ability to execute a single method successfully and as intended with the given arguments. This paradigm scales to one (general purpose) system calling another (utility/library) system.
  • The interface between library-quality code and calling code should mainly use checked exceptions 
  • A function's return type should not be used to return an error code, such as null or a negative value when only positive values apply (String#indexOf, for an example of what not to do) 
  • Each module should define a single base checked exception type and extend all others from it to simplify declaration of checked exceptions and handling.
    • Allows specific cases defined by exception sub-types to be handled in special cases 
    • Allows the base type to be used as a definite catch-all for the entire library  (excluding RuntimeExceptions that may be thrown) 
    • Avoids the mess of needing to catch java.lang.Exception but exclude RuntimeException 
    • Only a single checked exception need be declared (the subtypes could be declared as well), but the javadoc may reflect the subtypes used 
    • The method's interface need not change for newly encountered special cases in future versions 
    • Only a single catch block is required for all checked exceptions if there are no specific contingencies to handle, e.g. a fault barrier around calling the module.
    • This pattern counters one argument against checked exceptions that the number of exceptions will explode on a method interface as it propagates implementation-specific exceptions to the caller. That is poor abstraction regarding the library's design.
  • Modules should wrap implementation details and implementation-specific checked exceptions with the module's checked exception
    • A "save data" method on a persistence module should not throw SQLException, but instead should throw a module-defined PersistenceException wrapping the implementation detail. The user of the persistence module need not know it was backed by SQL, but does need to know that exceptions may occur whatever implementation is used.
    • The stack trace (which may contain nested exceptions) containing useful context in the detail message aids in problem forensics, and particular subclasses of the module exception type can be used for cases expected to have contingencies.

  • Unchecked exceptions should be thrown for illegal state, such as an iterator's next(), which throws if there is no next element. Callers should first test hasNext(), and then need not even define a try/catch block. If the object is indeed in an illegal state, it is a fault and should be trapped by the fault barrier.
  • As a corollary to this, one should not depend on exceptions to define expected behavior - as a library designer or user. Good libraries will be designed to allow state to be tested before making an invocation that may result in an illegal state exception, and excellent libraries will prevent computational penalties and thread safety issues for a check-then-act. Whether or not the exception is checked is a different matter because it is probably invoked from a block where there are several chances for different exceptions from the same module that may be handled together.
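The single-base-exception pattern described above can be sketched for a hypothetical persistence module (PersistenceException, DuplicateKeyException, and AccountStore are invented names; the SQLSTATE value "23505" is an assumption about one particular database's unique-violation code):

```java
import java.sql.SQLException;

// The module's single base checked exception type.
class PersistenceException extends Exception {
    PersistenceException(String message, Throwable cause) { super(message, cause); }
}

// A subtype for a case callers may have a specific contingency for.
class DuplicateKeyException extends PersistenceException {
    DuplicateKeyException(String message, Throwable cause) { super(message, cause); }
}

class AccountStore {
    // Callers see only the module's exception type, never SQLException.
    void save(String accountId) throws PersistenceException {
        try {
            executeInsert(accountId); // implementation detail
        } catch (SQLException e) {
            if ("23505".equals(e.getSQLState())) { // assumed unique-violation code
                throw new DuplicateKeyException("account already exists: " + accountId, e);
            }
            throw new PersistenceException("failed to save account " + accountId, e);
        }
    }

    // Stub standing in for real JDBC work, so the sketch is self-contained.
    private void executeInsert(String accountId) throws SQLException {
        if ("dup".equals(accountId)) throw new SQLException("duplicate", "23505");
        if ("bad".equals(accountId)) throw new SQLException("io failure");
    }
}
```

A caller with a specific contingency catches DuplicateKeyException; a fault barrier around the module catches PersistenceException as the single catch-all, and the method's interface need not change when new subtypes are added later.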
What to Throw
  • Checked exceptions thrown should encapsulate necessary state (not just a string message) to help calling code solve the problem, reconstruct state, or reapproach the problem
    • They are intended to define contingencies, so should offer assistance to that end to the caller 
    • They are specific types, so may implement useful methods, encapsulate complicated objects, or perform specific behavior. 
  • Unchecked exceptions thrown should encapsulate necessary state in a developer-friendly and developer-useful message, e.g. error codes, to help developers or system maintainers detect and correct problems after-the-fact
    • They are intended to encapsulate (aggregate) state information both at the point of failure and at every level from the point of failure to the fault barrier that could be of use to the developers or system maintainers. 
    • As a developer writing exception propagation code, a general rule is to add or wrap into the message all the (useful and pertinent) information that could be gained by having a breakpoint and looking through the variable values in the system execution stack. 
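The "breakpoint variables" rule above can be sketched as context aggregation at each level between the failure point and the fault barrier (BatchProcessor and its methods are hypothetical):

```java
class BatchProcessor {
    void processBatch(int batchId) {
        try {
            processRecord(batchId, 7);
        } catch (RuntimeException e) {
            // add this level's context and re-propagate toward the fault barrier
            throw new RuntimeException("batch " + batchId + " aborted", e);
        }
    }

    private void processRecord(int batchId, int recordIndex) {
        try {
            parseField("not-a-number");
        } catch (RuntimeException e) {
            // wrap with what only this level knows: the record index
            throw new RuntimeException("record " + recordIndex + " of batch " + batchId, e);
        }
    }

    private int parseField(String raw) {
        return Integer.parseInt(raw); // throws NumberFormatException: a fault here
    }
}
```

A fault barrier that logs the chained exception then sees each level's context in one stack trace: the batch id, the record index, and the original NumberFormatException with the offending value.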
Use of Java RuntimeException Subtypes
Basically, use a checked exception for everything except when a problem will probably propagate all the way back to a runtime exception trap (fault barrier).
  • Rule #1 - runtime exceptions are only for faults
    • Programmer errors: check for null arguments, illegal or invalid arguments, illegal state, unsupported operations 
    • Unrecoverable errors: database is dead and not coming back, file does not exist and won't anytime soon
      • Some of these look like they are unrecoverable, but can be solved by waiting 
    • In general, you need to know what situation you are in 
  • Rule #2 - don't be a lazy programmer
    • Stop trying to avoid "work". Handling exceptions properly, or at least more than catching them and writing a comment, is work. It takes effort. It's part of why humans write code instead of monkeys or code generators.
    • If you feel lazy, at least wrap checked exceptions in a RuntimeException - it will keep you from having to manage it right now in your thought process, but at least it will propagate and get accounted for if it occurs during testing. 
  • Prefer IllegalArgumentException over NullPointerException - let the runtime throw null pointer
    • See also the many subclasses of Java's runtime exception hierarchy
    • Exception type can help reduce the time needed to diagnose real problems
    • There is really no (or very little) need for application-defined subtypes of runtime exceptions because:
      • A runtime exception indicates an unrecoverable fault caught by a fault barrier, which only needs to catch the base RuntimeException and log the message 
      • Runtime exceptions support exception chaining so can wrap any message or checked exception.
  • Document runtime exceptions in throws clauses of public library methods 
The One True Path
The One True Path is the execution sequence of code that produces no errors and achieves the expected correct result. Deviations from this path include exception handling blocks.

One problem with Java's compile-time checking of exception handling is that programmers tend to:
  1. write their code 
  2. notice a checked exception that must be handled 
  3. add a try/catch with a TODO or empty catch block and intend to handle it later 
  4. never handle it later 
One solution to this is to have the autogenerated code not add a try/catch with a TODO, but a try/catch that wraps the exception in a RuntimeException. That way it can still propagate to a fault barrier if it occurs, and the programmer can continue their thinking without being interrupted by the annoyance of writing handling code while their mind is on the "true path" code.
Here is what you should not see in code:

try {
   // something that throws a checked exception
} catch (Exception e) {
   // TODO handle this exception
}

If you are catching an exception to get the compiler to be quiet, instead use this idiom:

try {
   // something that throws a checked exception
} catch (Exception e) {
   // TODO handle this exception (but for now, at least know it happened)
   throw new RuntimeException(e);
}