Java Should Have a Pair Class Considered Again

Heterogeneous Pairs in Java

The debate over whether Java should have a pair class is a perennial one. Not quite a religious war, but heated, and unlikely to be resolved to everyone’s satisfaction. I’m in the pro-pair camp, but want to give the anti-pair arguments their due, because they illuminate interesting aspects of programming style and the design tradeoffs that are inherent in a programming language.

An ordered pair <x,y> is a core mathematical concept that has plenty of uses in programming. If x and y are of the same type T, it can be expressed in Java as an array T[]. There is, however, no way to represent ordered pairs of differently-typed objects built into the syntax. Here, as everywhere else in Java, the answer is to create a class. So you’d represent a <String,Integer> pair like so.

class Pair {
    String s;
    Integer i;
}

But what if at one place in your program you have a <String,Integer> pair and somewhere else you have a <Long,Float> pair? Writing separate StringIntegerPair and LongFloatPair classes seems needlessly verbose when you could just say the following.

class Pair<X,Y> {
    X x;
    Y y;
}

At first glance this appears orthodox. Avoiding boilerplate is exactly what generics are for. Of course the issue doesn’t arise in dynamically typed languages like Python or Ruby, where the built-in list types are inherently heterogeneous. It also doesn’t come up in Java’s cousin Scala, which is statically typed but has built-in tuples and a compiler that can infer most of the types you would otherwise spell out. The C++ standard library has a heterogeneous pair class (std::pair), and in Java plenty of people end up rolling their own pairs, but when the call goes out to put a standard Pair into the core library it meets with a lot of resistance. While a Pair class may seem like a good idea, the argument goes, it would end up doing more harm than good.
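
For concreteness, here is a minimal sketch of how the single generic class covers both of the cases above. The constructor, the final fields, and the PairUsage class are additions for illustration, not part of the two-field class shown earlier.

```java
// Hypothetical generic pair: the two-field class from above plus a constructor.
class Pair<X, Y> {
    final X x;
    final Y y;

    Pair(X x, Y y) {
        this.x = x;
        this.y = y;
    }
}

// Illustrative only: one class definition serves two unrelated type combinations.
class PairUsage {
    static Pair<String, Integer> nameAndCount() {
        return new Pair<String, Integer>("widgets", 12);
    }

    static Pair<Long, Float> idAndWeight() {
        return new Pair<Long, Float>(42L, 1.5f);
    }
}
```

Neither StringIntegerPair nor LongFloatPair has to exist; the type parameters carry that information at each use site.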

The anti-pair arguments break down into two broad groups, which I call verbosity and semantics.

Verbosity

The verbosity argument says, sure, Pair<String,Integer> looks innocuous, but it won’t stop there. Say you have an inventory management application in which customers are represented by a company name paired with a number of employees and sales orders are represented by a salesperson’s name and a real number dollar amount. You could easily find yourself writing code that looks like this.

Map<Pair<String, Integer>, List<Pair<String, Float>>> ordersByCustomer = new HashMap<Pair<String, Integer>, List<Pair<String, Float>>>();

This is an unreadable snarl of angle brackets, and will only get worse as the application grows and requires ever more rococo levels of nesting. It’s better to express the same concept like so.

Map<Customer, List<SalesOrder>> ordersByCustomer = new HashMap<Customer, List<SalesOrder>>();

This is easier to read because, apart from core collection types like List and Map, the types are all in the application domain. If Java has to engage in a bit of bondage and discipline to enforce this perspicuous style, so be it. Providing structure is what a good programming language does.

There are a couple of problems with this. First, the same illegibility can arise with the core Java classes. Consider the following line:

Map<List<Integer>, List<Map<String, Float>>> lookup = new HashMap<List<Integer>, List<Map<String, Float>>>();

The classes here all come from the java.lang and java.util packages. There’s not a Pair in sight, but the line is no less unreadable. However, there’s a simple solution for this case: define an in-domain class that does nothing but extend a concrete list such as ArrayList<Map<String,Float>> or what have you (List itself is an interface, so the class has to extend an implementation). This is roughly the equivalent of a C++ typedef.
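
The typedef-style workaround can be sketched as follows. Because List is an interface, this sketch extends ArrayList instead. PriceHistory, TypedefDemo, and the sample data are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain name: the class body is empty, it only renames the nested type.
class PriceHistory extends ArrayList<Map<String, Float>> {}

class TypedefDemo {
    static Map<List<Integer>, PriceHistory> build() {
        // The declaration now has one level of nesting instead of three.
        Map<List<Integer>, PriceHistory> historyByRoute =
                new HashMap<List<Integer>, PriceHistory>();

        Map<String, Float> prices = new HashMap<String, Float>();
        prices.put("widget", 9.99f);

        PriceHistory history = new PriceHistory();
        history.add(prices);

        historyByRoute.put(Arrays.asList(1, 2, 3), history);
        return historyByRoute;
    }
}
```

Unlike a true typedef, this creates a distinct type: a PriceHistory can be used wherever a List<Map<String, Float>> is expected, but not the other way around.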

The second and more important objection is that problem-domain classes introduce their own boilerplate. The cure may be worse than the disease, because in a typical Java class you’ll likely want getter and setter methods, an equals method, and a corresponding hashCode method. Soon you’ll have something that looks like this.

public class SalesOrder {
    protected String salesPerson;
    protected float amount;

    public String getSalesPerson() {
        return salesPerson;
    }

    public void setSalesPerson(String salesPerson) {
        this.salesPerson = salesPerson;
    }

    public float getAmount() {
        return amount;
    }

    public void setAmount(float amount) {
        this.amount = amount;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        SalesOrder that = (SalesOrder) o;

        if (Float.compare(that.amount, amount) != 0) return false;

        if (salesPerson != null ? !salesPerson.equals(that.salesPerson) :
                                  that.salesPerson != null) return false;
        return true;
    }

    @Override
    public int hashCode() {
        int result = salesPerson != null ? salesPerson.hashCode() : 0;
        result = 31 * result + (amount != +0.0f ? Float.floatToIntBits(amount) : 0);
        return result;
    }
}

Then you’ll have to go write the same thing for Customer, except with a few variable names and types changed. You just added two source code files that are each more than a page long, difficult to understand at first glance, and essentially vacuous.
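
For comparison, a generic pair needs its equality and hashing logic written only once. The following is a rough sketch, assuming Java 7 or later for java.util.Objects; the class and method names are hypothetical, and the pair is made immutable here, which drops the setters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable pair with value-based equality.
class Pair<X, Y> {
    private final X first;
    private final Y second;

    Pair(X first, Y second) {
        this.first = first;
        this.second = second;
    }

    X getFirst() { return first; }
    Y getSecond() { return second; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Pair)) return false;
        Pair<?, ?> that = (Pair<?, ?>) o;
        // Objects.equals handles null components.
        return Objects.equals(first, that.first)
                && Objects.equals(second, that.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }
}

class PairKeyDemo {
    // Looks up a region using a freshly built key, which only works
    // because equals and hashCode compare the components.
    static String regionFor(String company, int employees) {
        Map<Pair<String, Integer>, String> regionByCustomer =
                new HashMap<Pair<String, Integer>, String>();
        regionByCustomer.put(new Pair<String, Integer>("Acme", 250), "West");
        return regionByCustomer.get(new Pair<String, Integer>(company, employees));
    }
}
```

The tradeoff is the one the semantics argument raises: getFirst() and getSecond() say nothing about what the values mean, where getSalesPerson() and getAmount() do.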

By design, Java is verbose and prone to boilerplate. A Pair class does not make it significantly more so. If you want to avoid an angle bracket bramble, you’re better off doing rigorous type inference the way Scala does. Declaring particular classes off limits is an ad hoc solution when it is a solution at all.
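
As a rough illustration of how much inference alone can trim, here is the nested declaration from earlier written with the limited inference Java itself later gained: the diamond operator (Java 7) and var for local variables (Java 10). This sketch assumes a Java 10 or later compiler, and the Pair and InferenceDemo classes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal pair, repeated so the example is self-contained.
class Pair<X, Y> {
    final X x;
    final Y y;

    Pair(X x, Y y) {
        this.x = x;
        this.y = y;
    }
}

class InferenceDemo {
    static int countOrders() {
        // Diamond: the type arguments on the right are inferred from the left.
        Map<Pair<String, Integer>, List<Pair<String, Float>>> ordersByCustomer = new HashMap<>();

        // var: the compiler infers Pair<String, Integer> from the constructor arguments.
        var acme = new Pair<>("Acme", 250);

        var orders = new ArrayList<Pair<String, Float>>();
        orders.add(new Pair<>("Alice", 1200.50f));

        ordersByCustomer.put(acme, orders);
        // This Pair does not override equals, so the lookup relies on
        // using the same key instance.
        return ordersByCustomer.get(acme).size();
    }
}
```

The full nested type still has to be written out once per declaration, so this is a partial remedy at best.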

Semantics

The more compelling argument for having the language encourage the writing of problem-domain classes is that this makes sense semantically. Fine, the argument goes, you’ll have a few boilerplate classes somewhere in your codebase. It’s worth a bit of verbosity hidden away in a dark corner if your business logic is expressed in terms of in-domain objects like Customer and SalesOrder instead of overly generic mathematical concepts like Pair. To borrow Model-View-Controller language, a bit of boilerplate in the Model and View is a reasonable price to pay for having your Controller algorithms expressed at the correct level of abstraction.

The first objection to this (as has already been pointed out on Stack Overflow) is that Java itself does not abide by this rule. The core packages contain classes like List<E>, Map<K,V>, and so on whose semantics exist entirely in the mathematics/computer science domain. This is not a fatal objection, however. It is entirely reasonable to declare that the core of a programming language be semantically vacuous by design. It provides the mathematical building blocks which programmers assemble to solve a particular task at hand.

The problem is that the boundary between building block and task is sometimes difficult to draw. There are times when the mathematical concept of a pair is part of the application domain: putting two different things together is what a program does. The MapReduce distributed computing framework, for example, is built around the concept of passing around <key,value> pairs. A pair is the correct level of abstraction, and projects that work with it (e.g. the MRUnit testing framework) roll their own.
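
To illustrate a case where the pair really is the domain object, here is a toy in-memory word count in the map/reduce style. This is not the Hadoop or MRUnit API; Pair, WordCountSketch, and both method names are hypothetical, and the sketch assumes Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical key/value pair.
class Pair<K, V> {
    final K key;
    final V value;

    Pair(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

class WordCountSketch {
    // "Map" step: emit a <word, 1> pair for every whitespace-separated word.
    static List<Pair<String, Integer>> map(String line) {
        List<Pair<String, Integer>> emitted = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emitted.add(new Pair<>(word, 1));
            }
        }
        return emitted;
    }

    // "Reduce" step: sum the values emitted for each key.
    static Map<String, Integer> reduce(List<Pair<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Pair<String, Integer> p : pairs) {
            Integer soFar = totals.get(p.key);
            totals.put(p.key, soFar == null ? p.value : soFar + p.value);
        }
        return totals;
    }
}
```

Nothing in this code knows or cares what the keys and values mean. Giving the pair a domain name such as WordOccurrence would tie a general mechanism to one application of it.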

Also, some methods have more than one return value. Often you’ll write something that returns an integer and a list, or a float and a string, or what have you. Picking one to be the return value and making the other an output parameter would be arbitrary, but building a dedicated class just to hold the two would be confusing, implying a semantic connection that isn’t there. Passing a semantically-vacuous pair is what the program does at this point, and it’s bad for the language to force it to pretend otherwise.
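
A minimal sketch of the integer-and-list case, assuming Java 7 or later. The parseAll helper, its behavior, and the Pair class are all hypothetical, invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal pair.
class Pair<X, Y> {
    final X x;
    final Y y;

    Pair(X x, Y y) {
        this.x = x;
        this.y = y;
    }
}

class ParseDemo {
    // Parses comma-separated integers. Returns the number of tokens that
    // failed to parse together with the list of values that succeeded.
    static Pair<Integer, List<Integer>> parseAll(String csv) {
        List<Integer> values = new ArrayList<>();
        int failures = 0;
        for (String token : csv.split(",")) {
            try {
                values.add(Integer.parseInt(token.trim()));
            } catch (NumberFormatException e) {
                failures++;
            }
        }
        return new Pair<>(failures, values);
    }
}
```

The failure count and the parsed values have no relationship beyond coming out of the same call, which is exactly what a bare pair expresses.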

Conclusion

When you design, you have a choice of whether to work bottom-up (adapting to your users’ wants) or top-down (providing a structure your users didn’t know they needed until you gave it to them). Focus group versus artistic vision. Open source versus Apple. A good design takes elements from both and strikes its own balance.

It’s to Java’s credit that there’s a Java way to do things. Obliging you to think about the boundary between language concepts and problem-domain concepts is a good thing. But as you push in this direction, you have to be sensitive to the level of push-back. If your language has a tenet like “no pairs” you want that to be uncontroversial among people who are committed to doing it your way. If enough of your users go ahead and write pair classes anyway, telling them why they shouldn’t eventually becomes beside the point. When something seems like a good idea it should probably be a good idea.


2 Responses to Java Should Have a Pair Class Considered Again

  1. W.P. McNeill says:

    The first example people bring up in this discussion is a point in a Cartesian plane. While this is a good example of a basic object, it is irrelevant to this discussion, because the two coordinates of a point are necessarily of the same type.

  2. Pingback: Leave the Monkey at Home | Corner Cases
