Unit Testing MapReduce With Overridden Write Methods

Recently I proposed an adapter pattern for writing MapReduce code that allows the core algorithms to be unit tested without a scaffolding framework like MRUnit. I’ve used this pattern in a large coding project, and though I am pleased with how it factors the functionality, the design as a whole is too clunky. You end up having to write three classes: the algorithm, the adapter, and the actual mapper or reducer that encapsulates the adapter. This leads to complicated class hierarchies and boilerplate code. Particularly given Java’s type erasure and lack of mixins, it’s easy to paint yourself into a corner.

A better approach is to have only a single mapper or reducer class, but hide the MapReduce context inside methods that unit tests can override. For example, here is a mapper that allows the write method to be overridden.

class MyMapper extends Mapper<...> {
     protected void map(key, value, context) {
          // ...do work and write out results...
          write(key, value, context);
     }

     // Unit tests override this method so they never need a real context.
     protected void write(key, value, context) {
          context.write(key, value);
     }
}

A unit test would subclass MyMapper and override the write method with something that ignores the context parameter. (The idea being that if you wanted to override the write method of context directly, you’d have to mock up a context object, which would be trickier.) Similar tricks work for other MapReduce operations that require a context, such as reading configuration parameters or updating counters.
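
As a rough sketch of how the same trick might extend to configuration parameters and counters, the example below routes every context access through its own overridable method. To keep it runnable without Hadoop on the classpath, it uses a hypothetical `SketchContext` interface in place of the real context class; the names `LengthFilterMapper`, `getParameter`, `increment`, `min.length`, and `SKIPPED` are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Hadoop context (an assumption for this
// sketch; the real context API uses different method names and types).
interface SketchContext {
    String getParameter(String name);
    void incrementCounter(String counter, long amount);
    void write(String key, Integer value);
}

// Mapper-like class: every use of the context goes through a protected
// method that a test subclass can override.
class LengthFilterMapper {
    public void map(String key, String value, SketchContext context) {
        int minLength = Integer.parseInt(getParameter("min.length", context));
        for (String word : value.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input line, nothing to emit
            }
            if (word.length() < minLength) {
                increment("SKIPPED", 1L, context);
            } else {
                write(word, 1, context);
            }
        }
    }

    protected String getParameter(String name, SketchContext context) {
        return context.getParameter(name);
    }

    protected void increment(String counter, long amount, SketchContext context) {
        context.incrementCounter(counter, amount);
    }

    protected void write(String key, Integer value, SketchContext context) {
        context.write(key, value);
    }
}

// Test subclass: no override touches the context, so tests can pass null.
class LengthFilterMapperTestHarness extends LengthFilterMapper {
    public final Map<String, String> parameters = new HashMap<>();
    public final Map<String, Long> counters = new HashMap<>();
    public final List<String> written = new ArrayList<>();

    @Override
    protected String getParameter(String name, SketchContext context) {
        return parameters.get(name);
    }

    @Override
    protected void increment(String counter, long amount, SketchContext context) {
        counters.merge(counter, amount, Long::sum);
    }

    @Override
    protected void write(String key, Integer value, SketchContext context) {
        written.add(key);
    }
}
```

A test would fill in `parameters`, call `map` with a null context, and then inspect `written` and `counters`. In a real mapper the three protected methods would delegate to the actual Hadoop context instead of the stand-in interface.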

The test subclass of MyMapper would keep track of all the write calls, perhaps like so:

class MyMapperTestHarness extends MyMapper {
     public List<Pair<KEY, VALUE>> keyValuePairs = new ArrayList<Pair<KEY, VALUE>>();

     @Override
     protected void write(key, value, context) {
          // Record the pair instead of touching the context.
          keyValuePairs.add(new Pair<KEY, VALUE>(key, value));
     }
}

Unit tests would then verify that the keyValuePairs list contained the expected contents in the expected order. (Here, by the way, is where you really wish Java had mixins because this test harness logic is going to be common to many different mapper classes, but it’s not easy to factor out because MyMapper has to be the base class.)
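
Here is a minimal, self-contained sketch of such a check for a word-count style mapper. It is not the code from the GitHub example: `OutputContext` is a hypothetical stand-in for the Hadoop context so the sketch compiles without Hadoop, and `WordCountMapper`, `WriteRecorder`, and `WordCountMapperCheck` are illustrative names. The `WriteRecorder` helper is one possible way to soften the missing-mixin problem, by sharing the recording logic through composition; each harness still needs its own one-line override.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Hadoop context (assumption, not the real API).
interface OutputContext {
    void write(String key, Integer value);
}

// Word-count style mapper following the overridable-write pattern.
class WordCountMapper {
    public void map(Long offset, String line, OutputContext context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                write(word, 1, context);
            }
        }
    }

    protected void write(String key, Integer value, OutputContext context) {
        context.write(key, value);
    }
}

// Recording logic factored out so several test harnesses can share it.
class WriteRecorder<K, V> {
    public final List<Map.Entry<K, V>> keyValuePairs = new ArrayList<>();

    public void record(K key, V value) {
        keyValuePairs.add(new SimpleEntry<>(key, value));
    }
}

// Test harness: overrides write and delegates to the shared recorder.
class WordCountMapperTestHarness extends WordCountMapper {
    public final WriteRecorder<String, Integer> recorder = new WriteRecorder<>();

    @Override
    protected void write(String key, Integer value, OutputContext context) {
        recorder.record(key, value);
    }
}

// Runs the mapper on one line with a null context and returns what it wrote.
class WordCountMapperCheck {
    static List<Map.Entry<String, Integer>> run(String line) {
        WordCountMapperTestHarness harness = new WordCountMapperTestHarness();
        harness.map(0L, line, null);
        return harness.recorder.keyValuePairs;
    }
}
```

A test can then compare the returned list against the expected pairs in order, since `SimpleEntry` implements `equals` on key and value. Because the harness never dereferences the context, passing null is safe here.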

I have a version of the word count example on GitHub that illustrates this technique.
