Unit Testing MapReduce With the Adapter Pattern

Testing MapReduce is tricky. Distributed code is inherently complex, the bundles of callback functions that comprise mappers and reducers can have subtle interactions, and to fully put them through their paces, you have to run them on a full-fledged cluster. Tom White’s Hadoop: The Definitive Guide has a helpful chapter on scaling up from unit tests to local single-JVM runs to full-scale integration testing. An online article at Cloudera has a nice summary. That article also mentions the MRUnit unit testing package, which in theory is the right tool to use, but I haven’t had much luck with it. Last I checked, MRUnit did not support the new Hadoop API, and the documentation is skimpy. The best resource I’ve found is an online presentation whose author also finds the package a little perplexing. I’d eventually like to use it, but for the time being I’m leaving MRUnit on the shelf.

In the meantime I’ve taken to modularizing my MapReduce code so as to make it amenable to the JUnit framework. Sometimes this is as straightforward as factoring out your core algorithmic work into its own function. However, applications that employ more sophisticated techniques like secondary sorts or in-mapper combining may require passing state around between the various callbacks. Factoring your algorithm out for testing purposes is still a good idea, but the fundamental unit of abstraction is no longer a single function.
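In the simplest case, that factoring looks something like the following sketch (a hypothetical tokenizer; the class and method names are illustrative, not from any particular job):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: a mapper's tokenization logic pulled out into a
// pure static method with no Hadoop dependencies, so a plain JUnit test
// can exercise it without a Context object or a cluster.
public class TokenUtil {
    public static List<String> tokens(String line) {
        return Arrays.asList(line.trim().toLowerCase().split("\\s+"));
    }
}
```

The mapper calls `TokenUtil.tokens()` and writes the results to its `Context`; the unit test simply asserts on the returned list.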

The solution I came up with is to make the fundamental unit a class, and use the Adapter pattern from Design Patterns to integrate that class into the MapReduce framework. Your core algorithm goes in an abstract class whose interface mirrors that of a mapper or reducer, but is written in a way that allows it to be run outside the framework.

abstract class MapperAlgorithm {
     // Maintain algorithmic state here.

     protected void map(key, value) {
          // Do algorithmic work here, calling the emit() function as needed.
     }

     protected abstract void emit(key, value);
}

Note that the emit function is abstract because this class doesn’t know anything about Context or other MapReduce structures. (Also for specificity’s sake I’m writing this example as a mapper, but it works the same mutatis mutandis for reducers.)

Then an adapter class that handles the actual MapReduce data structures subclasses this algorithm.

class MapperAlgorithmAdapter extends MapperAlgorithm {
     private Context context;

     public MapperAlgorithmAdapter(Context context) {
          this.context = context;
     }

     public void map(key, value, Context context) {
          // Framework-facing signature: delegate to the algorithm's map(),
          // which calls the emit() override below.
          map(key, value);
     }

     protected void emit(key, value) {
          context.write(key, value);
     }
}

This MapperAlgorithmAdapter class is embedded inside the framework’s Mapper class. You can unit test MapperAlgorithm to your heart’s content by writing a concrete harness that overrides the emit function with something that accumulates output in a list. Of course any logic inside MapperAlgorithmAdapter doesn’t get unit tested, but this tends to be skimpy boilerplate code. For both testing and design modularity reasons, I’ve found this to be a good way to go.
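As a concrete sketch of such a harness, here is a word-count algorithm in the MapperAlgorithm style, with plain Strings and ints standing in for Writable types so everything runs outside Hadoop entirely (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// A word-count algorithm written in the MapperAlgorithm style, against
// plain Strings and ints so it has no Hadoop dependencies at all.
abstract class WordCountAlgorithm {
    protected void map(Object key, String value) {
        for (String token : value.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                emit(token, 1);
            }
        }
    }

    protected abstract void emit(String key, int value);
}

// Concrete test harness: overrides emit() to accumulate output in a
// list instead of writing to a Hadoop Context.
class ListHarness extends WordCountAlgorithm {
    final List<String> output = new ArrayList<>();

    @Override
    protected void emit(String key, int value) {
        output.add(key + "=" + value);
    }
}
```

A JUnit test then calls `harness.map(...)` and asserts on `harness.output`, never touching a Context or a cluster.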


5 Responses to Unit Testing MapReduce With the Adapter Pattern

  1. Pingback: Unit Testing MapReduce With Overridden Write Methods | Corner Cases

  2. Sanjeev says:

    I have been using MRUnit to test my mappers. A simple example is given here (more at https://github.com/smishra/scala-hadoop/blob/master/src/test/scala/us/evosys/hadoop/jobs/WordCountMapperTest.scala). I have found it pretty straightforward to test the mappers as long as they are in their own classes.

    class WordCountMapperTest extends HImplicits {
      val logger: Logger = LoggerFactory.getLogger(classOf[WordCountMapperTest])

      @Test
      def map = {
        val mapper: Mapper[LongWritable, Text, Text, LongWritable] = new WordCountMapper

        val mapDriver = new MapDriver[LongWritable, Text, Text, LongWritable]()
        mapDriver.getConfiguration.setQuietMode(false)
        mapDriver.setMapper(mapper)

        mapDriver.withInput(1, "world Hello World")
        mapDriver.withOutput("world", 1).withOutput("hello", 1).withOutput("world", 1)
        val result = mapDriver.runTest()
        logger.debug("result: {}", result)
      }
    }

  3. Pingback: Hadoop Word Count Example with Maven and MRUnit | Corner Cases
