Hadoop Word Count Example with Maven and MRUnit

I’ve been rolling my own Hadoop unit testing techniques for about a year now. I’ve long had a hunch that the MRUnit framework (which I cannot help but pronounce “Mister Unit”) would ultimately be the way to go, but I shied away from it: when I first took a look, it didn’t support the new Hadoop API and seemed generally immature. I told myself that as soon as it became available for download as a Maven package, I’d give it another look. Well, it did, and I did, and I’m pleased enough to switch over. MRUnit is still in incubation and documentation remains scarce, but it works with the new Hadoop API and the basics appear to be in place.

Some caveats: the POM snippet on the wiki recommends version 0.8.1, but my default Maven 3 configuration couldn’t find it, presumably because it lives in a non-standard repository, so I went with 0.5.0. It’s also not clear to me what the classifier element in that snippet does, so I’ve omitted it with no apparent ill effects. As of this morning, the download mirrors show 0.8.0 as the stable version. Your mileage, as always, may vary.

I’ve created a Hadoop Word Count GitHub project that contains a new-API version of the classic Hadoop example, builds with Maven, and uses the MRUnit testing framework for its unit tests. I hope that it can serve as a template for people employing these two technologies in their Hadoop development work.
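
For readers who haven’t seen the classic example, here is a rough sketch of what word count computes and what a unit test would check. It is plain Java with no Hadoop or MRUnit dependency, so it is not the code in the project: the class name WordCountSketch and its map, reduce, and run methods are made up for illustration, and the in-memory grouping step only stands in for Hadoop’s shuffle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the word count logic.
public class WordCountSketch {

    // "Map" step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum all the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Stand-in for the framework: map every line, group the pairs by word
    // (a TreeMap keeps the keys sorted), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {cat=3, dog=2}
        System.out.println(run(Arrays.asList("cat cat dog", "dog cat")));
    }
}
```

The point of the shape is that the map and reduce steps can each be fed a small input and checked against an expected list of pairs or a single sum, which is the kind of per-step assertion a testing framework for Hadoop jobs lets you write against the real classes.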
