I’ve been rolling my own Hadoop unit testing techniques for about a year now. Though I’ve had a hunch that the MRUnit framework (which I cannot help but pronounce “Mister Unit”) would ultimately be the way to go, I shied away from it because when I first took a look it didn’t support the new Hadoop API and generally didn’t seem mature enough. I told myself that as soon as it became available for download as a Maven package I’d give it another look. Well, it did, and I did, and I’m pleased enough to switch over. MRUnit is still in incubation, and documentation remains scarce, but it works with the new Hadoop API, and the basics appear to be in place.
Some caveats: the .pom snippet on the Wiki recommends version 0.8.1, but I think that version lives in a non-standard repository, because my default Maven 3 configuration couldn’t find it, so I went with 0.5.0. It’s also not clear to me what the classifier option in the .pom snippet does, so I’ve omitted it with no apparent ill effects. As of this morning, the download mirrors show 0.8.0 as the stable version. Your mileage, as always, may vary.
I’ve created a Hadoop Word Count GitHub project that contains a new-API version of the classic Hadoop example, builds with Maven, and uses the MRUnit testing framework. I hope it can serve as a template for people using these two technologies in their Hadoop development work.