MapReduce Implemented with “map” and “reduce”

The MapReduce paradigm was inspired by functional programming techniques, so why not take things full circle and rewrite a MapReduce job in a functional language? Here is a one-line Scala implementation of the classic “word-count” program.

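// On Scala 2.13+, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
List("to be or", "not to", "be").par.flatMap(_.split("""\s+""")).
    foldLeft(Map[String, Int]())((m,s) => m + (s -> (m.getOrElse(s, 0)+1)))
// produces Map(to -> 2, be -> 2, or -> 1, not -> 1)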

Okay, so I used “flatMap” instead of “map” and “foldLeft” instead of “reduce”, but you get the idea. Note the par method, which invokes Scala’s parallel collections framework, transparently parallelizing the map step (the foldLeft still runs sequentially). Who needs a compute cluster when you’ve got the Scala REPL?
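For the purists, here is a rough sketch of the same count done with a literal “map” and “reduce”. The names WordCount, countLine, merge and count are mine, made up for illustration. Each line is mapped to its own little word-count table, and the tables are then merged pairwise. I've left out par so the snippet runs without the parallel collections module, but the merge is associative, so it should be safe to use with a parallel reduce as well.

```scala
object WordCount {
  // Map step: turn one line into its own word -> count table.
  def countLine(line: String): Map[String, Int] =
    line.split("""\s+""")
      .filter(_.nonEmpty)                       // drop the empty token a leading space produces
      .groupBy(identity)
      .map { case (word, hits) => (word, hits.length) }

  // Reduce step: merge two partial tables by summing the counts per word.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc + (word -> (acc.getOrElse(word, 0) + n))
    }

  // map, then reduce; reduceOption covers the empty-input case.
  def count(lines: Seq[String]): Map[String, Int] =
    lines.map(countLine).reduceOption(merge).getOrElse(Map.empty[String, Int])
}
```

Calling WordCount.count(List("to be or", "not to", "be")) gives the same four counts as the one-liner, though the order in which the entries print may differ.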

This is just a little stunt, of course, but if you’re serious about working with MapReduce and Scala, the ScalaHadoop project looks like a good place to start.
