KeyValueTextInputFormat Removed from the New Hadoop API

The new Hadoop API no longer contains a KeyValueTextInputFormat subclass of FileInputFormat. If you want to write a Hadoop job that reads raw tab-delimited text files, you now have to use plain old TextInputFormat (the default FileInputFormat subclass), ignore the LongWritable key, and split the Text value on tab yourself. This isn’t hard to do, but it is confusing if you’re porting older code or working from any of the books currently in print, which only cover the old API. The format was yanked out instead of being deprecated, and I can’t find mention of the change in any of the porting guides I’ve managed to dig up. So if all of a sudden KeyValueTextInputFormat no longer works for you, rest assured that you’re not doing anything wrong.
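
For what it’s worth, here is a rough sketch of the "split it yourself" step in plain Java, with no Hadoop classes involved so it can be tried on its own. The class and method names (TabSplit, splitOnFirstTab) are made up for illustration. It assumes you want the old behavior as I understand it: everything before the first tab is the key, everything after it is the value, and a line with no tab becomes a key with an empty value.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: splits one line of text into a
// key/value pair on the first tab character.
final class TabSplit {
    private TabSplit() {}

    static Map.Entry<String, String> splitOnFirstTab(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // Assumption: a line without a tab is treated as key-only.
            return new SimpleImmutableEntry<>(line, "");
        }
        // Only the first tab separates key from value; later tabs stay in the value.
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }
}
```

Inside a new-API mapper you would call something like splitOnFirstTab(value.toString()) from map(LongWritable key, Text value, Context context), ignore the LongWritable byte offset, and write the two halves out as Text.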

5 Responses to KeyValueTextInputFormat Removed from the New Hadoop API

  1. Harsh says:

    Hello,

    Apache Hadoop 0.20.2 did not ship with its newer API complete. It lacks a lot of components that are otherwise in the stable API. They indeed decided that deprecating the stable API was wrong for that release and have undone it for the upcoming 0.20.3, hopefully reducing confusion for developers.

    That said, the KeyValueTextInputFormat is available in the trunk as well as in the 0.21 release that followed the 0.20. It hasn’t been removed, merely not ported back then (and thus came in later — am not completely sure if it will also be present in the 0.20.3, issues.apache.org should be able to tell you more on that).

    FWIW, Cloudera’s distro including Apache Hadoop (CDH3) carries this new API component in it.

  2. Aziz says:

    Thank you for this info. Indeed, those of us using Hadoop 0.20.1 are stuck somewhere in the middle: some of the new API is available, but other parts are not. That pushes users to migrate to 0.21, which is not an easy process.

  3. Contributor says:

    I was hitting the same issue as documented on your page. I did the work-around of manual parsing and it worked. However, after further review, I found my driver class was missing this line:

    job.setInputFormatClass(KeyValueTextInputFormat.class);

    Once added, the Map job worked as expected with Text, Text input.
