You should endeavor to write your reducers so that they complete in a reasonable amount of time. What counts as “reasonable” depends on the size of your cluster and the patience of your co-workers, but a good rule of thumb is maybe twenty minutes. That’s long enough to amortize any spin-up overhead but fast enough not to starve out other people’s jobs.
That’s reducer time. As for reducer quantity, the general wisdom is to request something near the number of available slots. It’s fine if everyone else is doing the same thing, because fair apportioning of resources is the framework’s job. Within normal operating parameters, you should just launch ’em all and let Hadoop sort ’em out.
But what if your reducers run for longer than twenty minutes, say, a few hours each? Well, you might think about how to divide up the work more finely, since you might be entering a region of non-scalability. However, that may come at the cost of complicating your algorithm. There are times when it would be best to leave your solution as it is and just find some knob you can turn to make it play nice with the other jobs on the cluster.
One trick: set mapred.reduce.tasks to an absurdly large number, something much larger than the number of available reducer slots. Of course you don’t actually get that number of slots because no blood from a stone, but this will force Hadoop to make the input for each reducer task smaller. Altogether the reducers take the same amount of time, but each individual attempt is quicker, and the improved time granularity keeps your job from hogging the cluster. I resisted doing this for a long time because it seemed like cheating, but it works.
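As a rough sketch of how the knob gets turned, here is what that might look like for a job submitted from the command line. The jar name, class name, and paths are placeholders, and the task count is just an illustrative oversubscription, not a recommendation:

```shell
# Ask for far more reduce tasks than the cluster has reducer slots.
# Hadoop queues the surplus tasks, so each one processes a smaller
# slice of input and finishes (and frees its slot) sooner.
# my-job.jar, MyJob, and the paths below are hypothetical.
hadoop jar my-job.jar MyJob \
  -D mapred.reduce.tasks=5000 \
  /input/path /output/path
```

Note that in newer Hadoop releases the property was renamed to mapreduce.job.reduces; the old name is deprecated but still honored. The same setting can also be made in code via the job configuration rather than on the command line.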