Hive unable to manually set number of reducers

magicalo · Jan 6, 2012 · Viewed 44.2k times

I have the following hive query:

select count(distinct id) as total from mytable;

which automatically spawns:
1408 Mappers
1 Reducer

I need to manually set the number of reducers and I have tried the following:

set mapred.reduce.tasks=50;
set hive.exec.reducers.max=50;

but neither of these settings seems to be honored. The query takes forever to run. Is there a way to manually set the number of reducers, or to rewrite the query so that it uses more reducers? Thanks!

Answer

wlk · Jan 7, 2012

Writing a query in Hive like this:

 SELECT COUNT(DISTINCT id) ....

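will always result in using only one reducer: with no GROUP BY key, every row is sent to the same reducer to be de-duplicated and counted, so the reducer settings are ignored. You should: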

  1. Use this command to set the desired number of reducers:

    set mapred.reduce.tasks=50;

  2. Rewrite the query as follows:

SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM ... ) t;

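This will result in two MapReduce jobs instead of one, but the performance gain will be substantial, because the DISTINCT step in the subquery can be spread across many reducers and the outer COUNT(*) only has to count rows that are already unique.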
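
Applied to the query from the question, the two steps might look like the sketch below. The table name `mytable` and the alias `total` come from the question, and 50 is simply the value the asker tried, not a recommendation. One detail to keep in mind: `COUNT(DISTINCT id)` ignores NULL values, whereas `SELECT DISTINCT id` keeps a NULL row if one exists, so the sketch filters NULLs out to keep the result the same as the original query.

```sql
-- Request 50 reduce tasks for the jobs that follow (value taken from the question).
set mapred.reduce.tasks=50;

-- Inner query: de-duplicate ids; this work can be split across many reducers.
-- The NULL filter mirrors COUNT(DISTINCT id), which does not count NULLs.
-- Outer query: count the rows that are already distinct.
SELECT COUNT(*) AS total
FROM (
  SELECT DISTINCT id
  FROM mytable
  WHERE id IS NOT NULL
) t;
```

If `id` can never be NULL in your table, the `WHERE` clause can be dropped without changing the result.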