how to write case and group by in hive query

priyanka picture priyanka · May 25, 2016 · Viewed 74.3k times · Source

This is my hive table:

course   dept    subject   status

btech     cse     java     pass
btech     cse     hadoop   fail
btech     cse     cg       detained
btech     cse     cc       pass
btech      it     daa      pass
btech      it     wt       pass
btech      it     cnn      pass
mba        hr     hrlaw    pass
mba        hr     hrguid   absent
mtech      cs     java     pass
mtech      cs     cd       pass
mtech      cs     cp       detained

I want to query this table to retrieve data in the following way:

course   dept    status

btech     cse     fail
btech      it     pass
mba        hr     absent
mtech      cs     fail

First, it will check for "fail" or "detained" in the status of each dept and course grouped together. If it finds "fail" or "detained", it will output "fail" as the status. Else, if an "absent" is found in the same group, it will output "absent" as the status. Else, it will output "pass".

I got an error message when I ran the following query:

select course,dept,
case 
when status in ( 'fail','detained') then 'fail'
when status in ( 'absent') then 'absent'
when status in ( 'pass') then 'pass'
else null 
end as Final_Status
from college
group by course,dept;

Answer

b1n0ys picture b1n0ys · May 26, 2016

When you group by course and dept, you would get multiple values (comming for different records) for status column, this needs to be handled.
Any column in the select which is not part of group by should be within an aggregate function
here is a solution using sum() function.

select course, dept,
    case when sum(case when status in ( 'fail','detained') then 1 else 0 end) > 0 then 'fail'
         when sum(case when status in ('absent') then 1 else 0 end) > 0 then 'absent'
         when sum(case when status in ('pass') then 1 else 0 end) > 0 then 'pass'
         else 'no_result'
    end as final_status
from college
group by 
    course,dept