How to delete rows in a Hive (Hadoop) database

Sunny · Mar 25, 2014

I'm a newbie with Hadoop and Hive. I want to delete certain rows in my database, which is on Hive/Hadoop. I know it's not supported out of the box, and that Hadoop is a read-only file system. I'm curious what the best approaches are for accomplishing this. If anyone has done this before, could you share your learnings/procedures?

Thanks!

Answer

Jerome Banks · Mar 25, 2014

In Big Data there really aren't deletes. That said, you can overwrite your table or partition if it isn't too big, or isolate your deletes to a particular partition like JamCon suggests.

For datasets which are not too huge, you can do something like

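-- Rewrite the table in place, keeping only the rows that are not being deleted.
-- Note: rows whose ID is NULL are dropped too, since NULL NOT IN (...) is never true.
INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable
WHERE ID NOT IN ('delete1', 'delete2', 'delete3');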
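
The partition approach mentioned above can be sketched the same way. This is only an illustration under assumptions: it supposes mytable is partitioned by a string column dt and has non-partition columns id and payload (all three names are hypothetical, not from the question), and the dates are made up.

```sql
-- Assumption: mytable is partitioned by dt; id and payload are hypothetical columns.
-- Rewrite only the partition that contains the rows to delete.
-- The partition column is given in the PARTITION clause, so it is left out of the SELECT list.
INSERT OVERWRITE TABLE mytable PARTITION (dt = '2014-03-25')
SELECT id, payload
FROM mytable
WHERE dt = '2014-03-25'
  AND id NOT IN ('delete1', 'delete2', 'delete3');

-- If every row in a partition should go, dropping the partition avoids rewriting any data.
ALTER TABLE mytable DROP IF EXISTS PARTITION (dt = '2014-03-24');
```

Limiting the overwrite to one partition keeps the rewrite cost proportional to that partition rather than to the whole table, which is why isolating deletes to a partition helps on larger datasets.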