Date comparison in Hive

Jeremiah Peschka picture Jeremiah Peschka · Dec 28, 2012 · Viewed 55.9k times · Source

I'm working with Hive and I have a table structured as follows:

CREATE TABLE t1 (
  id INT,
  created TIMESTAMP,
  some_value BIGINT
);

I need to find every row in t1 that is less than 180 days old. The following query yields no rows even though there is data present in the table that matches the search predicate.

select * 
from t1 
where created > date_sub(from_unixtime(unix_timestamp()), 180);

What is the appropriate way to perform a date comparison in Hive?

Answer

Joe K picture Joe K · Dec 28, 2012

How about:

where unix_timestamp() - created < 180 * 24 * 60 * 60

Date math is usually simplest if you can just do it with the actual timestamp values.

Or do you want it to only cut off on whole days? Then I think the problem is with how you are converting back and forth between ints and strings. Try:

where created > unix_timestamp(date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),180),'yyyy-MM-dd')

Walking through each UDF:

  1. unix_timestamp() returns an int: current time in seconds since epoch
  2. from_unixtime(,'yyyy-MM-dd') converts to a string of the given format, e.g. '2012-12-28'
  3. date_sub(,180) subtracts 180 days from that string, and returns a new string in the same format.
  4. unix_timestamp(,'yyyy-MM-dd') converts that string back to an int

If that's all getting too hairy, you can always write a UDF to do it yourself.