SQL Query to Join Two Tables Based Off Closest Timestamp

Topher Fangio · Nov 1, 2010 · Viewed 21.8k times

I have two SQL tables, and I need to join each row in table A to the row in table B that has the latest timestamp earlier than or equal to the timestamp in table A.

So, here is some fake data for two tables and the desired output:

Closed Cases (Table A)

| id | resolution |         timestamp          |
------------------------------------------------
|  1 |     solved | 2006-10-05 11:55:44.888153 |
|  2 |     closed | 2007-10-07 12:34:17.033498 |
|  3 |    trashed | 2008-10-09 08:19:36.983747 |
|  4 |     solved | 2010-10-13 04:28:14.348753 |

Classification (Table B)


| id |    value    |         timestamp          |
-------------------------------------------------
|  1 |    freshman | 2006-01-01 12:02:44.888153 |
|  2 |   sophomore | 2007-01-01 12:01:19.984333 |
|  3 |      junior | 2008-01-01 12:02:28.746149 |

Desired Results

| id | resolution |         timestamp          |    value    |
--------------------------------------------------------------
|  1 |     solved | 2006-10-05 11:55:44.888153 |    freshman |
|  2 |     closed | 2007-10-07 12:34:17.033498 |   sophomore |
|  3 |    trashed | 2008-10-09 08:19:36.983747 |      junior |
|  4 |     solved | 2010-10-13 04:28:14.348753 |      junior |

I know the code needs to look like the following, but I can't figure out what to put in the ON portion of the JOIN ($1 and $2 are variables that will be passed in):

SELECT cc.id, cc.resolution, cc.timestamp, class.value
  FROM closed_cases AS cc
  LEFT JOIN classifications AS class ON ???
  WHERE cc.timestamp BETWEEN $1 AND $2;

I know I could use a sub-select, but this will run against at least a few thousand rows, probably more, and it needs to be really fast, so I was hoping for a simple join clause that could do it.

Answer

user359040 · Nov 1, 2010

If you can change the table structures, I recommend adding an end date to the classifications table alongside the existing start date. Joining to the table is much easier that way.

If not, I suggest the following:

-- cc is used as the alias because CASE is a reserved word in SQL.
SELECT cc.id, cc.resolution, cc.timestamp, class.value
  FROM closed_cases AS cc
  -- Derive an end time for each classification: the start of the next one
  -- (NULL for the most recent classification).
  LEFT JOIN (SELECT c.*,
                    (SELECT MIN(c1.timestamp)
                       FROM classifications c1
                      WHERE c1.timestamp > c.timestamp) AS timeend
               FROM classifications c) AS class
    ON cc.timestamp >= class.timestamp
   AND (cc.timestamp < class.timeend OR class.timeend IS NULL)
  WHERE cc.timestamp BETWEEN $1 AND $2;
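
As a quick sanity check, the query above can be run against the sample data from the question. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database rather than the asker's actual database, stores timestamps as ISO-formatted text, uses ? placeholders in place of $1 and $2, and uses the alias cc instead of case (a reserved word). The date range passed in is an assumption chosen to cover all four cases.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE closed_cases (id INTEGER PRIMARY KEY, resolution TEXT, timestamp TEXT);
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, value TEXT, timestamp TEXT);
    INSERT INTO closed_cases VALUES
        (1, 'solved',  '2006-10-05 11:55:44.888153'),
        (2, 'closed',  '2007-10-07 12:34:17.033498'),
        (3, 'trashed', '2008-10-09 08:19:36.983747'),
        (4, 'solved',  '2010-10-13 04:28:14.348753');
    INSERT INTO classifications VALUES
        (1, 'freshman',  '2006-01-01 12:02:44.888153'),
        (2, 'sophomore', '2007-01-01 12:01:19.984333'),
        (3, 'junior',    '2008-01-01 12:02:28.746149');
""")

# Same join as above: each classification gets a derived end time
# (the start of the next classification, or NULL for the latest one).
QUERY = """
SELECT cc.id, cc.resolution, cc.timestamp, class.value
  FROM closed_cases AS cc
  LEFT JOIN (SELECT c.*,
                    (SELECT MIN(c1.timestamp)
                       FROM classifications c1
                      WHERE c1.timestamp > c.timestamp) AS timeend
               FROM classifications c) AS class
    ON cc.timestamp >= class.timestamp
   AND (cc.timestamp < class.timeend OR class.timeend IS NULL)
 WHERE cc.timestamp BETWEEN ? AND ?
 ORDER BY cc.id
"""

# The range below is illustrative; it stands in for $1 and $2.
rows = conn.execute(QUERY, ("2006-01-01", "2011-01-01")).fetchall()
for row in rows:
    print(row)
```

With the sample data this should return the four rows of the Desired Results table, with case 4 matched to junior through the timeend IS NULL branch.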

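EDIT - with the end date on classifications (this assumes timeend is populated on every row, since a NULL timeend would never satisfy the < comparison):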

-- cc is used as the alias because CASE is a reserved word in SQL.
SELECT cc.id, cc.resolution, cc.timestamp, class.value
  FROM closed_cases AS cc
  LEFT JOIN classifications AS class
    ON cc.timestamp >= class.timestamp AND cc.timestamp < class.timeend
  WHERE cc.timestamp BETWEEN $1 AND $2;
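
The query above relies on a timeend column that the original table does not have. One possible way to add and fill it is sketched below, again using Python's sqlite3 module and the question's sample data. The column type, the far-future sentinel '9999-12-31 00:00:00' for the most recent classification, and the date range are all assumptions made for illustration; the sentinel is there so that the plain < comparison still matches the latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE closed_cases (id INTEGER PRIMARY KEY, resolution TEXT, timestamp TEXT);
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, value TEXT, timestamp TEXT);
    INSERT INTO closed_cases VALUES
        (1, 'solved',  '2006-10-05 11:55:44.888153'),
        (2, 'closed',  '2007-10-07 12:34:17.033498'),
        (3, 'trashed', '2008-10-09 08:19:36.983747'),
        (4, 'solved',  '2010-10-13 04:28:14.348753');
    INSERT INTO classifications VALUES
        (1, 'freshman',  '2006-01-01 12:02:44.888153'),
        (2, 'sophomore', '2007-01-01 12:01:19.984333'),
        (3, 'junior',    '2008-01-01 12:02:28.746149');

    -- Add the end-date column and set it to the start of the next
    -- classification; the newest row gets a far-future sentinel (assumption).
    ALTER TABLE classifications ADD COLUMN timeend TEXT;
    UPDATE classifications
       SET timeend = COALESCE(
               (SELECT MIN(c1.timestamp)
                  FROM classifications c1
                 WHERE c1.timestamp > classifications.timestamp),
               '9999-12-31 00:00:00');
""")

ends = conn.execute(
    "SELECT value, timeend FROM classifications ORDER BY id"
).fetchall()

# With timeend stored on the table, the join needs no derived subquery.
QUERY = """
SELECT cc.id, cc.resolution, cc.timestamp, class.value
  FROM closed_cases AS cc
  LEFT JOIN classifications AS class
    ON cc.timestamp >= class.timestamp AND cc.timestamp < class.timeend
 WHERE cc.timestamp BETWEEN ? AND ?
 ORDER BY cc.id
"""
rows = conn.execute(QUERY, ("2006-01-01", "2011-01-01")).fetchall()
for row in rows:
    print(row)
```

The trade-off is that timeend has to be kept up to date whenever a classification is inserted, in exchange for a simpler join at query time.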