PostgreSQL splitting time range into days

CmdrMoozy picture CmdrMoozy · Oct 18, 2013 · Viewed 7.3k times · Source

I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.

When I retrieve them, though, I want to split the ranges at day boundaries - so for example:

2013-01-01 00:00:00 to 2013-01-02 23:59:59

would be selected as two rows:

2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59

with the values in the other columns the same for both retrieved entries.

I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.

I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY statement is an Oracle extension to the SQL standard, so I can't use it.

I believe I can achieve this using PostgreSQL's generate_series, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.

This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM table in a joined subquery), but I believe this is more-or-less the right track.

Here's the fiddle with the schema, sample data, and my working query.

Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.

Answer

Erwin Brandstetter picture Erwin Brandstetter · Oct 18, 2013

First off, your upper border concept is broken. A timestamp with 23:59:59 is no good. The data type timestamp has fractional digits. What about 2013-10-18 23:59:59.123::timestamp?

Include the lower border and exclude the upper border everywhere in your logic. Compare:

Building on this premise:

Postgres 9.2 or older

SELECT id
     , stime
     , etime
FROM   timesheet_entries t
WHERE  etime <= stime::date + 1  -- this includes upper border 00:00

UNION ALL
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END     -- AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etime
FROM (
   SELECT id
        , stime
        , etime
        , generate_series(stime::date, etime::date, interval '1d')::date AS d
   FROM   timesheet_entries t
   WHERE  etime > stime::date + 1
   ) sub
ORDER  BY id, stime;

Or simply:

SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END     -- AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etime
FROM (
   SELECT id
        , stime
        , etime
        , generate_series(stime::date, etime::date, interval '1d')::date AS d
   FROM   timesheet_entries t
   ) sub
ORDER  BY id, stime;

The simpler one may even be faster.
Note a corner case difference when stime and etime both fall on 00:00 exactly. Then a row with a zero time range is added at the end. There are various ways to deal with that. I propose:

SELECT *
FROM  (
   SELECT id
        , CASE WHEN stime::date = d THEN stime ELSE d END     AS stime
        , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
   FROM (
      SELECT id
           , stime
           , etime
           , generate_series(stime::date, etime::date, interval '1d')::date AS d
      FROM   timesheet_entries t
      ) sub1
   ORDER  BY id, stime
   ) sub2
WHERE  etime <> stime;

Postgres 9.3+

In Postgres 9.3+ you would better use LATERAL for this

SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END     AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM   timesheet_entries t
     , LATERAL (SELECT d::date
                FROM   generate_series(t.stime::date, t.etime::date, interval '1d') d
                ) d
ORDER  BY id, stime;

Details in the manual.
Same corner case as above.

SQL Fiddle demonstrating all.