I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.
When I retrieve them, though, I want to split the ranges at day boundaries - so for example:
2013-01-01 00:00:00 to 2013-01-02 23:59:59
would be selected as two rows:
2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59
with the values in the other columns the same for both retrieved entries.
I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.
I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY
statement is an Oracle extension to the SQL standard, so I can't use it.
I believe I can achieve this using PostgreSQL's generate_series
, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.
This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM
table in a joined subquery), but I believe this is more-or-less the right track.
Here's the fiddle with the schema, sample data, and my working query.
Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT
part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.
First off, your upper border concept is broken. A timestamp with 23:59:59
is no good. The data type timestamp
has fractional digits. What about 2013-10-18 23:59:59.123::timestamp
?
Include the lower border and exclude the upper border everywhere in your logic. Compare:
Building on this premise:
SELECT id
, stime
, etime
FROM timesheet_entries t
WHERE etime <= stime::date + 1 -- this includes upper border 00:00
UNION ALL
SELECT id
, CASE WHEN stime::date = d THEN stime ELSE d END -- AS stime
, CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etime
FROM (
SELECT id
, stime
, etime
, generate_series(stime::date, etime::date, interval '1d')::date AS d
FROM timesheet_entries t
WHERE etime > stime::date + 1
) sub
ORDER BY id, stime;
Or simply:
SELECT id
, CASE WHEN stime::date = d THEN stime ELSE d END -- AS stime
, CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etime
FROM (
SELECT id
, stime
, etime
, generate_series(stime::date, etime::date, interval '1d')::date AS d
FROM timesheet_entries t
) sub
ORDER BY id, stime;
The simpler one may even be faster.
Note a corner case difference when stime
and etime
both fall on 00:00
exactly. Then a row with a zero time range is added at the end. There are various ways to deal with that. I propose:
SELECT *
FROM (
SELECT id
, CASE WHEN stime::date = d THEN stime ELSE d END AS stime
, CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM (
SELECT id
, stime
, etime
, generate_series(stime::date, etime::date, interval '1d')::date AS d
FROM timesheet_entries t
) sub1
ORDER BY id, stime
) sub2
WHERE etime <> stime;
In Postgres 9.3+ you would better use LATERAL
for this
SELECT id
, CASE WHEN stime::date = d THEN stime ELSE d END AS stime
, CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM timesheet_entries t
, LATERAL (SELECT d::date
FROM generate_series(t.stime::date, t.etime::date, interval '1d') d
) d
ORDER BY id, stime;
Details in the manual.
Same corner case as above.
SQL Fiddle demonstrating all.