count number of rows that occur for each date in column date range

Will Dieterich picture Will Dieterich · Sep 8, 2009 · Viewed 9k times · Source

I have a table with data such as below

Group       Start Date        End Date
A        01/01/01       01/03/01
A       01/01/01        01/02/01
A       01/03/01        01/04/01
B       01/01/01        01/01/01
ETC

I am looking to produce a view that gives a count for each day, like this

Group       Date        Count
A       01/01/01            2
A       01/02/01            2
A       01/03/01            2
A       01/04/01         1
B       01/01/01            1

I am using Oracle 9 and am at a total loss on what how to handle this and am looking for any idea to get me started.
Note: Generating a table to hold the dates is not practical because I final product has to break down to the minute.

Answer

Quassnoi picture Quassnoi · Sep 8, 2009
WITH    q AS
        (
        SELECT  (
                SELECT  MIN(start_date)
                FROM    mytable
                ) + level - 1 AS mydate
        FROM    dual
        CONNECT BY
                level <= (
                SELECT  MAX(end_date) - MIN(start_date)
                FROM    mytable
                )
        )
SELECT  group, mydate,
        (
        SELECT  COUNT(*)
        FROM    mytable mi
        WHERE   mi.group = mo.group
                AND q BETWEEN mi.start_date AND mi.end_date
        ) 
FROM    q
CROSS JOIN
        (
        SELECT  DISTINCT group
        FROM    mytable
        ) mo

Update:

A better and faster query making use of analytic functions.

The main idea is that the number of ranges containing each date is the difference before the count of ranges started before that date and the count of ranges that ended before it.

SELECT  cur_date,
        grouper,
        SUM(COALESCE(scnt, 0) - COALESCE(ecnt, 0)) OVER (PARTITION BY grouper ORDER BY cur_date) AS ranges
FROM    (
        SELECT  (
                SELECT  MIN(start_date)
                FROM    t_range
                ) + level - 1 AS cur_date
        FROM    dual
        CONNECT BY
                level <=
                (
                SELECT  MAX(end_date)
                FROM    t_range
                ) -
                (
                SELECT  MIN(start_date)
                FROM    t_range
                ) + 1
        ) dates
CROSS JOIN
        (
        SELECT  DISTINCT grouper AS grouper
        FROM    t_range
        ) groups
LEFT JOIN
        (
        SELECT  grouper AS sgrp, start_date, COUNT(*) AS scnt
        FROM    t_range
        GROUP BY
                grouper, start_date
        ) starts
ON      sgrp = grouper
        AND start_date = cur_date
LEFT JOIN
        (
        SELECT  grouper AS egrp, end_date, COUNT(*) AS ecnt
        FROM    t_range
        GROUP BY
                grouper, end_date
        ) ends
ON      egrp = grouper
        AND end_date = cur_date - 1
ORDER BY
        grouper, cur_date

This query completes in 1 second on 1,000,000 rows.

See this entry in my blog for more detail: