Mixing together Connect by, inner join and sum with Oracle

François picture François · May 27, 2010 · Viewed 22.3k times · Source

I need help with a oracle query.

Here is my setup:

I have 2 tables called respectively "tasks" and "timesheets". The "tasks" table is a recursive one, that way each task can have multiple subtasks. Each timesheet is associated with a task (not necessarily the "root" task) and contains the number of hours worked on it.

Example:

Tasks

id:1 | name: Task A | parent_id: NULL

id:2 | name: Task A1 | parent_id: 1

id:3 | name: Task A1.1 | parent_id: 2

id:4 | name: Task B | parent_id: NULL

id:5 | name: Task B1 | parent_id: 4

Timesheets

id:1 | task_id: 1 | hours: 1

id:2 | task_id: 2 | hours: 3

id:3 | task_id:3 | hours: 1

id:5 | task_id:5 | hours:1 ...

What I want to do:

I want a query that will return the sum of all the hours worked on a "task hierarchy". If we take a look at the previous example, It means I would like to have the following results:

task A - 5 hour(s) | task B - 1 hour(s)

At first I tried this

SELECT TaskName, Sum(Hours) "TotalHours" 
FROM (
    SELECT replace(sys_connect_by_path(decode(level, 1, t.name), '~'), '~') As TaskName, 
    ts.hours as hours
    FROM tasks t INNER JOIN timesheets ts ON t.id=ts.task_id
    START WITH PARENTOID=-1
    CONNECT BY PRIOR t.id = t.parent_id
    )
GROUP BY TaskName Having Sum(Hours) > 0 ORDER BY TaskName

And it almost work. THe only problem is that if there are no timesheet for a root task, it will skip the whole hieararchy... but there might be timesheets for the child rows and it is exactly what happens with Task B1. I know it is the "inner join" part that is causing my problem but I'm not sure how can I get rid of it.

Any idea how to solve this problem?

Thank you

Answer

aw crud picture aw crud · May 27, 2010

Would something like this work? I've had cases similar to yours, and I simply removed the join from the hierarchical query and applied it only afterward to avoid losing rows.

SELECT TaskName, Sum(ts.hours) "TotalHours" 
FROM (
    SELECT replace(sys_connect_by_path(decode(level, 1, t.name), '~'), '~') As TaskName, t.id
    FROM tasks t
    START WITH PARENTOID=-1
    CONNECT BY PRIOR t.id = t.parent_id
    ) tasks
INNER JOIN timesheets ts ON tasks.id=ts.task_id
GROUP BY TaskName Having Sum(ts.hours) > 0 ORDER BY TaskName