I am trying to improve performance for a proof of concept I have already written and am having no luck. I think the approach is probably flawed, but I’m struggling to find another solution. I’ve covered all the Ask Tom articles and forum posts I can find.
We’re running Oracle 10g R2.
We have items arranged in a hierarchical structure. Quantities are defined on the relationships. There are two types of objects in the hierarchy: assemblies that are logical groupings, and items that represent an actual item. So if we were representing a full tool set we would have a root representing the whole tool set, and a leaf that represent an actual tool. So:
tool set -> screw drivers -> flat head screw drivers -> small flat head screw driver
The assemblies can be reused in the hierarchy, as can the items.
I need to flatten the hierarchy so each instance of an item has a row, and the quantity. Any of the relationships can have a quantity >= 1. To get the quantity of an item we need to get the product of the quantities from all the relationships from the root to the leaf.
My solution works, but it does not scale well. Running against actual data it is taking about 8 minutes to produce 6000+ rows, and we have hierarchies that would produce 50k+ rows. Ideally this would be completed in 10 seconds or less, but I know that’s… optimistic ;)
My solution and simplified dataset is below. Any feedback will be much appreciated!
CREATE TABLE ITEMHIER
(
PARENT VARCHAR2(30 BYTE),
CHILD VARCHAR2(30 BYTE),
QUANTITY NUMBER(15,2),
ISLEAF NUMBER
);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM001',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM002',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM003',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM002',10,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM004',3,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM005',12,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM006',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM006',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM005',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY005',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY006',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY008',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY005',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY006',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY002',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY003',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY004',1,0);
COMMIT;
/
CREATE OR REPLACE FUNCTION GETQTY(P_NAVPATH IN VARCHAR2,
P_STARTWITH IN VARCHAR2) RETURN INTEGER AS
R_QTY INTEGER;
BEGIN
SELECT EXP(SUM(LN(QUANTITY)))
INTO R_QTY
FROM (
SELECT QUANTITY, SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH
FROM ITEMHIER
START WITH PARENT = P_STARTWITH
CONNECT BY PRIOR CHILD = PARENT
)
WHERE INSTR(P_NAVPATH, NAV_PATH) = 1;
RETURN R_QTY;
END;
/
SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;
----EDIT
Using the WITH
clause I was able to cut processing time in about 1/2, which is a great gain! Any other ideas?
with
h as (
select sys_connect_by_path(child,'/') navpath,
child,
quantity qty,
isleaf
from itemhier
start with parent = 'ASSY001'
connect by prior child = parent
)
select h1.navpath,
h1.child,
(SELECT exp(sum(ln(h2.qty)))
FROM h h2
WHERE instr(h1.navpath, h2.navpath) = 1) qty
from h h1
where isleaf = 1
EDIT 2
jonearles suggestion to use the sys_connect_by_path to build an arithmetic expression, then use PL/SQL to evaluate it appears to be the way to go. Running against my largest dataset I was able to produce 77k rows of output in 55 seconds.
I also attempted to use parallelism, but as he noted there was little to no performance gain to be had.
Podiluska's suggestion is good. If you have Oracle 11g R2, common table expressions are the way to go. The recursive nature of the new syntax will allow you to ditch the sys_connect_by_path
combined with instr
, which is going to seriously hurt your performance.
Try this:
select
child,
sum(total_quantity) total_quantity
from (
with h (parent, child, isleaf, quantity, total_quantity) as (
select
parent,
child,
isleaf,
quantity,
quantity total_quantity
from
itemhier
where
parent = 'ASSY001'
union all
select
ih.parent,
ih.child,
ih.isleaf,
ih.quantity,
ih.quantity * h.total_quantity total_quantity
from
itemhier ih
join
h on h.child = ih.parent
)
select * from h
where isleaf = 1
)
group by child;
Here's the sqlfiddle: http://sqlfiddle.com/#!4/9840f/6