recursive path aggregation and CTE query for top-down tree postgres

Question 1

recursive path aggregation and CTE query for top-down tree postgres

postgresql common-table-expression recursive-query

pyramation · Jun 19, 2017 · Viewed 8.6k times · Source

Answer

Answer

Store the parent of a node instead of its children. It is a simpler and cleaner solution, in which you do not need structured data types.

This is an exemplary model with the data equivalent to that in the question:

create table objects (
    id int primary key,
    parent_id int,
    label text,
    content text);

insert into objects values
(1, 4, 'f', 'data for f'),
(2, 5, 'e', 'data for e'),
(3, 5, 'd', 'data for d'),
(4, 6, 'c', ''),
(5, 6, 'b', ''),
(6, 0, 'a', '');

And a recursive query:

with recursive nodes(id, path, content) as (
    select id, label, content
    from objects
    where parent_id = 0
union all
    select o.id, concat(path, '->', label), o.content
    from objects o
    join nodes n on n.id = o.parent_id
)
select *
from nodes
order by id desc;

 id |  path   |  content   
----+---------+------------
  6 | a       | 
  5 | a->b    | 
  4 | a->c    | 
  3 | a->b->d | data for d
  2 | a->b->e | data for e
  1 | a->c->f | data for f
(6 rows)

The variant with children_ids.

drop table if exists objects;
create table objects (
    id int primary key,
    children_ids int[],
    label text,
    content text);
insert into objects values
(1, null, 'f', 'data for f'),
(2, null, 'e', 'data for e'),
(3, null, 'd', 'data for d'),
(4, array[1], 'c', ''),
(5, array[2,3], 'b', ''),
(6, array[4,5], 'a', '');
with recursive nodes(id, children, path, content) as (
    select id, children_ids, label, content
    from objects
    where id = 6
union all
    select o.id, o.children_ids, concat(path, '->', label), o.content
    from objects o
    join nodes n on o.id = any(n.children)
)
select *
from nodes
order by id desc;

 id | children |  path   |  content   
----+----------+---------+------------
  6 | {4,5}    | a       | 
  5 | {2,3}    | a->b    | 
  4 | {1}      | a->c    | 
  3 |          | a->b->d | data for d
  2 |          | a->b->e | data for e
  1 |          | a->c->f | data for f
(6 rows)

Question 2

I'm trying to write a query to produce a list of all nodes in a tree given a root, and also the paths (using names the parents give their children) taken to get there. The recursive CTE I have working is a textbook CTE straight from the docs here, however, it's proven difficult to get the paths working in this case.

Following the git model, names are given to children by their parents as a result of paths created by traversing the tree. This implies a map to children ids like git's tree structure.

I've been looking online for a solution for a recursive query but they all seem to contain solutions that use parent ids, or materialized paths, which all would break structural sharing concepts that Rich Hickey's database as value talk is all about.

current implementation

Imagine the objects table is dead simple (for simplicity sake, let's assume integer ids):

drop table if exists objects;
create table objects (
    id INT,
    data jsonb
);

--       A
--     /   \
--    B     C
--   /   \    \
--  D     E    F

INSERT INTO objects (id, data) VALUES
  (1, '{"content": "data for f"}'),  -- F
  (2, '{"content": "data for e"}'),  -- E
  (3, '{"content": "data for d"}'),  -- D
  (4, '{"nodes":{"f":{"id":1}}}'),               -- C
  (5, '{"nodes":{"d":{"id":2}, "e":{"id":3}}}'), -- B
  (6, '{"nodes":{"b":{"id":5}, "c":{"id":4}}}')  -- A
  ;

drop table if exists work_tree;
create table work_tree (
    id INT NOT NULL,
    path text,
    ref text,
    data jsonb,
    primary key (ref, id) -- TODO change to ref, path
);

create or replace function get_nested_ids_array(data jsonb) returns int[] as $$
  select array_agg((value->>'id')::int) as nested_id
  from jsonb_each(data->'nodes')
$$ LANGUAGE sql STABLE;

create or replace function checkout(root_id int, ref text) returns void as $$
  with recursive nodes(id, nested_ids, data) AS (
      select id, get_nested_ids_array(data), data
      from objects
      where id = root_id
      union
      select child.id, get_nested_ids_array(child.data), child.data
      from objects child, nodes parent
      where child.id = ANY(parent.nested_ids)
  )
  INSERT INTO work_tree (id, data, ref)
  select id, data, ref from nodes
$$ language sql VOLATILE;

SELECT * FROM checkout(6, 'master');
SELECT * FROM work_tree;

If you are familiar, these objects' data property look similar to git blobs/trees, mapping names to ids or storing content. So imagine you want to create an index, so, after a "checkout", you need to query for the list of nodes, and potentially the paths to produce a working tree or index:

Current Output:

id    path    ref          data
6     NULL    master       {"nodes":{"b":{"id":5}, "c":{"id":4}}}
4     NULL    master       {"nodes":{"d":{"id":2}, "e":{"id":3}}}
5     NULL    master       {"nodes":{"f":{"id":1}}}
1     NULL    master       {"content": "data for d"}
2     NULL    master       {"content": "data for e"}
3     NULL    master       {"content": "data for f"}

Desired Output:

id    path    ref          data
6      /       master      {"nodes":{"b":{"id":5}, "c":{"id":4}}}
4      /b      master      {"nodes":{"d":{"id":2}, "e":{"id":3}}}
5      /c      master      {"nodes":{"f":{"id":1}}}
1      /b/d    master      {"content": "data for d"}
2      /b/e    master      {"content": "data for e"}
3      /c/f    master      {"content": "data for f"}

What's the best way to aggregate path in this case? I'm aware that I'm compressing the information when I call get_nested_ids_array when I do the recursive query, so not sure with this top-down approach how to properly aggregate with the CTE.

EDIT use case for children ids

to explain more about why I need to use children ids instead of parent:

Imagine a data structure like so:

      A
    /   \
   B     C
 /   \    \
D     E    F

If you make a modification to F, you only add a new root A', and children nodes C', and F', leaving the old tree in tact:

     A'    A
   /   \ /   \
  C'    B     C
 /     /   \    \
F'    D     E    F

If you make a deletion, you only add a new root A" that only points to B and you still have A if you ever need to time travel (and they share the same objects, just like git!):

 A"  A
  \ /   \
   B     C
 /   \    \
D     E    F

So it seems that the best way to achieve this is with children ids so children can have multiple parents - across time and space! If you think there's another way to achieve this, by all means, let me know!

Edit #2 case for not using parent_ids

Using parent_ids has cascading effects that requires editing the entire tree. For example,

      A
    /   \
   B     C
 /   \    \
D     E    F

If you make a modification to F, you still need a new root A' to maintain immutability. And if we use parent_ids, then that means both B and C now have a new parent. Hence, you can see how it ripples through the entire tree immediately requiring every node is touched:

      A              A' 
    /   \          /   \
   B     C        B'     C'
 /   \    \      /   \    \
D     E    F    D'    E'   F'

EDIT #3 use case for parents giving names to children

We can make a recursive query where objects store their own name, but the question I'm asking is specifically about constructing a path where the names are given to children from their parents. This is modeling a data structure similar to the git tree, for example if you see this git graph pictured below, in the 3rd commit there is a tree (a folder) bak that points to the original root which represents a folder of all the files at the 1st commit. If that root object had it's own name, it wouldn't be possible to achieve this so simply as adding a ref. That's the beauty of git, it's as simple as making a reference to a hash and giving it a name.

This is the relationship I'm setting up which is why the jsonb data structure exists, it's to provide a mapping from a name to an id (hash in git's case). I know it's not ideal, but it does the job of providing the hash map. If there's another way to create this mapping of names to ids, and thus, a way for parents to give names to children in a top-down tree, I'm all ears!

Any help is appreciated!

recursive path aggregation and CTE query for top-down tree postgres

current implementation

Current Output:

Desired Output:

EDIT use case for children ids

Edit #2 case for not using parent_ids

EDIT #3 use case for parents giving names to children

Answer

Related questions