I have three tables:
CREATE TABLE foo (
id bigint PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE foo_bar (
id bigint PRIMARY KEY,
foo_id bigint NOT NULL
);
CREATE TABLE tag (
name text NOT NULL,
target_id bigint NOT NULL,
PRIMARY KEY (name, target_id)
);
I'm trying to create a view such that I get all of the fields of table foo
, the count of items in foo_bar
where foo.id = foo_bar.foo_id
, and a text array of all tags where foo.id = tag.target_id
. If we have:
INSERT INTO foo VALUES (1, 'one');
INSERT INTO foo VALUES (2, 'two');
INSERT INTO foo_bar VALUES (1, 1);
INSERT INTO foo_bar VALUES (2, 1);
INSERT INTO foo_bar VALUES (3, 2);
INSERT INTO foo_bar VALUES (4, 1);
INSERT INTO foo_bar VALUES (5, 2);
INSERT INTO tag VALUES ('a', 1);
INSERT INTO tag VALUES ('b', 1);
INSERT INTO tag VALUES ('c', 2);
The result should return:
foo.id | foo.name | count | array_agg
--------------------------------------------------
1 | one | 3 | {a, b}
2 | two | 2 | {c}
This is what I have so far:
SELECT DISTINCT f.id, f.name, COUNT(b.id), array_agg(t.name)
FROM foo AS f, foo_bar AS b, tag AS t
WHERE f.id = t.target_id AND f.id = b.foo_id
GROUP BY f.id, b.id;
These are the results I'm getting (note the count
is incorrect):
foo.id | foo.name | count | array_agg
--------------------------------------------------
1 | one | 2 | {a, b}
2 | two | 1 | {c}
The count
is always the count of tags instead of the count of distinct foo_bar
values. I've tried reordering/modifying the GROUP BY
and the SELECT
clauses which returns different results but not the ones that I'm looking for. I think I'm having trouble with the array_agg()
function, but I'm not sure if that's the case or how to resolve it.
SELECT f.id, f.name, b.fb_ct, t.tag_names
FROM foo f
LEFT JOIN (
SELECT foo_id AS id, count(*) AS fb_ct
FROM foo_bar
GROUP BY 1
) b USING (id)
LEFT JOIN (
SELECT target_id AS id, array_agg(name) AS tag_names
FROM tag
GROUP BY 1
) t USING (id)
ORDER BY f.id;
Produces the desired result.
Rewrite with explicit JOIN
syntax. Makes it so much easier to read and understand (and debug).
By joining to multiple 1:n
related tables, rows would multiply each other producing a Cartesian product - which is very expensive nonsense. It's an unintended CROSS JOIN
by proxy. Related:
To avoid this, join at most one n
-table to the 1
-table before you aggregate (GROUP BY
). You could aggregate two times, but it's cleaner and faster to aggregate n
-tables separately before joining them to the 1
-table.
As opposed to your original (with implicit INNER JOIN
). I use LEFT JOIN
to avoid losing rows from foo
that have no matching row in foo_bar
or tag
.
Once the unintended CROSS JOIN
is removed from the query, there is no need for adding DISTINCT
any more - assuming that foo.id
is unique.