For my problem, we have a schema whereby one photo has many tags and also many comments. So if I have a query where I want all the comments and tags, it will multiply the rows together. So if one photo has 2 tags and 13 comments, I get 26 rows for that one photo:
SELECT
tag.name,
comment.comment_id
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
That's fine for most things, but it means that if I GROUP BY
and then json_agg(tag.*)
, I get 13 copies of the first tag, and 13 copies of the second tag.
SELECT json_agg(tag.name) as tags
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
Instead I want an array that is only 'suburban' and 'city', like this:
[
{"tag_id":1,"name":"suburban"},
{"tag_id":2,"name":"city"}
]
I could json_agg(DISTINCT tag.name)
, but this will only make an array of tag names, when I want the entire row as json. I would like to json_agg(DISTINCT ON(tag.name) tag.*)
, but that's not valid SQL apparently.
How then can I simulate DISTINCT ON
inside an aggregate function in Postgres?
Whenever you have a central table and want to left-join it to many rows in table A and also left-join it to many rows in table B, you get these problems of duplicating rows. It can especially throw off aggregrate functions like COUNT
and SUM
if you're not careful! So I think you need to construct your tags-for-each-photo and comments-for-each-photo separately, and then join them together:
WITH tags AS (
SELECT photo.photo_id, json_agg(row_to_json(tag.*)) AS tags
FROM photo
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
),
comments AS (
SELECT photo.photo_id, json_agg(row_to_json(comment.*)) AS comments
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
GROUP BY photo.photo_id
)
SELECT COALESCE(tags.photo_id, comments.photo_id) AS photo_id,
tags.tags,
comments.comments
FROM tags
FULL OUTER JOIN comments
ON tags.photo_id = comments.photo_id
EDIT: If you really want to join everything together without CTEs, this looks like it gives correct results:
SELECT photo.photo_id,
to_json(array_agg(DISTINCT tag.*)) AS tags,
to_json(array_agg(DISTINCT comment.*)) AS comments
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id