Using UNNEST with a JOIN

İlker İnanç picture İlker İnanç · Apr 17, 2013 · Viewed 19.8k times · Source

I want to be able to use unnest() function in PostgreSQL in a complicated SQL query that has many JOINs. Here's the example query:

SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN taggings ON taggings.mention_id = mentions.id
INNER JOIN tags ON tags.id = taggings.tag_id
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id

I want to eliminate the taggings table here, because my mentions table has an integer array field named taglist that consists of all linked tag ids of mentions.

I tried following:

SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id 
FROM mentions 
INNER JOIN tags ON tags.id IN (SELECT unnest(taglist))
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3 
GROUP BY tags.parent_id 

This works but brings different results than the first query.

So what I want to do is to use the result of the SELECT unnest(taglist) in a JOIN query to compensate for the taggings table.

How can I do that?

UPDATE: taglist is the same set as the respective list of tag ids of mention.

Answer

Erwin Brandstetter picture Erwin Brandstetter · Apr 17, 2013

Technically, your query might work like this (not entirely sure about the objective of this query):

SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM  (
    SELECT  unnest(m.taglist) AS tag_id
    FROM    mentions m
    WHERE   m.search_id = 3
    AND     9 = ANY (m.taglist)
    ) m 
JOIN   tags t  USING (tag_id) -- assumes tag.tag_id!
GROUP  BY t.parent_id;

However, it seems to me you are going in the wrong direction here. Normally one would remove the redundant array taglist and keep the normalized database schema. Then your original query should serve well, only shortened the syntax with aliases:

SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM   mentions m
JOIN   taggings mt ON mt.mention_id = m.id
JOIN   tags     t  ON t.id = mt.tag_id
WHERE  9 = ANY (m.taglist)
AND    m.search_id = 3
GROUP  BY t.parent_id;

Unravel the mystery

<rant> The root cause for your "different results" is the unfortunate naming convention that some intellectually challenged ORMs impose on people.
I am speaking of id as column name. Never use this anti-pattern in a database with more than one table. Right, that means basically any database. As soon as you join a bunch of tables (that's what you do in a database) you end up with a bunch of columns named id. Utterly pointless.
The ID column of a table named tag should be tag_id (unless there is another descriptive name). Never id. </rant>

Your query inadvertently counts tags instead of mentions:

SELECT 25 AS keyword_id, count(m.id) AS total, t.parent_id AS tag_id
FROM  (
    SELECT unnest(m.taglist) AS id
    FROM   mentions m
    WHERE  m.search_id = 4
    AND    25 = ANY (m.taglist)
    ) m
JOIN   tags t USING (id)
GROUP  BY t.parent_id;

It should work this way:

SELECT 25 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id
FROM  (
    SELECT m.id, unnest(m.taglist) AS tag_id
    FROM   mentions m
    WHERE  m.search_id = 4
    AND    25 = ANY (m.taglist)
    ) m
JOIN   tags t ON t.id =  m.tag_id
GROUP  BY t.parent_id;

I also added back the DISTINCT to your count() that got lost along the way in your query.