Find rows that have same value in one column and other values in another column?

Jim picture Jim · Jan 4, 2015 · Viewed 10.5k times · Source

I have a PostgreSQL database that stores users in a users table and conversations they take part in a conversation table. Since each user can take part in multiple conversations and each conversation can involve multiple users, I have a conversation_user linking table to track which users are participating in each conversation:

# conversation_user
id  |  conversation_id | user_id
----+------------------+--------
1   |                1 |      32
2   |                1 |       3
3   |                2 |      32
4   |                2 |       3
5   |                2 |       4

In the above table, user 32 is having one conversation with just user 3 and another with both 3 and user 4. How would I write a query that would show that there is a conversation between just user 32 and user 3?

I've tried the following:

SELECT conversation_id AS cid,
       user_id
FROM conversation_user
GROUP BY cid HAVING count(*) = 2
AND (user_id = 32
     OR user_id = 3);

SELECT conversation_id AS cid,
   user_id
FROM conversation_user
GROUP BY (cid HAVING count(*) = 2
AND (user_id = 32
     OR user_id = 3));

SELECT conversation_id AS cid,
       user_id
FROM conversation_user
WHERE (user_id = 32)
  OR (user_id = 3)
GROUP BY cid HAVING count(*) = 2;

These queries throw an error that says that user_id must appear in the GROUP BY clause or be used in an aggregate function. Putting them in an aggregate function (e.g. MIN or MAX) doesn't sound appropriate. I thought that my first two attempts were putting them in the GROUP BY clause.

What am I doing wrong?

Answer

Erwin Brandstetter picture Erwin Brandstetter · Jan 4, 2015

This is a case of relational division. We have assembled an arsenal of techniques under this related question:

The special difficulty is to exclude additional users. There are basically 4 techniques.

I suggest LEFT JOIN / IS NULL:

SELECT cu1.conversation_id
FROM        conversation_user cu1
JOIN        conversation_user cu2 USING (conversation_id)
LEFT   JOIN conversation_user cu3 ON cu3.conversation_id = cu1.conversation_id
                                 AND cu3.user_id NOT IN (3,32)
WHERE  cu1.user_id = 32
AND    cu2.user_id = 3
AND    cu3.conversation_id IS NULL;

Or NOT EXISTS:

SELECT cu1.conversation_id
FROM   conversation_user cu1
JOIN   conversation_user cu2 USING (conversation_id)
WHERE  cu1.user_id = 32
AND    cu2.user_id = 3
AND NOT EXISTS (
   SELECT 1
   FROM   conversation_user cu3
   WHERE  cu3.conversation_id = cu1.conversation_id
   AND    cu3.user_id NOT IN (3,32)
   );

Both queries do not depend on a UNIQUE constraint for (conversation_id, user_id), which may or may not be in place. Meaning, the query even works if user_id 32 (or 3) is listed more than once for the same conversation. You would get duplicate rows in the result, though, and need to apply DISTINCT or GROUP BY.
The only condition is the one you formulated:

... a query that would show that there is a conversation between just user 32 and user 3?

Audited query

The query you linked in the comment wouldn't work. You forgot to exclude other participants. Should be something like:

SELECT *  -- or whatever you want to return
FROM   conversation_user cu1
WHERE  cu1.user_id = 32
AND    EXISTS (
   SELECT 1
   FROM   conversation_user cu2
   WHERE  cu2.conversation_id = cu1.conversation_id 
   AND    cu2.user_id = 3
   )
AND NOT EXISTS (
   SELECT 1
   FROM   conversation_user cu3
   WHERE  cu3.conversation_id = cu1.conversation_id
   AND    cu3.user_id NOT IN (3,32)
   );

Which is similar to the other two queries, except that it will not return multiple rows if user_id = 3 is linked multiple times.