PostgreSQL where all in array

pex picture pex · Jul 13, 2012 · Viewed 20.4k times · Source

What is the easiest and fastest way to achieve a clause where all elements in an array must be matched - not only one when using IN? After all it should behave like mongodb's $all.

Thinking about group conversations where conversation_users is a join table between conversation_id and user_id I have something like this in mind:

WHERE (conversations_users.user_id ALL IN (1,2))

UPDATE 16.07.12

Adding more info about schema and case:

  1. The join-table is rather simple:

                  Table "public.conversations_users"
         Column      |  Type   | Modifiers | Storage | Description 
    -----------------+---------+-----------+---------+-------------
     conversation_id | integer |           | plain   | 
     user_id         | integer |           | plain   | 
    
  2. A conversation has many users and a user belongs to many conversations. In order to find all users in a conversation I am using this join table.

  3. In the end I am trying to figure out a ruby on rails scope that find's me a conversation depending on it's participants - e.g.:

    scope :between, ->(*users) {
      joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id))
    }
    

UPDATE 23.07.12

My question is about finding an exact match of people. Therefore:

Conversation between (1,2,3) won't match if querying for (1,2)

Answer

Alex Blakemore picture Alex Blakemore · Jul 21, 2012

Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint to prevent duplicate rows, then something like the following simple query should do.

select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2

It's important to note that the number 2 at the end is the length of the list of user_ids. That obviously needs to change if the user_id list changes length. If you can't assume your join table doesn't contain duplicates, change "count(*)" to "count(distinct user_id)" at some possible cost in performance.

This query finds all conversations that include all the specified users even if the conversation also includes additional users.

If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause as below. Note, first and last lines are the same as the original query, only the middle two lines are new.

select conversation_id from conversations_users where user_id in (1, 2)
   and conversation_id not in
   (select conversation_id from conversations_users where user_id not in (1,2))
group by conversation_id having count(*) = 2

Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except.)

select conversation_id from conversations_users where user_id in (1, 2)
  group by conversation_id having count(*) = 2
minus
  select conversation_id from conversations_users where user_id not in (1,2)

A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11GR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips the sort step for the last query. So either query plan could be faster depending on multiple factors such as the number of rows, cores, cache, indices etc.