Get all rows matching a list in a SQL query

Anders picture Anders · Jan 28, 2014 · Viewed 11.5k times · Source

I have the following SQL query that selects any row matching ANY of the values in the list (9, 10):

SELECT
     r.id, r.title, 
     u.name as 'Created By',
     c.name as 'Category',
     c.value,
     cr.category_id
FROM
     category_resource cr
          INNER JOIN resource r
          ON cr.resource_id = r.id
               INNER JOIN user u
               ON r.created_by = u.id
               INNER JOIN category c
               ON cr.category_id = c.id
WHERE cr.category_id IN ('10', '9');

I have tried finding out how to do the opposite, which I also need, i.e select rows that match ALL values.

I have read about using a statement sort of like this:

SELECT
    r.id, r.title
FROM
    resource r
WHERE
    id IN (
        SELECT
            resource_id
        FROM
            category_resource
        WHERE
            category_id IN (9, 10)
        GROUP BY
            resource_id
        HAVING
            COUNT(DISTINCT category_id) = 2
    );

This is my attempt at adapting this answer to my needs: SQL Server - select rows that match all items in a list

But that doesn't get me as much information back in the results as the first statement does. So how could I do something that is more equivalent? I've tried to put it together, but I'm too new to SQL to figure it out right, I just get errors...

LONG UPDATE:

Gordon Linoff pointed out that this is a strange request. And I know, I feel that it’s strange too, to have a query that returns multiple rows for the same resource. But I don’t know how to do this any other way, maybe I’m on the wrong track altogether, because it just hit me that the original request (the one getting all rows for resources with categories matching ANY in the list) does quite not fulfill my requirement either…

Here’s my requirement as a whole:

First of all, I think the model of this part of the db might be helpful.

enter image description here

(BTW category has a relation with itself also because it is stored as a hierarchy, using the adjacency model, where each category stores its parent id, if anyone was wondering about that arrow...)

1: Find all resources where a category matches ANY of the values in the list. But (this is where that was insufficient) for each of these resources, I need to know the resource along with all of the categories it has…

Let me explain the point of this with a simple example:

This is a many to many relationship as you can see. A resource (e.g let’s say with the title ”Introduction to carpentry in 18th century New England”) can be associated with many categories (e.g category.name = ”subject” value = ”Carpentry”, category.name=”subject” value = ”Wood”, category.name=”subject” value = ”New England”, category.name=”subject” value = ”History”). Note that this example is simplified, but you see the basic idea.

Now, if a user searches for a resource matching ANY of the categories ”Carpentry” and ”Painting”, the resource ”Introduction to carpentry in 18th century New England” should show up in the results, since one of its categories matched. But, here’s the rub, and why Gordon felt my request was strange: in the search result I want to present to the user, I want to list the title ”Introduction to carpentry in 18th century New England” along with a column showing all the categories that title is classified with, even though the user didn’t search for them - in order to give a better overview of the complete subject matter of this resource.

So how can I do that? The only way I could think of was the first statement in my question, but as I said it just hit me that it doesnt’ give me all categories a resource might have, only the ones actually searched for…

Of course I could do a query for the results first, only getting one row for each. And then do a second query, finding all categories for each resource in the results. But if the first query gives me 1000 results (which will be common), then to get the categories for all of these I would have to do 1000 queries to get the categories for each… Sounds like that would give me performance issues…

Am I thinking about this the wrong way? Is there another way to accomplish what I want to do? I.e give me the resources the query selects, along with all of that resource’s associated categories...

2: Well, after that long explanation, the second requirement is simpler to explain: again the same thing about getting all categories back for a resource selected, but this time the selection in the query should only get those resources that match ALL of the provided values. However, just because I provide all of the values in the query again doesn’ mean that I already have all the categories, since each resource in the results may actually have more (and other) categories and I need those too when presenting the results as mentioned in the first (ANY) requirement.

Answer

Gabriele Petrioli picture Gabriele Petrioli · Jan 28, 2014

Update 2 speed issues

A speed improvement (to avoid executing the subquery for each row) is to create a temporary table with the resource id that match the subquery and use that in the main query by joining against it.

/*Create a temporary table with the ids we want (the subquery)*/
CREATE TEMPORARY TABLE Matching_Resources (INDEX(resource_id))
AS (
  SELECT
    resource_id
  FROM
    category_resource
  WHERE
    category_id IN (4,1)
  GROUP BY
    resource_id
  HAVING
    COUNT(DISTINCT category_id) = 2
);

SELECT
  r.id, r.title,
  u.name AS 'Created By',
  GROUP_CONCAT( CONCAT('[',c.name,',',c.value,',',CAST(c.id as CHAR),']') separator ' // ') AS 'Categories'
FROM
  resource r
  INNER JOIN Matching_Resources mr
    ON r.id = mr.resource_id
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
GROUP BY r.id

Update 1 some comments

In both cases you want the category filtering to act just as a filter for the matching resource ids only. So you need to make it a subquery in order to avoid affecting the main query which needs to only restrict resources but return all matching categories.

So WHERE r.id IN (..) part must exist in both solutions. You already know how to do the filtering in there (as i am only using the same code you provided)


For the requirement of matching ANY provided category

SELECT 
     r.id, r.title, 
     u.name as 'Created By',
     c.name as 'Category',
     c.value,
     cr.category_id
FROM 
  resource r
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
WHERE 
  r.id IN 
    (
      SELECT
        resource_id
      FROM
        category_resource
      WHERE
        category_id IN (6,1)
    )

Demo at http://sqlfiddle.com/#!3/d9486/8/0


For the requirement of matching ALL provided categories

SELECT 
     r.id, r.title, 
     u.name as 'Created By',
     c.name as 'Category',
     c.value,
     cr.category_id
FROM 
  resource r
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
WHERE 
  r.id IN 
    (
      SELECT
        resource_id
      FROM
        category_resource
      WHERE
        category_id IN (1,4)
      GROUP BY
        resource_id
      HAVING
        COUNT(DISTINCT category_id) = 2
    )

Demo at http://sqlfiddle.com/#!3/d9486/10/0