I have the following SQL query that selects any row matching ANY of the values in the list (9, 10):
SELECT
r.id, r.title,
u.name as 'Created By',
c.name as 'Category',
c.value,
cr.category_id
FROM
category_resource cr
INNER JOIN resource r
ON cr.resource_id = r.id
INNER JOIN user u
ON r.created_by = u.id
INNER JOIN category c
ON cr.category_id = c.id
WHERE cr.category_id IN ('10', '9');
I have tried finding out how to do the opposite, which I also need, i.e select rows that match ALL values.
I have read about using a statement sort of like this:
SELECT
r.id, r.title
FROM
resource r
WHERE
id IN (
SELECT
resource_id
FROM
category_resource
WHERE
category_id IN (9, 10)
GROUP BY
resource_id
HAVING
COUNT(DISTINCT category_id) = 2
);
This is my attempt at adapting this answer to my needs: SQL Server - select rows that match all items in a list
But that doesn't get me as much information back in the results as the first statement does. So how could I do something that is more equivalent? I've tried to put it together, but I'm too new to SQL to figure it out right, I just get errors...
LONG UPDATE:
Gordon Linoff pointed out that this is a strange request. And I know, I feel that it’s strange too, to have a query that returns multiple rows for the same resource. But I don’t know how to do this any other way, maybe I’m on the wrong track altogether, because it just hit me that the original request (the one getting all rows for resources with categories matching ANY in the list) does quite not fulfill my requirement either…
Here’s my requirement as a whole:
First of all, I think the model of this part of the db might be helpful.
(BTW category has a relation with itself also because it is stored as a hierarchy, using the adjacency model, where each category stores its parent id, if anyone was wondering about that arrow...)
1: Find all resources where a category matches ANY of the values in the list. But (this is where that was insufficient) for each of these resources, I need to know the resource along with all of the categories it has…
Let me explain the point of this with a simple example:
This is a many to many relationship as you can see. A resource (e.g let’s say with the title ”Introduction to carpentry in 18th century New England”) can be associated with many categories (e.g category.name = ”subject” value = ”Carpentry”, category.name=”subject” value = ”Wood”, category.name=”subject” value = ”New England”, category.name=”subject” value = ”History”). Note that this example is simplified, but you see the basic idea.
Now, if a user searches for a resource matching ANY of the categories ”Carpentry” and ”Painting”, the resource ”Introduction to carpentry in 18th century New England” should show up in the results, since one of its categories matched. But, here’s the rub, and why Gordon felt my request was strange: in the search result I want to present to the user, I want to list the title ”Introduction to carpentry in 18th century New England” along with a column showing all the categories that title is classified with, even though the user didn’t search for them - in order to give a better overview of the complete subject matter of this resource.
So how can I do that? The only way I could think of was the first statement in my question, but as I said it just hit me that it doesnt’ give me all categories a resource might have, only the ones actually searched for…
Of course I could do a query for the results first, only getting one row for each. And then do a second query, finding all categories for each resource in the results. But if the first query gives me 1000 results (which will be common), then to get the categories for all of these I would have to do 1000 queries to get the categories for each… Sounds like that would give me performance issues…
Am I thinking about this the wrong way? Is there another way to accomplish what I want to do? I.e give me the resources the query selects, along with all of that resource’s associated categories...
2: Well, after that long explanation, the second requirement is simpler to explain: again the same thing about getting all categories back for a resource selected, but this time the selection in the query should only get those resources that match ALL of the provided values. However, just because I provide all of the values in the query again doesn’ mean that I already have all the categories, since each resource in the results may actually have more (and other) categories and I need those too when presenting the results as mentioned in the first (ANY) requirement.
Update 2 speed issues
A speed improvement (to avoid executing the subquery for each row) is to create a temporary table with the resource id that match the subquery and use that in the main query by joining against it.
/*Create a temporary table with the ids we want (the subquery)*/
CREATE TEMPORARY TABLE Matching_Resources (INDEX(resource_id))
AS (
SELECT
resource_id
FROM
category_resource
WHERE
category_id IN (4,1)
GROUP BY
resource_id
HAVING
COUNT(DISTINCT category_id) = 2
);
SELECT
r.id, r.title,
u.name AS 'Created By',
GROUP_CONCAT( CONCAT('[',c.name,',',c.value,',',CAST(c.id as CHAR),']') separator ' // ') AS 'Categories'
FROM
resource r
INNER JOIN Matching_Resources mr
ON r.id = mr.resource_id
INNER JOIN category_resource cr
ON r.id = cr.resource_id
INNER JOIN category c
ON cr.category_id = c.id
INNER JOIN user u
ON r.created_by = u.id
GROUP BY r.id
Update 1 some comments
In both cases you want the category filtering to act just as a filter for the matching resource ids only. So you need to make it a subquery in order to avoid affecting the main query which needs to only restrict resources but return all matching categories.
So WHERE r.id IN (..)
part must exist in both solutions. You already know how to do the filtering in there (as i am only using the same code you provided)
For the requirement of matching ANY provided category
SELECT
r.id, r.title,
u.name as 'Created By',
c.name as 'Category',
c.value,
cr.category_id
FROM
resource r
INNER JOIN category_resource cr
ON r.id = cr.resource_id
INNER JOIN category c
ON cr.category_id = c.id
INNER JOIN user u
ON r.created_by = u.id
WHERE
r.id IN
(
SELECT
resource_id
FROM
category_resource
WHERE
category_id IN (6,1)
)
Demo at http://sqlfiddle.com/#!3/d9486/8/0
For the requirement of matching ALL provided categories
SELECT
r.id, r.title,
u.name as 'Created By',
c.name as 'Category',
c.value,
cr.category_id
FROM
resource r
INNER JOIN category_resource cr
ON r.id = cr.resource_id
INNER JOIN category c
ON cr.category_id = c.id
INNER JOIN user u
ON r.created_by = u.id
WHERE
r.id IN
(
SELECT
resource_id
FROM
category_resource
WHERE
category_id IN (1,4)
GROUP BY
resource_id
HAVING
COUNT(DISTINCT category_id) = 2
)