I have two tables: 'movies' and 'users'. There's an n:m relationship between those, describing what movies a user has seen. This is described with a table 'seen' Now i want to find out for a given user, all the movies he has not seen. My current solution is like this:
SELECT *
FROM movies
WHERE movies.id NOT IN (
SELECT seen.movie_id
FROM seen
WHERE seen.user_id=123
)
This works fine but does not seem to scale very well. Is there a better approach to this?
Here's a typical way to do this query without using the subquery method you showed. This may satisfy @Godeke's request to see a join-based solution.
SELECT *
FROM movies m
LEFT OUTER JOIN seen s
ON (m.id = s.movie_id AND s.user_id = 123)
WHERE s.movie_id IS NULL;
However, in most brands of database this solution can perform worse than the subquery solution. It's best to use EXPLAIN to analyze both queries, to see which one will do better given your schema and data.
Here's another variation on the subquery solution:
SELECT *
FROM movies m
WHERE NOT EXISTS (SELECT * FROM seen s
WHERE s.movie_id = m.id
AND s.user_id=123);
This is a correlated subquery, which must be evaluated for every row of the outer query. Usually this is expensive, and your original example query is better. On the other hand, in MySQL "NOT EXISTS
" is often better than "column NOT IN (...)
"
Again, you must test each solution and compare the results to be sure. It's a waste of time to choose any solution without measuring performance.