I have two tables:
CREATE TABLE `articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(1000) DEFAULT NULL,
`last_updated` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `last_updated` (`last_updated`),
) ENGINE=InnoDB AUTO_INCREMENT=799681 DEFAULT CHARSET=utf8
CREATE TABLE `article_categories` (
`article_id` int(11) NOT NULL DEFAULT '0',
`category_id` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`article_id`,`category_id`),
KEY `category_id` (`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
This is my query:
SELECT a.*
FROM
articles AS a,
article_categories AS c
WHERE
a.id = c.article_id
AND c.category_id = 78
AND a.comment_cnt > 0
AND a.deleted = 0
ORDER BY a.last_updated
LIMIT 100, 20
And an EXPLAIN
for it:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
type: index
possible_keys: PRIMARY
key: last_updated
key_len: 9
ref: NULL
rows: 2040
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
type: eq_ref
possible_keys: PRIMARY,fandom_id
key: PRIMARY
key_len: 8
ref: db.a.id,const
rows: 1
Extra: Using index
It uses a full index scan of last_updated
on the first table for sorting, but does not use an y index for join (type: index
in explain). This is very bad for performance and kills the whole database server, since this is a very frequent query.
I've tried reversing table order with STRAIGHT_JOIN
, but this gives filesort, using_temporary
, which is even worse.
Is there any way to make mysql use index for join and for sorting at the same time?
=== update ===
I'm really desparate in this. Maybe some kind of denormalization can help here?
If you have lots of categories, this query cannot be made efficient. No single index can cover two tables at once in MySQL
.
You have to do denormalization: add last_updated
, has_comments
and deleted
into article_categories
:
CREATE TABLE `article_categories` (
`article_id` int(11) NOT NULL DEFAULT '0',
`category_id` int(11) NOT NULL DEFAULT '0',
`last_updated` timestamp NOT NULL,
`has_comments` boolean NOT NULL,
`deleted` boolean NOT NULL,
PRIMARY KEY (`article_id`,`category_id`),
KEY `category_id` (`category_id`),
KEY `ix_articlecategories_category_comments_deleted_updated` (category_id, has_comments, deleted, last_updated)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
and run this query:
SELECT *
FROM (
SELECT article_id
FROM article_categories
WHERE (category_id, has_comments, deleted) = (78, 1, 0)
ORDER BY
last_updated DESC
LIMIT 100, 20
) q
JOIN articles a
ON a.id = q.article_id
Of course you should update article_categories
as well whenever you update relevant columns in article
. This can be done in a trigger.
Note that the column has_comments
is boolean: this will allow using an equality predicate to make a single range scan over the index.
Also note that the LIMIT
goes into the subquery. This makes MySQL
use late row lookups which it does not use by default. See this article in my blog about why do they increase performance:
If you were on SQL Server, you could make an indexable view over your query, which essentially would make a denormalized indexed copy of article_categories
with the additional fields, automatically mainained by the server.
Unfortunately, MySQL
does not support this and you will have to create such a table manually and write additional code to keep it in sync with the base tables.