How to make JOIN query use index?

Silver Light picture Silver Light · May 5, 2013 · Viewed 45.1k times · Source

I have two tables:

CREATE TABLE `articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(1000) DEFAULT NULL,
  `last_updated` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `last_updated` (`last_updated`),
) ENGINE=InnoDB AUTO_INCREMENT=799681 DEFAULT CHARSET=utf8 

CREATE TABLE `article_categories` (
  `article_id` int(11) NOT NULL DEFAULT '0',
  `category_id` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`article_id`,`category_id`),
  KEY `category_id` (`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

This is my query:

SELECT a.*
FROM
    articles AS a,
    article_categories AS c
WHERE
    a.id = c.article_id
    AND c.category_id = 78
    AND a.comment_cnt > 0
    AND a.deleted = 0
ORDER BY a.last_updated
LIMIT 100, 20

And an EXPLAIN for it:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: index
possible_keys: PRIMARY
          key: last_updated
      key_len: 9
          ref: NULL
         rows: 2040
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: eq_ref
possible_keys: PRIMARY,fandom_id
          key: PRIMARY
      key_len: 8
          ref: db.a.id,const
         rows: 1
        Extra: Using index

It uses a full index scan of last_updated on the first table for sorting, but does not use an y index for join (type: index in explain). This is very bad for performance and kills the whole database server, since this is a very frequent query.

I've tried reversing table order with STRAIGHT_JOIN, but this gives filesort, using_temporary, which is even worse.

Is there any way to make mysql use index for join and for sorting at the same time?

=== update ===

I'm really desparate in this. Maybe some kind of denormalization can help here?

Answer

Quassnoi picture Quassnoi · May 7, 2013

If you have lots of categories, this query cannot be made efficient. No single index can cover two tables at once in MySQL.

You have to do denormalization: add last_updated, has_comments and deleted into article_categories:

CREATE TABLE `article_categories` (
  `article_id` int(11) NOT NULL DEFAULT '0',
  `category_id` int(11) NOT NULL DEFAULT '0',
  `last_updated` timestamp NOT NULL,
  `has_comments` boolean NOT NULL,
  `deleted` boolean NOT NULL,
  PRIMARY KEY (`article_id`,`category_id`),
  KEY `category_id` (`category_id`),
  KEY `ix_articlecategories_category_comments_deleted_updated` (category_id, has_comments, deleted, last_updated)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

and run this query:

SELECT  *
FROM    (
        SELECT  article_id
        FROM    article_categories
        WHERE   (category_id, has_comments, deleted) = (78, 1, 0)
        ORDER BY
                last_updated DESC
        LIMIT   100, 20
        ) q
JOIN    articles a
ON      a.id = q.article_id

Of course you should update article_categories as well whenever you update relevant columns in article. This can be done in a trigger.

Note that the column has_comments is boolean: this will allow using an equality predicate to make a single range scan over the index.

Also note that the LIMIT goes into the subquery. This makes MySQL use late row lookups which it does not use by default. See this article in my blog about why do they increase performance:

If you were on SQL Server, you could make an indexable view over your query, which essentially would make a denormalized indexed copy of article_categories with the additional fields, automatically mainained by the server.

Unfortunately, MySQL does not support this and you will have to create such a table manually and write additional code to keep it in sync with the base tables.