Why does MySQL Innodb "Creating sort index" when unique index exists?

ck_ picture ck_ · Jul 23, 2013 · Viewed 31.6k times · Source

On a simple but very large Innodb table, I have a unique index on column A and I want to get a list of (integer) column B in order of (integer) column A

Very simple query, I am paging through millions of records.

SELECT B FROM hugeTable ORDER BY A LIMIT 10000 OFFSET 500000

This takes 10 seconds per query on a very fast server?

Filesort: Yes Filesort_on_disk: Yes Merge_passes: 9

This makes no sense to me, why can it not use Index A ?

Explain shows simple, no possible keys and filesort.

Answer

spencer7593 picture spencer7593 · Jul 23, 2013

If the values for column B are not available in the index pages, then MySQL will need to access pages in the underlying table. Also there no predicate that filters which rows are being considered, and that means MySQL is seeing that ALL rows need to be returned. This could explain why the index is not being used.

Also note that the LIMIT operations are processed at the end of the statement, as nearly the last step in the execution plan, with a some exceptions.

8.2.1.3. Optimizing LIMIT Queries http://dev.mysql.com/doc/refman/5.5/en/limit-optimization.html

I suspect that your query could make use of a covering index, for example "ON hugetable (A,B)", to avoid the sort operation.

Absent a covering index, you could try rewriting the query something like this, to see if this will make use of the index on column A, and avoid a sort operation on millions of rows (to get the first 510,000 rows returned in order):

SELECT i.B
  FROM ( SELECT j.A
           FROM hugeTable j
          ORDER
             BY j.A
          LIMIT 10000 OFFSET 500000
       ) k
  JOIN hugetable i
    ON i.A = k.A
 ORDER
    BY k.A

I suggest you do an EXPLAIN on just the inline view query (aliased as k), and see if it shows "Using index."

The outer query is likely to still have a "Using filesort" operation, but at least that will be on only 10,000 rows.

(NOTE: You may want to try an "ORDER BY i.A" in place of "k.A" on the outer query, and see if that makes a difference.)


ADDENDUM

Not specifically addressing your question, but in terms of performance of that query, if this is "paging through" a set of rows, another option to consider, to get to the "next" page is to use the value of "A" from the last row retrieved on the previous query as a "starting point" for the next row.

The original query looks like it's getting "page 51" (10,000 rows per page, page 51 would be rows 510,001 thru 520,000).

If you were to also return the value of 'A', and keep that for the last row. To get the "next" page, the query could actually be:

 SELECT i.B, k.A
   FROM ( SELECT j.A 
            FROM hugeTable j 
           WHERE j.A >  $value_of_A_from_row_520000 
        -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ORDER BY j.A ASC
           LIMIT 10000
        ) k
   JOIN hugetable i
     ON i.A = k.A
  ORDER
     BY k.A

If you also kept the value for A from the "first" row, you could use that for backing up a page. That would really only work for forward one page, or back one page. Jumping to a different page, would have to use the original form of the query, counting rows.