I'm planning on creating a view using this SQL SELECT, but the explain for it shows it's using temporary and using filesort. I can't figure out what indices I need in order to fix this problem. Mostly, I'm wondering why it's using filesort intead of using an index to sort.
Here are my tables:
CREATE TABLE `learning_signatures` (
`signature_id` int(11) NOT NULL AUTO_INCREMENT,
`signature_file` varchar(100) NOT NULL,
`signature_md5` varchar(32) NOT NULL,
`image_file` varchar(100) NOT NULL,
PRIMARY KEY (`signature_id`),
UNIQUE KEY `unique_signature_md5` (`signature_md5`)
) ENGINE=InnoDB AUTO_INCREMENT=640 DEFAULT CHARSET=latin1
CREATE TABLE `learning_user_suggestions` (
`user_suggestion_id` int(11) NOT NULL AUTO_INCREMENT,
`signature_id` int(11) NOT NULL,
`ch` char(1) NOT NULL,
`time_suggested` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`user_suggestion_id`),
KEY `char_index` (`ch`),
KEY `ls_sig_id_indx` (`signature_id`),
KEY `user_id_indx` (`user_id`),
KEY `sig_char_indx` (`signature_id`,`ch`)
) ENGINE=InnoDB AUTO_INCREMENT=1173 DEFAULT CHARSET=latin1
And here is the problematic SQL statement I'm planning on using in my view:
select ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, sug.ch , count(sug.ch) AS suggestion_count
from (`learning_signatures` `ls` left join `learning_user_suggestions` `sug` on(ls.signature_id = sug.signature_id))
group by ls.signature_id, sug.ch;
Output from explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ls ALL NULL NULL NULL NULL 514 "Using temporary; Using filesort"
1 SIMPLE sug ref ls_sig_id_indx,sig_char_indx ls_sig_id_indx 4 wwf.ls.signature_id 1
Another example, this time using a where clause:
explain select ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, sug.ch , count(sug.ch) AS suggestion_count
from (`learning_signatures` `ls` left join `learning_user_suggestions` `sug` on(ls.signature_id = sug.signature_id))
WHERE signature_md5 = '75f8a5b1176ecc2487b90bacad9bc4c'
group by ls.signature_id, sug.ch;
Explain output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ls const unique_signature_md5 unique_signature_md5 34 const 1 "Using temporary; Using filesort"
1 SIMPLE sug ref ls_sig_id_indx,sig_char_indx ls_sig_id_indx 4 const 1
In your first query, what you do is join your signatures table with user suggestions, getting lots of rows, and then group results using some columns from user suggestions. But there is no index for the joined table to help with grouping as it would have to be defined on previously joined table. What you should instead do is try to create a derived table from user suggestions that is already groupped by ch and signature_id and then join it:
SELECT ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file,
sug.ch, sug.suggestion_count
FROM learning_signatures ls
LEFT JOIN
(SELECT s.signature_id, s.ch, count(s.ch) as suggestion_count
FROM learning_user_suggestions s
GROUP BY s.signature_id, s.ch ) as sug
ON ls.signature_id = sug.signature_id
Optimizer should be able now to use your sig_char_indx index for groupping, the derived table will be not bigger than your signatures table and you join both using unique column. You will still have to do a full scan over signatures table, but that cannot be avoided because you are selecting all of it anyway.
As for the second query, if you want to restrict signatures to a single one just append
WHERE ls.signature_md5='75f8a5b1176ecc2487b90bacad9bc4c'
to the end of previous query and group by only s.ch, because only one signature_id will match your md5 anyway. Optimizer should now use md5 index for where and char_index for grouping.