How can I efficiently search json data in a mysql database?
I installed the extract_json udf from labs.mysql.com and played around with a test table with 2.750.000 entries.
CREATE TABLE `testdb`.`JSON_TEST_TABLE` (
`AUTO_ID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`OP_ID` INT NULL,
`JSON` LONGTEXT NULL,
PRIMARY KEY (`AUTO_ID`)) $$
An example JSON field would look like so:
{"ts": "2014-10-30 15:08:56 (9400.223725848107) ", "operation": "1846922"}
I found that putting json_extract into a select statement has virtually no performance impact. I.e. the following selects (almost) have the same performance:
SELECT * FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;
SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;
However, as soon as I put a json_extract expression into the where clause the execution time increases by a factor of 10 or more (I went from 2,5 to 30 secs):
SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where json_extract(JSON, "operation")=2000000 LIMIT 10;
At this point I am thinking that I need to extract all info that I want to search into separate columns at insert time, and that if I really have to search in the json data I need to first narrow down the number of rows to be searched by other criteria, but I would like to make sure that I am not missing anything obvious. E.g. can I somehow index the json fields? Or is my select statement inefficiently written?
In fact during the execution of
SELECT * FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;
json_extract() will be executed at most 10 times.
During this one
SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where json_extract(JSON, "operation")=2000000 LIMIT 10;
json_extract() will be executed for each row and the result limited to 10 records, hence the speed loss. Indexing won't help either since the processing time is used up rather tby the external code than MySQL's. Imho, the best bet in this case would be an optimized UDF.