Oracle: Full text search with condition

Clinton picture Clinton · Sep 9, 2011 · Viewed 11.4k times · Source

I've created an Oracle Text index like the following:

create index my_idx on my_table (text) indextype is ctxsys.context; 

And I can then do the following:

select * from my_table where contains(text, '%blah%') > 0;

But lets say we have a have another column in this table, say group_id, and I wanted to do the following query instead:

select * from my_table where contains(text, '%blah%') > 0 and group_id = 43;

With the above index, Oracle will have to search for all items that contain 'blah', and then check all of their group_ids.

Ideally, I'd prefer to only search the items with group_id = 43, so I'd want an index like this:

create index my_idx on my_table (group_id, text) indextype is ctxsys.context; 

Kind of like a normal index, so a separate text search can be done for each group_id.

Is there a way to do something like this in Oracle (I'm using 10g if that is important)?

Edit (clarification)

Consider a table with one million rows and the following two columns among others, A and B, both numeric. Lets say there are 500 different values of A and 2000 different values of B, and each row is unique.

Now lets consider select ... where A = x and B = y

An index on A and B separately as far as I can tell do an index search on B, which will return 500 different rows, and then do a join/scan on these rows. In any case, at least 500 rows have to be looked at (aside from the database being lucky and finding the required row early.

Whereas an index on (A,B) is much more effective, it finds the one row in one index search.

Putting separate indexes on group_id and the text I feel only leaves the query generator with two options.

(1) Use the group_id index, and scan all the resulting rows for the text.
(2) Use the text index, and scan all the resulting rows for the group_id.
(3) Use both indexes, and do a join.

Whereas I want:

(4) Use the (group_id, "text") index to find the text index under the particular group_id and scan that text index for the particular row/rows I need. No scanning and checking or joining required, much like when using an index on (A,B).

Answer

Jon Heller picture Jon Heller · Sep 16, 2011

Oracle Text

1 - You can improve performance by creating the CONTEXT index with FILTER BY:

create index my_idx on my_table(text) indextype is ctxsys.context filter by group_id;

In my tests the filter by definitely improved the performance, but it was still slightly faster to just use a btree index on group_id.

2 - CTXCAT indexes use "sub-indexes", and seem to work similar to a multi-column index. This seems to be the option (4) you're looking for:

begin
  ctx_ddl.create_index_set('my_table_index_set');
  ctx_ddl.add_index('my_table_index_set', 'group_id');
end;
/

create index my_idx2 on my_table(text) indextype is ctxsys.ctxcat
    parameters('index set my_table_index_set');

select * from my_table where catsearch(text, 'blah', 'group_id = 43') > 0

This is likely the fastest approach. Using the above query against 120MB of random text similar to your A and B scenario required only 18 consistent gets. But on the downside, creating the CTXCAT index took almost 11 minutes and used 1.8GB of space.

(Note: Oracle Text seems to work correctly here, but I'm not familiar with Text and I can't gaurentee this isn't an inappropriate use of these indexes like @NullUserException said.)

Multi-column indexes vs. index joins

For the situation you describe in your edit, normally there would not be a significant difference between using an index on (A,B) and joining separate indexes on A and B. I built some tests with data similar to what you described and an index join required only 7 consistent gets versus 2 consistent gets for the multi-column index.

The reason for this is because Oracle retrieves data in blocks. A block is usually 8K, and an index block is already sorted, so you can probably fit the 500 to 2000 values in a few blocks. If you're worried about performance, usually the IO to read and write blocks is the only thing that matters. Whether or not Oracle has to join together a few thousand rows is an inconsequential amount of CPU time.

However, this doesn't apply to Oracle Text indexes. You can join a CONTEXT index with a btree index (a "bitmap and"?), but the performance is poor.