I have two tables, XMLtable
and filterTable
.
I need all the XMLtable.ID
values from XMLtable
where the data in Col_X
contains MyElement
, the contents of which matches filterColumn
in filterTable
.
The XML for each row in Col_X
may contain multiple MyElement
's, and I want that ID
in case ANY of those elements match ANY of the values in filterColumn
.
The problem is that those columns are actually of varchar(max)
datatype, and the table itself is huge (like 50GB huge). So this query needs to be as optimized as possible.
Here's an example for where I am now, which merely returns the row where the first matching element equals one of the ones I'm looking for. Due to a plethora of different error messages I can't seem to be able to change this to compare to all of the same named elements as I want to.
SELECT ID,
CAST(Col_X AS XML).value('(//*[local-name()=''MyElement''])', N'varchar(25)')
FROM XMLtable
...and then compare the results to filterTable
. This already takes 5+ minutes.
What I'm trying to achieve is something like:
SELECT ID
FROM XMLtable
WHERE CAST(Col_X AS XML).query('(//*[local-name()=''MyElement''])')
IN (SELECT filterColumn FROM filterTable)
The only way I can currently achieve this is to use the LIKE operator, which takes like a thousand times longer.
Now, obviously it's not an option to start changing the datatypes of the columns or anything else. This is what I have to work with. :)
Try this:
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
FROM XMLTable
CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
CROSS APPLY X.Col_X.nodes('(//*[local-name()="MyElement"])') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
and this:
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
FROM XMLTable
CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
CROSS APPLY X.Col_X.nodes('//MyElement') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
Update
I think that you are experiencing what is described here Compute Scalars, Expressions and Execution Plan Performance. The cast to XML is deferred to each call to the value
function. The test you should make is to change the datatype of Col_X
to XML
.
If that is not an option you could query the rows you need from XMLTable into a temporary table that has an XML column and then do the query above against the temporary table without the need to cast to XML.
CREATE TABLE #XMLTable
(
ID int,
Col_X xml
)
INSERT INTO #XMLTable(ID, Col_X)
SELECT ID, Col_X
FROM XMLTable
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'varchar(25)') AS 'MyElementValue'
FROM #XMLTable
CROSS APPLY Col_X.nodes('//MyElement') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
DROP TABLE #XMLTable