Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.
So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.
Here's the code:
select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
inner join Price v
on v.FundCode = d.FundCode
and v.PriceDate = (
select max(PriceDate)
from Price
where FundCode = v.FundCode
/* */ and PriceDate < d.TransactionDate
)
It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.
The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.
I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?
There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.
Basically, the method uses this pattern:
SELECT t.*
FROM t
LEFT JOIN t AS t2 ON t.key = t2.key
AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL
or its NOT EXISTS counterpart:
SELECT *
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS t2
WHERE t.key = t2.key
AND t2.Value > t.Value /* same as above applies to ">" here as well */
)
So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.
When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.
So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
LEFT JOIN Price v2 ON v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
WHERE v2.FundCode IS NULL
And here's a similar solution with NOT EXISTS:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
WHERE NOT EXISTS (
SELECT *
FROM Price v2
WHERE v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
)