We have been using User-Defined Table Types to pass a list of integers to our stored procedures.
We then use these to join to other tables in our stored proc queries.
For example:
CREATE PROCEDURE [dbo].[sp_Name]
(
    @Ids [dbo].[OurTableType] READONLY
)
AS
SET NOCOUNT ON
SELECT
    *
FROM
    SOMETABLE
    INNER JOIN @Ids [OurTableType] ON [OurTableType].Id = SOMETABLE.Id
We have seen very poor performance from this when using larger datasets.
One approach we've used to speed things up is to dump the contents into a temp table and join off that instead.
For example:
CREATE PROCEDURE [dbo].[sp_Name]
(
    @Ids [dbo].[OurTableType] READONLY
)
AS
SET NOCOUNT ON
CREATE TABLE #TempTable (Id INT)
INSERT INTO #TempTable
SELECT Id FROM @Ids
SELECT
    *
FROM
    SOMETABLE
    INNER JOIN #TempTable ON #TempTable.Id = SOMETABLE.Id
DROP TABLE #TempTable
This does improve performance significantly, but I wanted to get some opinions on this approach and any other consequences we haven't considered. An explanation of why this improves performance would also be useful.
N.B. Sometimes we may need to pass in more than just an integer, which is why we don't use a comma-separated list or something similar.
SQL Server 2019 and SQL Azure
Microsoft has implemented a new feature called Table Variable Deferred Compilation that largely resolves the performance issues with table variables in previous versions of SQL Server:
With table variable deferred compilation, compilation of a statement that references a table variable is deferred until the first actual execution of the statement. This is identical to the behavior of temporary tables, and this change results in the use of actual cardinality instead of the original one-row guess.
This behaviour is enabled by default and needs no code changes, provided the database is at compatibility level 150 or higher. It can still suffer from parameter sniffing issues, but overall it's a massive improvement.
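To check whether a given database actually gets this behaviour, something like the following can help. It is a minimal sketch, assuming the feature is tied to database compatibility level 150 and controlled by the DEFERRED_COMPILATION_TV database scoped configuration, as Microsoft's documentation describes. The local @Ids variable and the final query are illustrative stand-ins, not the poster's actual code.

```sql
-- Check the current compatibility level of this database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Raise it if required (assumption: the application has been tested at level 150).
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;

-- The feature can be switched on or off per database.
ALTER DATABASE SCOPED CONFIGURATION SET DEFERRED_COMPILATION_TV = ON;

-- It can also be disabled for one statement with a query hint,
-- which is handy for comparing plans before and after.
DECLARE @Ids TABLE (Id INT NOT NULL);

SELECT s.Id
FROM SOMETABLE AS s
INNER JOIN @Ids AS i
    ON i.Id = s.Id
OPTION (USE HINT ('DISABLE_DEFERRED_COMPILATION_TV'));
```

Comparing the estimated row count on the table variable with and without the hint should show whether deferred compilation is in effect.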
SQL Server 2017 and earlier
The primary reason for the poor performance of the JOIN is that the Table-Valued Parameter (TVP) is a Table Variable. Table Variables do not keep statistics and appear to the Query Optimizer to have only 1 row. So they are fine for something like INSERT INTO Table (column_list) SELECT column_list FROM @TVP; but not for a JOIN.
There are a few things to try to get around this:
Dump everything to a local temporary table (you are already doing this). A technical downside here is that you are duplicating the data passed into the TVP in tempdb (where both the TVP and temp table store their data).
Try defining the User-Defined Table Type to have a Clustered Primary Key. You can do this inline on the [Id] field:
[Id] INT NOT NULL PRIMARY KEY
Not sure how much this helps performance, but it's worth a try.
Add OPTION (RECOMPILE) to the query. This is a way of getting the Query Optimizer to see how many rows are in a Table Variable so that it can have proper estimates.
SELECT column_list
FROM SOMETABLE
INNER JOIN @Ids [OurTableType]
ON [OurTableType].Id = SOMETABLE.Id
OPTION (RECOMPILE);
The downside here is that you have a RECOMPILE, which takes additional time each time this proc is called. But that might be an overall net gain.
Starting in SQL Server 2014, you can take advantage of In-Memory OLTP and specify WITH (MEMORY_OPTIMIZED = ON) for the User-Defined Table Type. Please see "Scenario: Table variable can be MEMORY_OPTIMIZED=ON" for details. I have heard that this definitely helps. Unfortunately, in SQL Server 2014 and SQL Server 2016 RTM this feature is only available in 64-bit Enterprise Edition. But, starting with SQL Server 2016 SP1, this feature was made available to all editions (possible exception being SQL Server Express LocalDB).
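The two type-level options above (the clustered primary key and the memory-optimized type) might look like the sketch below. The type name OurTableType comes from the question, while OurTableType_InMem is a hypothetical name used only for illustration. It assumes a single INT column, and, for the second type on an on-premises instance, that the database already has a MEMORY_OPTIMIZED_DATA filegroup.

```sql
-- Disk-based table type with a clustered primary key on Id.
CREATE TYPE [dbo].[OurTableType] AS TABLE
(
    [Id] INT NOT NULL PRIMARY KEY CLUSTERED
);
GO  -- batch separator understood by SSMS / sqlcmd, not a T-SQL statement

-- Memory-optimized variant (hypothetical name).
-- Memory-optimized table types need at least one index,
-- so the nonclustered primary key is declared inline.
CREATE TYPE [dbo].[OurTableType_InMem] AS TABLE
(
    [Id] INT NOT NULL PRIMARY KEY NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON);
GO
```

Note that a primary key also enforces uniqueness, so callers passing duplicate Ids would get an error. An existing type cannot be altered in place, so it has to be dropped and recreated, which in turn means dropping the procedures that reference it first.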
PS. Don't do SELECT *. Always specify a column list, unless you are doing something like IF EXISTS (SELECT * FROM ...).