I know you've got multiple topics touching on this. But, I havent found one that addressed my needs. I need to (on demand) pivot select deep table data to a wide output table. The gotcha in this is that I cannot use an aggregate with Pivot because it eats responses that are needed in the output. I have worked up to a solution, but I don't think it's the best because it will require umpteen left joins to work. I've included all attempts and notes as follows:
-- Sql Server 2008 db. -- Deep table structure (not subject to modification) contains name/value pairs with a userId as -- foreign key. In many cases there can be MORE THAN ONE itemValue given by the user for the -- itemName such as if asked their race, can answer White + Hispanic, etc. Each response is stored -- as a seperate record - this cannot currently be changed. -- Goal: pivot deep data to wide while also compressing result -- set down. Account for all items per userId, and duplicating -- column values (rather than show nulls) as applicable -- Sample table to store some data of both single and multiple responses DECLARE @testTable AS TABLE(userId int, itemName varchar(50), itemValue varchar(255)) INSERT INTO @testTable SELECT 1, 'q01', '1-q01 Answer' UNION SELECT 1, 'q02', '1-q02 Answer' UNION SELECT 1, 'q03', '1-q03 Answer 1' UNION SELECT 1, 'q03', '1-q03 Answer 2' UNION SELECT 1, 'q03', '1-q03 Answer 3' UNION SELECT 1, 'q04', '1-q04 Answer' UNION SELECT 1, 'q05', '1-q05 Answer' UNION SELECT 2, 'q01', '2-q01 Answer' UNION SELECT 2, 'q02', '2-q02 Answer' UNION SELECT 2, 'q03', '2-q03 Answer 1' UNION SELECT 2, 'q03', '2-q03 Answer 2' UNION SELECT 2, 'q04', '2-q04 Answer' UNION SELECT 2, 'q05', '2-q05 Answer' SELECT 'Raw Data' SELECT * FROM @TestTable SELECT 'Using Pivot - shows aggregate result of itemValue per itemName - eats others' ; WITH Data AS ( SELECT [userId] , [itemName] , [itemValue] FROM @testTable ) SELECT [userId] , [q02] , [q03] , [q05] FROM Data PIVOT ( MIN(itemValue) -- Aggregate function eats needed values. FOR itemName in ([q02], [q03], [q05]) ) AS PivotTable SELECT 'Aggregate with Grouping - Causes Null Values' SELECT DISTINCT userId ,[q02] = Max(CASE WHEN itemName = 'q02' THEN itemValue END) ,[q03] = Max(CASE WHEN itemName = 'q03' THEN itemValue END) ,[q05] = Max(CASE WHEN itemName = 'q05' THEN itemValue END) FROM @testTable WHERE itemName in ('q02', 'q03', 'q05') -- Makes it a hair quicker GROUP BY userId -- If by userId only, it only gives 1 row PERIOD = BAD!! , [itemName] , [itemValue] SELECT 'Multiple Left Joins - works properly but bad if pivoting 175 columns or so' ; WITH Data AS ( SELECT userId ,[itemName] ,[itemValue] FROM @testTable WHERE itemName in ('q02', 'q03', 'q05') -- Makes it a hair quicker ) SELECT DISTINCT s1.userId ,[q02] = s2.[itemValue] ,[q03] = s3.[itemValue] ,[q05] = s5.[itemValue] FROM Data s1 LEFT JOIN Data s2 ON s2.userId = s1.userId AND s2.[itemName] = 'q02' LEFT JOIN Data s3 ON s3.userId = s1.userId AND s3.[itemName] = 'q03' LEFT JOIN Data s5 ON s5.userId = s1.userId AND s5.[itemName] = 'q05'
So the bottom query is the only one (so far) that does what I need it to do, but the LEFT JOIN's WILL get out of hand and cause performance issues when I use actual item names to pivot. Any recommendations are appreciated.
I think you'll have to stick with joins, because joins are exactly the way of producing results like the one you are after. The purpose of a join is to combine row sets together (on a condition or without any), and your target output is nothing else than a combination of subsets of rows.
However, if the majority of questions always have single responses, you could substantially reduce the number of necessary joins. The idea is to join only multiple-response groups as separate row sets. As for the single-response items, they are joined only as part of the entire dataset of target items.
An example should better illustrate what I might poorly describe verbally. Assuming there are two potentially multiple-response groups in the source data, 'q03'
and 'q06'
(actually, here's the source table:
DECLARE @testTable AS TABLE(
userId int,
itemName varchar(50),
itemValue varchar(255)
);
INSERT INTO @testTable
SELECT 1, 'q01', '1-q01 Answer'
UNION SELECT 1, 'q02', '1-q02 Answer'
UNION SELECT 1, 'q03', '1-q03 Answer 1'
UNION SELECT 1, 'q03', '1-q03 Answer 2'
UNION SELECT 1, 'q03', '1-q03 Answer 3'
UNION SELECT 1, 'q04', '1-q04 Answer'
UNION SELECT 1, 'q05', '1-q05 Answer'
UNION SELECT 1, 'q06', '1-q06 Answer 1'
UNION SELECT 1, 'q06', '1-q06 Answer 2'
UNION SELECT 1, 'q06', '1-q06 Answer 3'
UNION SELECT 2, 'q01', '2-q01 Answer'
UNION SELECT 2, 'q02', '2-q02 Answer'
UNION SELECT 2, 'q03', '2-q03 Answer 1'
UNION SELECT 2, 'q03', '2-q03 Answer 2'
UNION SELECT 2, 'q04', '2-q04 Answer'
UNION SELECT 2, 'q05', '2-q05 Answer'
UNION SELECT 2, 'q06', '2-q06 Answer 1'
UNION SELECT 2, 'q06', '2-q06 Answer 2'
;
which is same as the table in the original post, but with added 'q06'
items), the resulting script could be like this:
WITH ranked AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY userId, itemName ORDER BY itemValue)
FROM @testTable
),
multiplied AS (
SELECT
r.userId,
r.itemName,
r.itemValue,
rn03 = r03.rn,
rn06 = r06.rn
FROM ranked r03
INNER JOIN ranked r06 ON r03.userId = r06.userId AND r06.itemName = 'q06'
INNER JOIN ranked r ON r03.userId = r.userId AND (
r.itemName = 'q03' AND r.rn = r03.rn OR
r.itemName = 'q06' AND r.rn = r06.rn OR
r.itemName NOT IN ('q03', 'q06')
)
WHERE r03.itemName = 'q03'
AND r.itemName IN ('q02', 'q03', 'q05', 'q06')
)
SELECT userId, rn03, rn06, q02, q03, q05, q06
FROM multiplied
PIVOT (
MIN(itemValue)
FOR itemName in (q02, q03, q05, q06)
) AS PivotTable