Sql Server 2008 - PIVOT without Aggregation Function

Danny Grogan picture Danny Grogan · Oct 21, 2011 · Viewed 9.1k times · Source

I know you've got multiple topics touching on this. But, I havent found one that addressed my needs. I need to (on demand) pivot select deep table data to a wide output table. The gotcha in this is that I cannot use an aggregate with Pivot because it eats responses that are needed in the output. I have worked up to a solution, but I don't think it's the best because it will require umpteen left joins to work. I've included all attempts and notes as follows:

-- Sql Server 2008 db.
-- Deep table structure (not subject to modification) contains name/value pairs with a userId as
-- foreign key.  In many cases there can be MORE THAN ONE itemValue given by the user for the
-- itemName such as if asked their race, can answer White + Hispanic, etc.  Each response is stored
-- as a seperate record - this cannot currently be changed.

-- Goal: pivot deep data to wide while also compressing result 
-- set down. Account for all items per userId, and duplicating
-- column values (rather than show nulls) as applicable

-- Sample table to store some data of both single and multiple responses
DECLARE @testTable AS TABLE(userId int, itemName varchar(50), itemValue varchar(255))

INSERT INTO @testTable
SELECT 1, 'q01', '1-q01 Answer'
UNION SELECT 1, 'q02', '1-q02 Answer'
UNION SELECT 1, 'q03', '1-q03 Answer 1'
UNION SELECT 1, 'q03', '1-q03 Answer 2'
UNION SELECT 1, 'q03', '1-q03 Answer 3'
UNION SELECT 1, 'q04', '1-q04 Answer'
UNION SELECT 1, 'q05', '1-q05 Answer'
UNION SELECT 2, 'q01', '2-q01 Answer'
UNION SELECT 2, 'q02', '2-q02 Answer'
UNION SELECT 2, 'q03', '2-q03 Answer 1'
UNION SELECT 2, 'q03', '2-q03 Answer 2'
UNION SELECT 2, 'q04', '2-q04 Answer'
UNION SELECT 2, 'q05', '2-q05 Answer'

SELECT 'Raw Data'
SELECT * FROM @TestTable

SELECT 'Using Pivot - shows aggregate result of itemValue per itemName - eats others'
; WITH Data AS (
    SELECT
        [userId]
        , [itemName]
        , [itemValue]
    FROM 
        @testTable
)
SELECT
    [userId]
    , [q02]
    , [q03]
    , [q05]
FROM
    Data
PIVOT
(
    MIN(itemValue)  -- Aggregate function eats needed values.
    FOR itemName in ([q02], [q03], [q05])
) AS PivotTable


SELECT 'Aggregate with Grouping - Causes Null Values'
SELECT
    DISTINCT userId 
    ,[q02] = Max(CASE WHEN itemName = 'q02' THEN itemValue END)
    ,[q03] = Max(CASE WHEN itemName = 'q03' THEN itemValue END)
    ,[q05] = Max(CASE WHEN itemName = 'q05' THEN itemValue END)
FROM
    @testTable
WHERE
    itemName in ('q02', 'q03', 'q05')   -- Makes it a hair quicker
GROUP BY
    userId  -- If by userId only, it only gives 1 row PERIOD = BAD!!
    , [itemName]
    , [itemValue]


SELECT 'Multiple Left Joins - works properly but bad if pivoting 175 columns or so'
; WITH Data AS (
    SELECT
        userId 
        ,[itemName]
        ,[itemValue]
    FROM
        @testTable
    WHERE
        itemName in ('q02', 'q03', 'q05')   -- Makes it a hair quicker
)
SELECT
    DISTINCT s1.userId
    ,[q02] = s2.[itemValue]
    ,[q03] = s3.[itemValue]
    ,[q05] = s5.[itemValue]
FROM
    Data s1
    LEFT JOIN Data s2 
        ON s2.userId = s1.userId 
            AND s2.[itemName] = 'q02'
    LEFT JOIN Data s3 
        ON s3.userId = s1.userId 
            AND s3.[itemName] = 'q03'
    LEFT JOIN Data s5 
        ON s5.userId = s1.userId 
            AND s5.[itemName] = 'q05'

So the bottom query is the only one (so far) that does what I need it to do, but the LEFT JOIN's WILL get out of hand and cause performance issues when I use actual item names to pivot. Any recommendations are appreciated.

Answer

Andriy M picture Andriy M · Oct 25, 2011

I think you'll have to stick with joins, because joins are exactly the way of producing results like the one you are after. The purpose of a join is to combine row sets together (on a condition or without any), and your target output is nothing else than a combination of subsets of rows.

However, if the majority of questions always have single responses, you could substantially reduce the number of necessary joins. The idea is to join only multiple-response groups as separate row sets. As for the single-response items, they are joined only as part of the entire dataset of target items.

An example should better illustrate what I might poorly describe verbally. Assuming there are two potentially multiple-response groups in the source data, 'q03' and 'q06' (actually, here's the source table:

DECLARE @testTable AS TABLE(
  userId int,
  itemName varchar(50),
  itemValue varchar(255)
);

INSERT INTO @testTable
SELECT 1, 'q01', '1-q01 Answer'
UNION SELECT 1, 'q02', '1-q02 Answer'
UNION SELECT 1, 'q03', '1-q03 Answer 1'
UNION SELECT 1, 'q03', '1-q03 Answer 2'
UNION SELECT 1, 'q03', '1-q03 Answer 3'
UNION SELECT 1, 'q04', '1-q04 Answer'
UNION SELECT 1, 'q05', '1-q05 Answer'
UNION SELECT 1, 'q06', '1-q06 Answer 1'
UNION SELECT 1, 'q06', '1-q06 Answer 2'
UNION SELECT 1, 'q06', '1-q06 Answer 3'
UNION SELECT 2, 'q01', '2-q01 Answer'
UNION SELECT 2, 'q02', '2-q02 Answer'
UNION SELECT 2, 'q03', '2-q03 Answer 1'
UNION SELECT 2, 'q03', '2-q03 Answer 2'
UNION SELECT 2, 'q04', '2-q04 Answer'
UNION SELECT 2, 'q05', '2-q05 Answer'
UNION SELECT 2, 'q06', '2-q06 Answer 1'
UNION SELECT 2, 'q06', '2-q06 Answer 2'
;

which is same as the table in the original post, but with added 'q06' items), the resulting script could be like this:

WITH ranked AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY userId, itemName ORDER BY itemValue)
  FROM @testTable
),
multiplied AS (
  SELECT
    r.userId,
    r.itemName,
    r.itemValue,
    rn03 = r03.rn,
    rn06 = r06.rn
  FROM ranked r03
    INNER JOIN ranked r06 ON r03.userId = r06.userId AND r06.itemName = 'q06'
    INNER JOIN ranked r ON r03.userId = r.userId AND (
      r.itemName = 'q03' AND r.rn = r03.rn OR
      r.itemName = 'q06' AND r.rn = r06.rn OR
      r.itemName NOT IN ('q03', 'q06')
    )
  WHERE r03.itemName = 'q03'
    AND r.itemName IN ('q02', 'q03', 'q05', 'q06')
)
SELECT userId, rn03, rn06, q02, q03, q05, q06
FROM multiplied
PIVOT (
  MIN(itemValue)  
  FOR itemName in (q02, q03, q05, q06)
) AS PivotTable