Dynamic alternative to pivot with CASE and GROUP BY

flipflop99 picture flipflop99 · Mar 19, 2013 · Viewed 38.6k times · Source

I have a table that looks like this:

id    feh    bar
1     10     A
2     20     A
3      3     B
4      4     B
5      5     C
6      6     D
7      7     D
8      8     D

And I want it to look like this:

bar  val1   val2   val3
A     10     20 
B      3      4 
C      5        
D      6      7     8

I have this query that does this:

SELECT bar, 
   MAX(CASE WHEN abc."row" = 1 THEN feh ELSE NULL END) AS "val1",
   MAX(CASE WHEN abc."row" = 2 THEN feh ELSE NULL END) AS "val2",
   MAX(CASE WHEN abc."row" = 3 THEN feh ELSE NULL END) AS "val3"
FROM
(
  SELECT bar, feh, row_number() OVER (partition by bar) as row
  FROM "Foo"
 ) abc
GROUP BY bar

This is a very make-shifty approach and gets unwieldy if there are a lot of new columns to be created. I was wondering if the CASE statements can be made better to make this query more dynamic? Also, I'd love to see other approaches to doing this.

Answer

Erwin Brandstetter picture Erwin Brandstetter · Mar 20, 2013

If you have not installed the additional module tablefunc, run this command once per database:

CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

SELECT * FROM crosstab(
  'SELECT bar, 1 AS cat, feh
   FROM   tbl_org
   ORDER  BY bar, feh')
 AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The special difficulty here is, that there is no category (cat) in the base table. For the basic 1-parameter form we can just provide a dummy column with a dummy value serving as category. The value is ignored anyway.

This is one of the rare cases where the second parameter for the crosstab() function is not needed, because all NULL values only appear in dangling columns to the right by definition of this problem. And the order can be determined by the value.

If we had an actual category column with names determining the order of values in the result, we'd need the 2-parameter form of crosstab(). Here I synthesize a category column with the help of the window function row_number(), to base crosstab() on:

SELECT * FROM crosstab(
   $$
   SELECT bar, val, feh
   FROM  (
      SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
      FROM tbl_org
      ) x
   ORDER BY 1, 2
   $$
 , $$VALUES ('val1'), ('val2'), ('val3')$$         -- more columns?
) AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers.

Basics:
Read this first if you are not familiar with the crosstab() function!

Advanced:

Proper test setup

That's how you should provide a test case to begin with:

CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
INSERT INTO tbl_org (id, feh, bar) VALUES
   (1, 10, 'A')
 , (2, 20, 'A')
 , (3,  3, 'B')
 , (4,  4, 'B')
 , (5,  5, 'C')
 , (6,  6, 'D')
 , (7,  7, 'D')
 , (8,  8, 'D');

Dynamic crosstab?

Not very dynamic, yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with plpgsql. But there are ways around it - with some limitations.

So not to further complicate the rest, I demonstrate with a simpler test case:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('A', 'val2', 20)
 , ('B', 'val1', 3)
 , ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7);

Call:

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int, val3 int);

Returns:

 row_name | val1 | val2 | val3
----------+------+------+------
 A        | 10   | 20   |
 B        |  3   |  4   |
 C        |  5   |      |
 D        |  6   |  7   |  8

Built-in feature of tablefunc module

The tablefunc module provides a simple infrastructure for generic crosstab() calls without providing a column definition list. A number of functions written in C (typically very fast):

crosstabN()

crosstab1() - crosstab4() are pre-defined. One minor point: they require and return all text. So we need to cast our integer values. But it simplifies the call:

SELECT * FROM crosstab4('SELECT row_name, attrib, val::text  -- cast!
                         FROM tbl ORDER BY 1,2')

Result:

 row_name | category_1 | category_2 | category_3 | category_4
----------+------------+------------+------------+------------
 A        | 10         | 20         |            |
 B        | 3          | 4          |            |
 C        | 5          |            |            |
 D        | 6          | 7          | 8          |

Custom crosstab() function

For more columns or other data types, we create our own composite type and function (once).
Type:

CREATE TYPE tablefunc_crosstab_int_5 AS (
  row_name text, val1 int, val2 int, val3 int, val4 int, val5 int);

Function:

CREATE OR REPLACE FUNCTION crosstab_int_5(text)
  RETURNS SETOF tablefunc_crosstab_int_5
AS '$libdir/tablefunc', 'crosstab' LANGUAGE c STABLE STRICT;

Call:

SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val   -- no cast!
                              FROM tbl ORDER BY 1,2');

Result:

 row_name | val1 | val2 | val3 | val4 | val5
----------+------+------+------+------+------
 A        |   10 |   20 |      |      |
 B        |    3 |    4 |      |      |
 C        |    5 |      |      |      |
 D        |    6 |    7 |    8 |      |

One polymorphic, dynamic function for all

This goes beyond what's covered by the tablefunc module.
To make the return type dynamic I use a polymorphic type with a technique detailed in this related answer: