I am new to SQL and was trying to do a crosstab in Postgres. I would have done it in Excel, but I have a database of around 3.5 million rows, 20,000 different values for code, 7 categories in cat, and variable values from 1 to 100. A code may only have few of the 7 categories.
Excel can't handle the number of rows, so SQL it is.
My data is in the form
code | cat | value |
--------------------------------
abc123 | 1 | 4 |
abc234 | 2 | 6 |
abc345 | 1 | 1 |
abc123 | 3 | 2 |
abc123 | 6 | 12 |
with code and cat as text, value as integer stored in a Postgres table.
I would like to perform a crosstab on code and cat, with sum of value. I would like it to show zero instead of 'null' in the return, but if 'null' would be simpler query, then that would be fine.
So the output I would like is
code | 'cat=0' | 'cat=1' | 'cat=2' | 'cat=3' | 'cat=4' | 'cat=5' | 'cat=6'|
abc123 | 25 | 0 | 3 | 500 | 250 | 42 | 0 |
abc234 | 0 | 100 | 0 | 10 | 5 | 0 | 25 |
abc345 | 1000 | 0 | 0 | 0 | 0 | 0 | 0 |
I have searched on Postgres help files and other forums; the closest thing was the SO question PostgreSQL Crosstab Query but I couldn't figure out how to sum the values from third column.
Any assistance would be greatly appreciated.
I got this working by updating my code to the following:
select * from crosstab(
'select code, cat, sum(value) as value
from my_table
group by code, cat
order by 1,2'
) as ct(code varchar(255),
cat_0 bigint,
cat_1 bigint,
cat_2 bigint,
cat_3 bigint,
cat_4 bigint,
cat_5 bigint,
cat_6 bigint)
I was able to determine the right data type by running the select statement inside the crosstab and matching my as ct data types to those returned by the query inside the crosstab.