Calculating correlation coefficient using PostgreSQL?

Dave picture Dave · Jun 10, 2010 · Viewed 6.9k times · Source

I have worked out how to calculate the correlation coefficient between two fields if both are in the same table:

SELECT corr(column1, column2) FROM table WHERE <my filters>;

...but I can't work out how to do it when the columns are from different tables (I need to apply the same filters to both tables).

Any hints, please?

Answer

Daniel DiPaolo picture Daniel DiPaolo · Jun 10, 2010

If the tables are related to one another such that you can join them, it's fairly simple. Just join them and do the correlation:

SELECT corr(t1.col1, t2.col2)
FROM table1 t1
     JOIN table2 t2
         ON t1.join_field = t2.join_field
WHERE
     <filters for t1>
     AND
     <filters for t2>

If they're not, then how are you supposed to find out which combination of fields from each table you want to run corr on?