PostgreSQL -must appear in the GROUP BY clause or be used in an aggregate function

Hrishikesh Sardar picture Hrishikesh Sardar · Aug 5, 2013 · Viewed 57.6k times · Source

I am getting this error in the pg production mode, but its working fine in sqlite3 development mode.

 ActiveRecord::StatementInvalid in ManagementController#index

PG::Error: ERROR:  column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates"  WHERE "estates"."Mgmt" = ...
               ^
: SELECT "estates".* FROM "estates"  WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id

@myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all

Answer

Craig Ringer picture Craig Ringer · Aug 5, 2013

If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.

If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates where multiple rows share the same user_id. You must use an aggregate function that expresses what you want, like min, max, avg, string_agg, array_agg, etc or add the column(s) of interest to the GROUP BY.

Alternately you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.

Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.

If you have:

col1    col2
fred    42
bob     9
fred    44
fred    99

and you do:

SELECT col1, col2 FROM mytable GROUP BY col1;

then it's obvious that you should get the row:

bob     9

but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:

SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;