PG::Error: SELECT DISTINCT, ORDER BY expressions must appear in select list

Andy · Oct 2, 2012

ActionView::Template::Error (PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

I'm creating an events website and I'm trying to sort the rendered RSVPs by the start time of their event. There are a lot of RSVPs, so I'm collapsing them with DISTINCT, but for the last few days I haven't been able to sort the results without Postgres raising this error. I've looked at some of the previous questions on the topic and am still pretty lost. How can I get this to work? Thank you so much!

@rsvps = Rsvp.where(:voter_id => current_user.following.collect { |f| f["id"] }, :status => 'going')
             .where("start_time > ? AND start_time < ?", Time.now, Time.now + 1.month)
             .order("count_all desc")
             .count(:group => :event_id)
             .collect { |f| f[0] }

<%= render :partial => 'rsvps/rsvp', :collection => Rsvp.where(:event_id => @rsvps).select("DISTINCT(event_id)").order('start_time asc') %>

Answer

AdrianoKF · Aug 21, 2013

I know this is a rather old question, but I just went through a small example in my head which helped me understand why Postgres has this seemingly odd restriction on SELECT DISTINCT / ORDER BY columns.

Imagine you have the following data in your Rsvp table:

 event_id |        start_time
----------+------------------------
    0     | Mar 17, 2013  12:00:00
    1     |  Jan 1, 1970  00:00:00
    1     | Aug 21, 2013  16:30:00
    2     |  Jun 9, 2012  08:45:00

Now you want to grab a list of distinct event_ids, ordered by their respective start_times. But where should 1 go? Should it come first, because one of its tuples starts on Jan 1, 1970, or should it go last, because the other starts on Aug 21, 2013?

As the database system can't make that decision for you, and the validity of a query can't depend on the actual data it happens to operate on (for instance, on whether event_id turns out to be unique), we are restricted to ordering only by expressions that appear in the SELECT list.
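The ambiguity can be made concrete with a small plain-Ruby sketch over the four example rows. It is only an illustration, not what Postgres does internally: the helper name `distinct_event_ids_ordered` and the constants are hypothetical, and the "picker" argument stands in for the choice the database would otherwise have to guess.

```ruby
# The four example rows as [event_id, start_time] pairs.
EXAMPLE_ROWS = [
  [0, Time.utc(2013, 3, 17, 12, 0, 0)],
  [1, Time.utc(1970, 1, 1, 0, 0, 0)],
  [1, Time.utc(2013, 8, 21, 16, 30, 0)],
  [2, Time.utc(2012, 6, 9, 8, 45, 0)],
]

# Hypothetical helper: keep one row per event_id, using `picker`
# (:min or :max) to choose which start_time represents the event,
# then sort the event_ids by that representative time.
def distinct_event_ids_ordered(rows, picker)
  rows
    .group_by { |event_id, _| event_id }
    .map { |event_id, group| [event_id, group.map { |_, t| t }.public_send(picker)] }
    .sort_by { |_, representative| representative }
    .map { |event_id, _| event_id }
end

# Two equally defensible choices give two different orderings.
ORDER_BY_EARLIEST = distinct_event_ids_ordered(EXAMPLE_ROWS, :min) # [1, 2, 0]
ORDER_BY_LATEST   = distinct_event_ids_ordered(EXAMPLE_ROWS, :max) # [2, 0, 1]
```

Event 1 lands first under one choice and last under the other, which is why the database refuses to pick one silently.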

As for the actual question - an alternative to Matthew's answer is using an aggregate function like MIN or MAX for the sorting:

  SELECT event_id
    FROM Rsvp
GROUP BY event_id
ORDER BY MIN(start_time)

The explicit grouping and aggregation on start_time permit the database to come up with an unambiguous ordering of the result tuples. Note, however, that readability is definitely an issue in this case ;)
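To see what the GROUP BY / MIN query computes, here is a rough plain-Ruby equivalent, clause by clause, run against the example rows. The method name `event_ids_by_earliest_start` and the hash layout are assumptions made for illustration; a real application would let the database do this work.

```ruby
# The example rows as hashes (layout assumed for illustration).
RSVP_ROWS = [
  { event_id: 0, start_time: Time.utc(2013, 3, 17, 12, 0, 0) },
  { event_id: 1, start_time: Time.utc(1970, 1, 1, 0, 0, 0) },
  { event_id: 1, start_time: Time.utc(2013, 8, 21, 16, 30, 0) },
  { event_id: 2, start_time: Time.utc(2012, 6, 9, 8, 45, 0) },
]

# Hypothetical in-memory counterpart of:
#   SELECT event_id FROM Rsvp GROUP BY event_id ORDER BY MIN(start_time)
def event_ids_by_earliest_start(rows)
  rows
    .group_by { |row| row[:event_id] }                                # GROUP BY event_id
    .map { |id, group| [id, group.map { |r| r[:start_time] }.min] }   # MIN(start_time)
    .sort_by { |_, earliest| earliest }                               # ORDER BY ... (ascending)
    .map { |id, _| id }                                               # SELECT event_id
end

EVENT_IDS_SORTED = event_ids_by_earliest_start(RSVP_ROWS) # [1, 2, 0]
```

In the Rails code from the question, the same idea might look roughly like Rsvp.where(:event_id => @rsvps).select("event_id").group("event_id").order("MIN(start_time) ASC"). That chain is an untested sketch, so check the generated SQL against your Rails version.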