Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

david blaine picture david blaine · Dec 22, 2012 · Viewed 705.6k times · Source

I got an error -

Column 'Employee.EmpID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc 
on emp.LocationID = loc.LocationID
group by loc.LocationID 

This situation fits into the answer given by Bill Karwin.

correction for above, fits into answer by ExactaBox -

select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc 
on emp.LocationID = loc.LocationID
group by loc.LocationID 

ORIGINAL QUESTION -

For the SQL query -

select *
from Employee as emp full join Location as loc 
on emp.LocationID = loc.LocationID
group by (loc.LocationID)

I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.

I think I have a partial explanation for my own question. Tell me if its ok -

To group all employees that work in the same location we have to first mention the LocationID.

Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, ie we should SUM() the employees working in that location. Why do we do it the latter way, i am not sure. So, this explains the "it is not contained in either an aggregate function" part of the error.

What is the explanation for the GROUP BY clause part of the error ?

Answer

Bill Karwin picture Bill Karwin · Dec 22, 2012

Suppose I have the following table T:

a   b
--------
1   abc
1   def
1   ghi
2   jkl
2   mno
2   pqr

And I do the following query:

SELECT a, b
FROM T
GROUP BY a

The output should have two rows, one row where a=1 and a second row where a=2.

But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.

This demonstrates the single-value rule, which prohibits the undefined results you get when you run a GROUP BY query, and you include any columns in the select-list that are neither part of the grouping criteria, nor appear in aggregate functions (SUM, MIN, MAX, etc.).

Fixing it might look like this:

SELECT a, MAX(b) AS x
FROM T
GROUP BY a

Now it's clear that you want the following result:

a   x
--------
1   ghi
2   pqr