SQL - select distinct only on one column

Jason Lipo picture Jason Lipo · Dec 5, 2013 · Viewed 131.2k times · Source

I have searched far and wide for an answer to this problem. I'm using a Microsoft SQL Server, suppose I have a table that looks like this:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 2      | 3968    | Spain       | Spanish     |
| 3      | 3968    | USA         | English     |
| 4      | 1234    | Greece      | Greek       |
| 5      | 1234    | Italy       | Italian     |

I want to perform one query which only selects the unique 'NUMBER' column (whether is be the first or last row doesn't bother me). So this would give me:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 4      | 1234    | Greece      | Greek       |

How is this achievable?

Answer

Gordon Linoff picture Gordon Linoff · Dec 5, 2013

A very typical approach to this type of problem is to use row_number():

select t.*
from (select t.*,
             row_number() over (partition by number order by id) as seqnum
      from t
     ) t
where seqnum = 1;

This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.