PostgreSQL: Select only the first record per id based on sort order

Sarah Bergquist picture Sarah Bergquist · Sep 24, 2013 · Viewed 16.8k times · Source

For the following query I need to select only the first record with the lowest shape_type value (ranges from 1 to 10). If you have any knowledge on how to easily do this is postgresql, please help. Thanks for your time.

select g.geo_id, gs.shape_type
from schema.geo g   
join schema.geo_shape gs on (g.geo_id=gs.geo_id)  
order by gs.shape_type asc;

Answer

Roman Pekar picture Roman Pekar · Sep 24, 2013

PostgreSQL have very nice syntax for this types of queries - distinct on:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

So your query becomes:

select distinct on(g.geo_id)
    g.geo_id, gs.shape_type
from schema.geo g   
    join schema.geo_shape gs on (g.geo_id=gs.geo_id)  
order by g.geo_id, gs.shape_type asc;

In general ANSI-SQL syntax for this (in any RDBMS with window functions and common table expression, which could be switched to subquery) would be:

with cte as (
    select
        row_number() over(partition by g.geo_id order by gs.shape_type) as rn,
        g.geo_id, gs.shape_type
    from schema.geo g   
        join schema.geo_shape gs on (g.geo_id=gs.geo_id)  
)
select
    geo_id, shape_type
from cte
where rn = 1