Alphanumeric sorting with PostgreSQL

user1464055 picture user1464055 · Jul 10, 2012 · Viewed 23.1k times · Source

In the database, I have various alpha-numeric strings in the following format:

10_asdaasda
100_inkskabsjd
11_kancaascjas
45_aksndsialcn
22_dsdaskjca
100_skdnascbka

I want them to essentially be sorted by the number in front of the string and then the string name itself, but of course, characters are compared one by one and so the result of Order by name produces:

10_asdaasda
100_inkskabsjd
100_skdnascbka
11_kancaascjas
22_dsdaskjca
45_aksndsialcn

instead of the order I'd prefer:

10_asdaasda
11_kancaascjas
22_dsdaskjca
45_aksndsialcn
100_inkskabsjd
100_skdnascbka

Honestly, I would be fine if the strings were just sorted by the number in front. I'm not too familiar with PostgreSQL, so I wasn't sure what the best way to do this would be. I'd appreciate any help!

Answer

Erwin Brandstetter picture Erwin Brandstetter · Jul 10, 2012

The ideal way would be to normalize your data and split the two components of the column into two individual columns. One of type integer, one text.

With the current table, you can do something like demonstrated here:

WITH x(t) AS (
    VALUES
     ('10_asdaasda')
    ,('100_inkskabsjd')
    ,('11_kancaascjas')
    ,('45_aksndsialcn')
    ,('22_dsdaskjca')
    ,('100_skdnascbka')
    )
SELECT t
FROM   x
ORDER  BY (substring(t, '^[0-9]+'))::int     -- cast to integer
          ,substring(t, '[^0-9_].*$')        -- works as text

The same substring() expressions can be used to split the column.

The regular expressions are somewhat fault tolerant:

  • The first regex picks the longest numeric string from the left, NULL if no digits are found, so the cast to integer can't go wrong.

  • The second regex picks the rest of the string from the first character that is not a digit or '_'.

If the underscore is unambiguous as separator anyway, split_part() is faster:

ORDER  BY (split_part(t, '_', 1)::int
          ,split_part(t, '_', 2)

Answer for your example

SELECT name
FROM   nametable
ORDER  BY (split_part(name, '_', 1)::int
          ,split_part(name, '_', 2)