Comparing strings in postgres using comparison operators?

bakamike picture bakamike · Nov 15, 2016 · Viewed 31k times · Source

In many programming languages you can compare strings using operators like >, >=, < etc...and the language will base the comparison on the position of the letter in the alphabet.

For example in PHP

if ('a' < 'b') {
    echo 'Yes';
} else {
    echo 'No';
}
> Yes

However in postgres or mysql

SELECT
CASE WHEN 'a' < 'b' THEN 'yes' END
FROM table
Output: null

I have a table with strings that I need to compare against each other through SQL.

For example: 6.2(5a) 6.2(5b) -- this would be greater than 6.2(5a) Or 6.2(15) -- this would be greater than 6.2(5a)

I thought of assigning a number to a letter using a regexp but then that would break the comparisons when there are no letter.

How would you go about this purely in SQL?

Answer

Schwern picture Schwern · Nov 15, 2016

NOTE: The original answer went off on a red herring.

A simple comparison sorts character by character.

select 'a1' < 'a9'; -- true because 'a' = 'a' and '1' < '9'.

...but quickly goes to pot.

select 'a10' < 'a9'; -- also true for the same reason.

What you want is a natural sort where the string parts are compared as strings and the numbers are compared as numbers. Doing a natural sort in SQL is not the easiest thing. You either need fixed field widths to sort each substring separately, or maybe something with regexes...

Fortunately there's pg_natural_sort_order, a Postgres extension that implements an efficient natural sort.

If you can't install extensions you can use a stored procedure like btrsort by 2kan.

CREATE FUNCTION btrsort_nextunit(text) RETURNS text AS $$
    SELECT 
        CASE WHEN $1 ~ '^[^0-9]+' THEN
            COALESCE( SUBSTR( $1, LENGTH(SUBSTRING($1 FROM '[^0-9]+'))+1 ), '' )
        ELSE
            COALESCE( SUBSTR( $1, LENGTH(SUBSTRING($1 FROM '[0-9]+'))+1 ), '' )
        END

$$ LANGUAGE SQL;

CREATE FUNCTION btrsort(text) RETURNS text AS $$
    SELECT 
        CASE WHEN char_length($1)>0 THEN
            CASE WHEN $1 ~ '^[^0-9]+' THEN
                RPAD(SUBSTR(COALESCE(SUBSTRING($1 FROM '^[^0-9]+'), ''), 1, 12), 12, ' ') || btrsort(btrsort_nextunit($1))
            ELSE
                LPAD(SUBSTR(COALESCE(SUBSTRING($1 FROM '^[0-9]+'), ''), 1, 12), 12, ' ') || btrsort(btrsort_nextunit($1))
            END
        ELSE
            $1
        END
      ;
$$ LANGUAGE SQL;

Though it doesn't provide a comparison operator and I'm not going to pretend to understand it. This allows you to use it in an order by.

select * from things order by btrsort(whatever);

To prevent your naturally sorted queries from turning to mud on large tables, you can create a btree index on the result of that function.

create index things_whatever_btrsort_idx ON things( btrsort(whatever) );

SELECT
  CASE WHEN 'a' < 'b' THEN 'yes' END
  FROM table
  Output: null

This will only output nothing if the table is empty. You don't need a table to test select statements.

SELECT
CASE WHEN 'a' < 'b' THEN 'yes' END  -- yes