How to correctly sort a string with a number inside?

Michal · May 11, 2011

I have a list of strings containing numbers and I cannot find a good way to sort them.
For example, sorting them with the sort() method gives me this:

something1
something12
something17
something2
something25
something29

I know that I probably need to extract the numbers somehow and then sort the list, but I have no idea how to do it in the simplest way.
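The default order is lexicographic: strings are compared character by character, so "something12" comes before "something2" because '1' < '2'. Below is a minimal sketch of the extract-the-number idea, assuming every string ends in exactly one run of digits. The helper name `trailing_number` is hypothetical.

```python
import re

def trailing_number(s):
    # Hypothetical helper: assumes s ends with one or more digits.
    # re.search returns None if there is no trailing number, which would raise here.
    return int(re.search(r'\d+$', s).group())

items = ["something1", "something12", "something17",
         "something2", "something25", "something29"]

# Plain string comparison: '1' < '2', so "something12" sorts before "something2"
print(sorted(items))
# ['something1', 'something12', 'something17', 'something2', 'something25', 'something29']

# Compare by the extracted integer instead
print(sorted(items, key=trailing_number))
# ['something1', 'something2', 'something12', 'something17', 'something25', 'something29']
```

This ignores the text prefix entirely and fails on strings without a trailing number, which is why a more general key is useful.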

Answer

unutbu · May 11, 2011

Perhaps you are looking for human sorting (also known as natural sorting):

import re

def atoi(text):
    # Turn runs of digits into ints so they compare numerically; leave other text unchanged
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]

alist=[
    "something1",
    "something12",
    "something17",
    "something2",
    "something25",
    "something29"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something2', 'something12', 'something17', 'something25', 'something29']
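It can help to print the key itself. Because the pattern contains a capturing group, re.split keeps the matched digit runs in its result, so the key always alternates text, number, text, ... and begins and ends with a (possibly empty) string. That alternation means Python 3 only ever compares str with str and int with int at the same position, so no TypeError is raised. The helpers are repeated so the snippet runs on its own.

```python
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# The capturing group makes re.split return the digit runs too
print(re.split(r'(\d+)', "something12"))  # ['something', '12', '']
print(natural_keys("something12"))       # ['something', 12, '']
# Leading digits produce an empty string first, keeping the str/int alternation
print(natural_keys("12something"))       # ['', 12, 'something']
print(natural_keys("a1b22"))             # ['a', 1, 'b', 22, '']
```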

PS. I've changed my answer to use Toothy's implementation of natural sorting (posted in the comments here) since it is significantly faster than my original answer.
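Since the key splits on every run of digits, not only a trailing one, it also orders strings that contain several numbers. The file names below are made up for illustration; sorted() is used instead of list.sort() so the original list is left unchanged.

```python
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# Illustrative names with two numeric parts each
names = ["file10_part1", "file2_part10", "file2_part9", "file1_part2"]

print(sorted(names))                    # ['file10_part1', 'file1_part2', 'file2_part10', 'file2_part9']
print(sorted(names, key=natural_keys))  # ['file1_part2', 'file2_part9', 'file2_part10', 'file10_part1']
```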


If your strings contain floats, you'll need to change the regex from one that matches integers, (\d+), to one that matches floats, and convert the numeric pieces with float() instead of int():

import re

def atof(text):
    try:
        retval = float(text)
    except ValueError:
        retval = text
    return retval

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    float regex comes from https://stackoverflow.com/a/12643073/190597
    '''
    return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', text) ]

alist=[
    "something1",
    "something2",
    "something1.0",
    "something1.25",
    "something1.105"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something1.0', 'something1.105', 'something1.25', 'something2']
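Printing a few keys shows how the float version splits its input. The optional sign [+-]? sits outside the capturing group, so re.split consumes it without returning it: "x-1.5" and "x+1.5" get identical keys, and the sign has no effect on the order. Also, because atof is tried on the text pieces too, a word that float() accepts, such as "nan" or "inf", becomes a float; comparing such a key against one that has a string in the same position would raise TypeError in Python 3.

```python
import re

def atof(text):
    try:
        return float(text)
    except ValueError:
        return text

# Same float regex as above; the sign is matched but not captured
FLOAT_SPLIT = r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)'

def natural_keys(text):
    return [atof(c) for c in re.split(FLOAT_SPLIT, text)]

print(natural_keys("something1.105"))  # ['something', 1.105, '']
print(natural_keys("x-1.5"))           # ['x', 1.5, '']  (sign dropped)
print(natural_keys("x+1.5"))           # ['x', 1.5, '']
print(natural_keys("nan1"))            # [nan, 1.0, '']  (text piece parsed as float)
```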