tokenize a string keeping delimiters in Python

fortran · Nov 30, 2009 · Viewed 18.4k times

Is there any equivalent to str.split in Python that also returns the delimiters?

I need to preserve the whitespace layout for my output after processing some of the tokens.

Example:

>>> s = "\tthis is an  example"
>>> print(s.split())
['this', 'is', 'an', 'example']

>>> print(what_I_want(s))
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

Thanks!

Answer

Jonathan Feinberg · Nov 30, 2009

How about

import re
# Alternate runs of whitespace (\s+) and non-whitespace (\S+):
# together they cover every character of the input, so nothing is lost.
splitter = re.compile(r'(\s+|\S+)')
splitter.findall(s)
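A quick check of this approach on the string from the question (a minimal sketch; the helper name split_keep_whitespace is just for illustration):

import re

# Runs of whitespace OR runs of non-whitespace; every character falls
# into exactly one token, so the tokens concatenate back to the input.
splitter = re.compile(r'(\s+|\S+)')

def split_keep_whitespace(s):
    # findall returns the capture-group match for each token, in order
    return splitter.findall(s)

s = "\tthis is an  example"
tokens = split_keep_whitespace(s)
print(tokens)                 # ['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']
print("".join(tokens) == s)   # True: the whitespace layout is preserved

An alternative is re.split with a capturing group, e.g. re.split(r'(\s+)', s), which also returns the delimiters; note, however, that it yields an empty leading string when the input starts with whitespace.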