Regex punctuation split [Python]

dantdj · Nov 10, 2013

Can anyone help me a bit with regexes? I currently have this: re.split(" +", line.rstrip()), which splits on runs of spaces.

How could I expand this to cover punctuation, too?

Answer

Mister_Tom · Nov 10, 2013

The official Python documentation has a good example for this one. It splits on runs of non-word characters, which covers both whitespace and punctuation: \W is the character class for any character that is not a word character (letters, digits, and the underscore). Note: because the underscore "_" counts as a word character, it is not treated as a delimiter here.

import re
re.split(r'\W+', 'Words, words, words.')  # raw string keeps \W intact; returns ['Words', 'words', 'words', '']
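One thing to watch for: when the text starts or ends with a delimiter, re.split returns an empty string at that end of the list. Here is a minimal sketch of one way to drop those. The helper name split_words is made up for illustration and is not part of the re module.

```python
import re

def split_words(line):
    # Hypothetical helper (not from the re module): split on runs of
    # non-word characters, then discard the empty strings that re.split
    # leaves when the line begins or ends with a delimiter.
    return [token for token in re.split(r'\W+', line) if token]

# Plain re.split keeps a trailing empty string because of the final "."
print(re.split(r'\W+', 'Words, words, words.'))  # ['Words', 'words', 'words', '']

# The filtered version returns only the words
print(split_words('Words, words, words.'))       # ['Words', 'words', 'words']

# Underscores are word characters, so they are not split on; hyphens are
print(split_words('snake_case stays whole, but hyphen-ated splits'))
```

In the original question, this would replace re.split(" +", line.rstrip()) with split_words(line). The rstrip() call is no longer needed, since trailing whitespace is just another run of non-word characters.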

See https://docs.python.org/3/library/re.html for more examples; search the page for "re.split".