Can anyone help me a bit with regexes? I currently have this: re.split(" +", line.rstrip()), which splits on spaces.
How could I expand this to cover punctuation, too?
The official Python documentation has a good example for this one. It splits on every run of non-alphanumeric characters (whitespace and punctuation), because \W is the character class for all non-word characters. Note: the underscore "_" counts as a "word" character, so it will not be split on here.
re.split(r'\W+', 'Words, words, words.')  # ['Words', 'words', 'words', '']
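Because the string ends with ".", the split leaves a trailing empty string. Here is a minimal sketch of how you might drop those empty pieces and, optionally, treat "_" as a separator too. The helper names split_words and split_words_and_underscores are hypothetical, made up for illustration. The raw-string prefix (r'...') keeps Python from interpreting the backslash in \W itself.

```python
import re

def split_words(line):
    # r'\W+' matches one or more non-word characters (whitespace, punctuation).
    # Separators at the start or end of the line produce empty strings,
    # so filter them out.
    return [w for w in re.split(r'\W+', line) if w]

def split_words_and_underscores(line):
    # [\W_]+ is the same idea, but also treats "_" as a separator.
    return [w for w in re.split(r'[\W_]+', line) if w]

print(split_words('Words, words, words.'))           # ['Words', 'words', 'words']
print(split_words_and_underscores('snake_case, too!'))  # ['snake', 'case', 'too']
```

With str patterns in Python 3, \w and \W are Unicode-aware by default, so accented letters are kept inside words rather than treated as punctuation.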
See https://docs.python.org/3/library/re.html for more examples; search the page for "re.split".