Replace non-alphanumeric characters, with some exceptions, in Python

user1977867 · Jan 14, 2015 · Viewed 14.8k times

In Perl, s/[^\w:]//g removes every non-alphanumeric character except : (and the underscore, which \w already matches).

In Python I'm using re.sub(r'\W+', '', mystring), which removes all non-alphanumeric characters except the underscore (_). Is there any way to add exceptions? I don't want to remove characters such as = and .

Previously I took the opposite approach, replacing all unwanted characters with re.sub('[!@#\'\"$()]', '', mystring). However, I cannot predict every character that may appear in mystring, so I want to remove all non-alphanumeric characters except a few.

Google didn't turn up an appropriate answer. The closest result was "python regex split any \W+ with some exceptions", but that didn't help me either.

Answer

nu11p01n73R · Jan 14, 2015

You can list everything that you do not want removed inside a negated character class.

re.sub(r'[^\w'+removelist+']', '',mystring)

Test

>>> import re
>>> removelist = "=."
>>> mystring = "asdf1234=.!@#$"
>>> re.sub(r'[^\w'+removelist+']', '',mystring)
'asdf1234=.'

Here removelist is a string containing all the characters you want to exclude from removal (despite its name, it lists the characters to keep).
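One caveat when concatenating the string directly into the pattern: characters such as ], ^, - or \ have a special meaning inside a character class and could break or change the regex. A minimal sketch, assuming the string of characters to keep is passed through re.escape first (the helper name strip_except and its parameters are hypothetical, not part of the original answer):

```python
import re

def strip_except(text, keep=""):
    """Remove every character that is neither a word character nor listed in keep."""
    # re.escape neutralizes characters that are special inside [...], such as ] ^ - and backslash
    pattern = r'[^\w' + re.escape(keep) + ']'
    return re.sub(pattern, '', text)

print(strip_except("asdf1234=.!@#$", "=."))  # asdf1234=.
print(strip_except("a-b]c!d", "-]"))         # a-b]cd
```

With an empty keep string the pattern reduces to [^\w], so the helper then behaves like the re.sub(r'\W+', '', mystring) call from the question.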

What does a negated character class mean?

When ^ appears as the first character inside a character class, it does not act as an anchor; instead, it negates the character class.

That is, a ^ at the start of a character class such as [^abc] inverts the meaning of the class.

For example, [abc] matches a, b, or c, whereas [^abc] matches any single character other than a, b, or c.
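The difference can be checked quickly with re.findall. This is only an illustrative sketch; the sample strings and variable names are made up for the demonstration:

```python
import re

sample = "abcdef"

# [abc] matches any one of a, b, c
in_class = re.findall(r'[abc]', sample)

# [^abc] matches any one character that is NOT a, b, or c
not_in_class = re.findall(r'[^abc]', sample)

# Outside a character class, ^ is an anchor: only the "a" at the start matches
anchored = re.findall(r'^a', "aba")

print(in_class)       # ['a', 'b', 'c']
print(not_in_class)   # ['d', 'e', 'f']
print(anchored)       # ['a']
```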