I have a Unicode string with some non-breaking spaces at the beginning and end. I get different results when using strip() vs. strip(string.whitespace).
>>> import string
>>> s5 = u'\xa0\xa0hello\xa0\xa0'
>>> print s5.strip()
hello
>>> print s5.strip(string.whitespace)
hello
The documentation for strip() says, "If omitted or None, the chars argument defaults to removing whitespace." The documentation for string.whitespace says, "A string containing all characters that are considered whitespace."
So if string.whitespace contains all characters that are considered whitespace, then why are the results different? Does it have something to do with Unicode?
I am using Python 2.7.6
From the documentation of string.whitespace:
A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.
It's the same under Python 3, where all non-ASCII constants were removed. (In Python 2, some constants could be influenced by locale settings.)
Hence the difference in behaviour: strip() called on a Unicode string with no argument removes any Unicode whitespace, while strip(string.whitespace) removes only ASCII whitespace. Your string contains non-breaking spaces (U+00A0), which are not ASCII.
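To make the difference concrete, here is a minimal sketch that checks each claim above and shows one possible workaround: passing the non-breaking space explicitly alongside string.whitespace. The variable names (s, chars, stripped_default, and so on) are only illustrative, and the sketch assumes the default locale, where string.whitespace is pure ASCII.

```python
import string

s = u'\xa0\xa0hello\xa0\xa0'

# U+00A0 counts as whitespace for Unicode strings...
nbsp_is_space = u'\xa0'.isspace()

# ...but it is not one of the ASCII characters in string.whitespace
# (assuming the default locale).
nbsp_in_constant = u'\xa0' in string.whitespace

# No argument: every character for which isspace() is true is stripped.
stripped_default = s.strip()

# Explicit ASCII set: the non-breaking spaces are left in place.
stripped_ascii = s.strip(string.whitespace)

# One possible workaround: extend the set with the characters you care about.
chars = string.whitespace + u'\xa0'
stripped_explicit = s.strip(chars)

print(nbsp_is_space, nbsp_in_constant)
print(stripped_default == u'hello')
print(stripped_ascii == s)
print(stripped_explicit == u'hello')
```

If you only want to remove whatever Python considers whitespace, plain strip() is simpler. The explicit set is mainly useful when you want to control exactly which characters are removed.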