strip() and strip(string.whitespace) give different results despite documentation suggesting they should be the same

Question 1

python unicode whitespace strip

Becca codes · Mar 6, 2014 · Viewed 22k times · Source

Answer

From the documentation of string.whitespace:

A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.

It's the same under Python 3, where all non-ASCII constants were removed. (In Python 2, some constants could be influenced by locale settings.)
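To see what the constant actually holds, a quick check like the one below can help. It is a minimal sketch written for Python 3 (the question itself uses Python 2.7), and the name `extra` is only illustrative. It compares string.whitespace with the characters that str.isspace() accepts.

```python
import string
import sys

# string.whitespace is a fixed, ASCII-only constant in Python 3.
print(repr(string.whitespace))

# Collect every code point that str.isspace() accepts but that is
# missing from string.whitespace. "extra" is a hypothetical name.
extra = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if chr(cp).isspace() and chr(cp) not in string.whitespace
]

# The no-break space U+00A0 is one of the characters left out.
print('\xa0' in extra)

# The exact count depends on the Unicode version of the interpreter.
print(len(extra))
```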

Hence the difference in behaviour: called on a unicode string, strip() with no argument removes any Unicode whitespace, while strip(string.whitespace) removes only the ASCII whitespace characters. Your string contains non-breaking spaces (U+00A0), which are not ASCII.
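The difference can be reproduced with a short script. This is a sketch for Python 3, where every str is Unicode (under Python 2.7 the literal would need the u prefix, as in the question). The variable names are illustrative, and appending '\xa0' to the chars argument is just one possible workaround for when an explicit character set is needed.

```python
import string

s5 = '\xa0\xa0hello\xa0\xa0'  # no-break spaces on both sides

# No argument: strips every character Unicode treats as whitespace.
default_stripped = s5.strip()

# Explicit ASCII set: U+00A0 is not in it, so nothing is removed.
ascii_stripped = s5.strip(string.whitespace)

# Possible workaround: extend the set with the no-break space.
nbsp_stripped = s5.strip(string.whitespace + '\xa0')

print(default_stripped == 'hello')   # True
print(ascii_stripped == s5)          # True: the string is unchanged
print(nbsp_stripped == 'hello')      # True
```

If the goal is simply to remove all whitespace, calling strip() without an argument is the simpler choice.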

Question 2

I have a Unicode string with some non-breaking spaces at the beginning and end. I get different results when using strip() vs. strip(string.whitespace).

>>> import string
>>> s5 = u'\xa0\xa0hello\xa0\xa0'
>>> print s5.strip()
hello
>>> print s5.strip(string.whitespace)
  hello

The documentation for strip() says, "If omitted or None, the chars argument defaults to removing whitespace." The documentation for string.whitespace says, "A string containing all characters that are considered whitespace."

So if string.whitespace contains all characters that are considered whitespace, then why are the results different? Does it have something to do with Unicode?

I am using Python 2.7.6
