python string interpolation

vaab · Feb 22, 2010 · Viewed 14.9k times

What could cause the following behavior?

>>> print str(msg)
my message
>>> print unicode(msg)
my message

But:

>>> print '%s' % msg
another message

More info:

  • the class of my msg object inherits from unicode.
  • the methods __str__/__unicode__/__repr__ were overridden to return the string 'my message'.
  • the msg object was initialised with the string 'another message'.
  • this is running on Python 2.5
  • the variable msg was not changed between the tests
  • this is an actual doctest that really gives these results.

I would like a solution that matches this doctest, with minimal fuss (especially around the actual inheritance):

>>> print '%s' % msg
my message

Thanks for all suggestions.

I doubt this will help much, but for curious readers (and adventurous Pythonistas), here's the implementation of the object's class:

class Message(zope.i18nmessageid.Message):

    def __repr__(self):
        return repr(zope.i18n.interpolate(self.default, self.mapping))

    def __str__(self):
        return zope.i18n.interpolate(self.default, self.mapping)

    def __unicode__(self):
        return zope.i18n.interpolate(self.default, self.mapping)

This is how we create the object msg:

>>> msg = Message('another message', 'mydomain', default='my message')

Zope packages version and code used are:

EDIT INFO:

  • added/updated the names of the methods that were overridden
  • added some more info (python version, and minor info)
  • updated some wrong info (the class of `msg` is based on `unicode` class and not `basestring`)
  • added the actual implementation of the class used

Answer

Michał Marczyk · Feb 22, 2010

Update 2: Please find the original answer, including a simple example of a class exhibiting the behaviour described by the OP, below the horizontal bar. As for what I was able to surmise in the course of my inquiry into Python's sources (v. 2.6.4):

The file Include/unicodeobject.h contains the following two lines (nos. 436-7 in my (somewhat old) checkout):

#define PyUnicode_AS_UNICODE(op) \
        (((PyUnicodeObject *)(op))->str)

This is used all over the place in the formatting code, which, as far as I can tell, means that during string formatting, any object which inherits from unicode will be reached into so that its unicode string buffer may be used directly, without calling any Python methods. Which is good as far as performance is concerned, I'm sure (and very much in line with Juergen's conjecture in a comment on this answer).

For the OP's question, this probably means that getting the desired behaviour may only be possible if something like Anurag Uniyal's wrapper class idea is acceptable for this particular use case. If it isn't, the only thing that comes to my mind now is to wrap objects of this class in str / unicode wherever they're being interpolated into a string... ugh. (I sincerely hope I'm just missing a cleaner solution which someone will point out in a minute!)
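
A minimal sketch of the wrapper-class idea, assuming composition is acceptable in place of inheriting from unicode. The name `MessageWrapper` and its constructor arguments are hypothetical, and the real class would presumably call `zope.i18n.interpolate` rather than return `default` directly. Since the wrapper is not a unicode instance, `%s` has no string buffer to reach into and has to go through `__str__`:

```python
class MessageWrapper(object):
    """Hypothetical stand-in that wraps the text instead of inheriting from unicode."""

    def __init__(self, msgid, default):
        self.msgid = msgid        # the value the unicode base class used to hold
        self.default = default    # the text that should be displayed

    def __str__(self):
        # The real class would return zope.i18n.interpolate(self.default, self.mapping).
        return self.default

    __unicode__ = __str__         # only consulted by Python 2

    def __repr__(self):
        return repr(self.default)


msg = MessageWrapper('another message', default='my message')
formatted = '%s' % msg
print(formatted)
```

The trade-off is that `isinstance(msg, unicode)` is no longer true, so any code that expects a real unicode object may need adjusting.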


(Update: This was posted about a minute before the OP included the code of his class, but I'm leaving it here anyway (1) for the conjecture / initial attempt at an explanation below the code, (2) for a simple example of how to produce this behaviour (Anurag Uniyal has since provided another one calling unicode's constructor directly, as opposed to via super), (3) in hope of later being able to edit in something to help the OP in obtaining the desired behaviour.)

Here's an example of a class which actually works like what the OP describes (Python 2.6.4, it does produce a deprecation warning -- /usr/bin/ipython:3: DeprecationWarning: object.__init__() takes no parameters):

class Foo(unicode):
    def __init__(self, msg):
        super(unicode, self).__init__(msg)
    def __str__(self): return 'str msg'
    def __repr__(self): return 'repr msg'
    def __unicode__(self): return u'unicode msg'

A couple of interactions in IPython:

In [12]: print(Foo("asdf"))
asdf

In [13]: str(Foo("asdf"))
Out[13]: 'str msg'

In [14]: print str(Foo("asdf"))
-------> print(str(Foo("asdf")))
str msg

In [15]: print(str(Foo("asdf")))
str msg

In [16]: print('%s' % Foo("asdf"))
asdf

Apparently string interpolation treats this object as an instance of unicode (directly calling the unicode implementation of __str__), whereas the other functions treat it as an instance of Foo. How this happens internally and why it works like this and whether it's a bug or a feature, I really don't know.
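
The explicit-conversion workaround can be sketched as follows. It relies only on the behaviour shown in the session above, namely that `str()` does call the overridden `__str__`. `text_type` is a name introduced here so that the snippet runs on both Python 2 and Python 3, and the subclass leaves out the `__init__` override, which is not needed to make the point:

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


class Foo(text_type):
    def __str__(self):
        return 'str msg'


foo = Foo(u'asdf')

# Convert first, then interpolate: '%s' now receives a plain string,
# so the subclass instance itself never reaches the formatting code.
explicit = '%s' % str(foo)
print(explicit)
```

This has to be repeated at every interpolation site, which is why it is only a fallback.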

As for how to fix the OP's object... Well, how would I know without seeing its code??? Give me the code and I promise to think about it! Ok, I'm thinking about it... No ideas so far.