Different behaviour of ctypes c_char_p?

Sagar Masuti · May 25, 2014

I am confused by this difference in behaviour between Python versions and don't understand why.

Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> c="hello"
>>> a=ctypes.c_char_p(c)
>>> print(a.value)
hello

Python 3.3.5 (default, Mar 11 2014, 15:08:59) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> c="hello"
>>> a=ctypes.c_char_p(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bytes or integer address expected instead of str instance

One works while the other gives me an error. Which one is correct?

If both are correct, how can I get the same behaviour as 2.7 in 3.3.5? I want to pass a char pointer from Python to C.

Answer

Eryk Sun · Jun 5, 2014

c_char_p is a subclass of _SimpleCData, with _type_ == 'z'. The __init__ method calls the type's setfunc, which for simple type 'z' is z_set.
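These relationships can be checked from Python. Note that _SimpleCData and _type_ are private implementation details of ctypes, so this is a sketch for exploration only, not an interface to build on:

```python
import ctypes

# c_char_p derives from the private base class _SimpleCData ...
print(issubclass(ctypes.c_char_p, ctypes._SimpleCData))  # True

# ... and its type code 'z' is what selects the z_set setter inside _ctypes.
print(ctypes.c_char_p._type_)  # z
```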

In Python 2, the z_set function (2.7.7) is written to handle both str and unicode strings. Prior to Python 3, str was an 8-bit string: CPython 2.x stores it internally as a C null-terminated string (i.e. an array of bytes terminated by \0), so z_set can call PyString_AS_STRING to get a pointer to the str object's internal buffer. A unicode string must first be encoded to a byte string. z_set performs this encoding automatically and keeps a reference to the encoded string in the _objects attribute, so the encoded buffer stays alive as long as the c_char_p instance does.

>>> from ctypes import c_char_p
>>> c = u'spam'
>>> a = c_char_p(c)
>>> a._objects
'spam'
>>> type(a._objects)
<type 'str'>

On Windows, the default ctypes string encoding is 'mbcs', with error handling set to 'ignore'. On all other platforms the default encoding is 'ascii', with 'strict' error handling. To modify the default, call ctypes.set_conversion_mode. For example, set_conversion_mode('utf-8', 'strict').

In Python 3, the z_set function (3.4.1) does not automatically convert str (now Unicode) to bytes. The paradigm shifted in Python 3 to strictly divide character strings from binary data. The ctypes default conversions were removed, as was the function set_conversion_mode. You have to pass c_char_p a bytes object (e.g. b'spam' or 'spam'.encode('utf-8')). In CPython 3.x, z_set calls the C-API function PyBytes_AsString to get a pointer to the internal buffer of the bytes object.
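A minimal Python 3 sketch of that fix, assuming the C side expects UTF-8 (substitute whatever encoding your library actually uses):

```python
import ctypes

text = "hello"  # a Python 3 str (Unicode text)

# Passing the str directly is rejected: z_set only accepts bytes,
# an integer address, or None.
try:
    ctypes.c_char_p(text)
except TypeError as exc:
    print("TypeError:", exc)

# Encode explicitly; UTF-8 is an assumption and must match what the C code expects.
a = ctypes.c_char_p(text.encode("utf-8"))
print(a.value)                  # b'hello'

# .value comes back as bytes in Python 3; decode it to get a str again.
print(a.value.decode("utf-8"))  # hello
```

The explicit encode step is the Python 3 counterpart of the automatic conversion z_set used to do in 2.x, with the encoding now chosen by the caller rather than by a global conversion mode.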

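Note that if the C function modifies the string, you need to use create_string_buffer instead, which creates a mutable c_char array; a c_char_p points directly at the internal buffer of an immutable bytes object, which C code must not write to. If the parameter is declared const (e.g. const char *), that tells you it is safe to use c_char_p.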
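A short sketch of the create_string_buffer approach. Since no particular C library is assumed here, ctypes.memmove stands in for a hypothetical C function that writes into the buffer it is given:

```python
import ctypes

# Allocate a mutable 10-byte c_char array, initialised from b"hello"
# and zero-padded to the full size.
buf = ctypes.create_string_buffer(b"hello", 10)

# Stand-in for a hypothetical C function that writes through a char *:
# memmove copies raw bytes into the array's memory, much as strcpy would.
ctypes.memmove(buf, b"HELLO", 5)

print(buf.value)           # b'HELLO' (bytes up to the first NUL)
print(buf.raw)             # b'HELLO\x00\x00\x00\x00\x00' (all 10 bytes)
print(ctypes.sizeof(buf))  # 10
```

When calling a real function, the array is passed where a char * is expected, and buf.value reads back whatever the C side wrote.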