How does numpy determine the array data type when it contains multiple dtypes?

Pranav Joshi · Apr 10, 2018

I am trying to get hands-on with NumPy, and I came across the following data type when using the built-in dtype attribute. Below is the result I got. Can you please explain what U11 means?

import numpy as np

a1 = np.array([3,5,'p'])
print(a1.dtype)

Output: <U11

Answer

Mazdak · Apr 10, 2018

NumPy's array objects, which are of type PyArrayObject, have an NPY_PRIORITY attribute that denotes the priority of the item types for cases where the array contains items with heterogeneous data types. You can access this priority with the PyArray_GetPriority API, which returns the __array_priority__ attribute. According to the documentation:

class.__array_priority__ : The value of this attribute is used to determine what type of object to return in situations where there is more than one possibility for the Python type of the returned object. Subclasses inherit a default value of 0.0 for this attribute.

Now, in this case Unicode has a higher priority than integer type and that's why a1.dtype returns U11.
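A quick way to see the effect of this is to inspect the elements after the array is built. The following is a minimal sketch, not taken from the original answer, and it only assumes that NumPy is installed: the integers from the input list end up stored as strings, so every element shares the one Unicode dtype.

```python
import numpy as np

# Mixed input: two Python ints and one str.
a1 = np.array([3, 5, 'p'])

# The whole array gets a single dtype; kind 'U' means Unicode string.
print(a1.dtype.kind)

# The integers were converted to their string form.
print(a1.tolist())     # ['3', '5', 'p']

# Individual elements are NumPy string scalars, not integers.
print(type(a1[0]))
```

If the numeric values need to stay numeric, the mixed list has to be split up or given an explicit dtype such as object when the array is created.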

Regarding U11, or U# in general, note that it consists of two parts: the U denotes a Unicode dtype, and the # is the maximum number of characters each element can hold. That number can differ between platforms, because the width reserved for the converted integers depends on the platform's default integer size.

In [45]: a1.dtype
Out[45]: dtype('<U21')  # 64bit Linux

In [46]: a1.dtype.type  # The type object used to instantiate a scalar of this data-type. 
Out[46]: numpy.str_

In [49]: a1.dtype.itemsize
Out[49]: 84 # 21 * 4
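The relationship between the width and itemsize can be checked directly. This is a small sketch rather than part of the original answer; the variable names are illustrative, and it assumes the width can be read from the trailing digits of dtype.str (for example '<U3').

```python
import numpy as np

# Pure string input: the width is the length of the longest element.
words = np.array(['abc', 'de'])
width = int(words.dtype.str[2:])   # dtype.str looks like '<U3'
print(words.dtype.str, width, words.dtype.itemsize)

# Each character is stored in 4 bytes, so itemsize is 4 * width.
assert words.dtype.itemsize == 4 * width

# The mixed array from the question: the width is platform dependent,
# but the 4-bytes-per-character relationship still holds.
a1 = np.array([3, 5, 'p'])
mixed_width = int(a1.dtype.str[2:])
print(a1.dtype.str, mixed_width, a1.dtype.itemsize)
```

The printed width for a1 may be 11 or 21 depending on the platform and NumPy version, which is why the example does not hard-code it.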

For more details about string types and other data type objects, see the documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.dtypes.html#data-type-objects-dtype.