How do I build a numpy array from a generator?

saffsd · Dec 15, 2008 · Viewed 68.8k times

How can I build a numpy array out of a generator object?

Let me illustrate the problem:

>>> import numpy
>>> def gimme():
...   for x in xrange(10):
...     yield x
...
>>> gimme()
<generator object at 0x28a1758>
>>> list(gimme())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> numpy.array(xrange(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.array(gimme())
array(<generator object at 0x28a1758>, dtype=object)
>>> numpy.array(list(gimme()))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this instance, gimme() is the generator whose output I'd like to turn into an array. However, the array constructor does not iterate over the generator; it simply stores the generator object itself. The behaviour I want is that of numpy.array(list(gimme())), but I don't want to pay the memory overhead of having the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?

Answer

dhill · Feb 24, 2009

One Google search beyond this Stack Overflow result, I found numpy.fromiter(data, dtype, count). It builds a one-dimensional array directly from an iterable. The dtype must be set explicitly. The default count=-1 takes all elements from the iterable. In my case, this worked:

numpy.fromiter(something.generate(from_this_input), float)
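
Applied to the generator from the question, this might look like the sketch below. It assumes Python 3 (so range replaces xrange), and the variable names arr and arr_counted are made up for illustration. Passing count is optional, but when the length is known in advance it should let numpy allocate the output once instead of growing it as items arrive.

```python
import numpy

def gimme():
    # Python 3 spelling of the generator from the question
    # (range instead of xrange).
    for x in range(10):
        yield x

# dtype is mandatory for fromiter; the result is always one-dimensional.
arr = numpy.fromiter(gimme(), dtype=int)

# Assumption for illustration: the length (10) is known up front,
# so count is passed to allow a single allocation.
arr_counted = numpy.fromiter(gimme(), dtype=float, count=10)

print(arr)
print(arr_counted)
```

Note that fromiter consumes the generator one item at a time, so no intermediate list is created; the trade-off is that it only produces flat arrays of a single dtype.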