How can I build a numpy array out of a generator object?
Let me illustrate the problem:
>>> import numpy
>>> def gimme():
...     for x in xrange(10):
...         yield x
...
>>> gimme()
<generator object at 0x28a1758>
>>> list(gimme())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> numpy.array(xrange(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.array(gimme())
array(<generator object at 0x28a1758>, dtype=object)
>>> numpy.array(list(gimme()))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In this instance, gimme() is the generator whose output I'd like to turn into an array. However, the array constructor does not iterate over the generator; it simply stores the generator object itself. The behaviour I want is that of numpy.array(list(gimme())), but I don't want to pay the memory overhead of having the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?
One Google search behind this Stack Overflow result, I found numpy.fromiter(data, dtype, count). The default count=-1 takes all elements from the iterable, and dtype must be set explicitly. In my case, this worked:
numpy.fromiter(something.generate(from_this_input), float)
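Applied to the generator from the question, a minimal sketch might look like the following. Here gimme is re-declared with range (and a hypothetical parameter n) only so the snippet runs on both Python 2 and Python 3. The dtypes and the count=10 value are assumptions chosen for illustration.

```python
import numpy

def gimme(n=10):
    """Stand-in for any generator: yields 0..n-1 one value at a time."""
    for x in range(n):
        yield x

# dtype is mandatory; fromiter pulls items from the generator one by one,
# so no intermediate list is built.
a = numpy.fromiter(gimme(), dtype=float)

# If the number of items is known up front, passing count lets NumPy
# allocate the output once instead of resizing it as items arrive.
b = numpy.fromiter(gimme(10), dtype=numpy.int64, count=10)
```

Note that fromiter only builds one-dimensional arrays, so a generator that yields rows would still need a reshape afterwards or a different approach.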