Can I reset an iterator / generator in Python? I am using DictReader and would like to reset it to the beginning of the file.
I see many answers suggesting itertools.tee, but that's ignoring one crucial warning in the docs for it:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use
list()
instead oftee()
.
Basically, tee
is designed for those situation where two (or more) clones of one iterator, while "getting out of sync" with each other, don't do so by much -- rather, they say in the same "vicinity" (a few items behind or ahead of each other). Not suitable for the OP's problem of "redo from the start".
L = list(DictReader(...))
on the other hand is perfectly suitable, as long as the list of dicts can fit comfortably in memory. A new "iterator from the start" (very lightweight and low-overhead) can be made at any time with iter(L)
, and used in part or in whole without affecting new or existing ones; other access patterns are also easily available.
As several answers rightly remarked, in the specific case of csv
you can also .seek(0)
the underlying file object (a rather special case). I'm not sure that's documented and guaranteed, though it does currently work; it would probably be worth considering only for truly huge csv files, in which the list
I recommmend as the general approach would have too large a memory footprint.