Is there a generator version of `string.split()` in Python?

Manoj Govindan · Oct 5, 2010 · Viewed 27.6k times · Source

`string.split()` returns a list. Is there a version that returns a generator instead? Are there any reasons against having a generator version?

Answer

ninjagecko · Mar 19, 2012

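re.finditer most likely has minimal memory overhead, since it returns an iterator that yields match objects one at a time instead of building a list. Note that the pattern below matches runs of letters and apostrophes, so it extracts words rather than splitting on a separator.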

import re

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

Demo:

>>> list( split_iter("A programmer's RegEx test.") )
['A', "programmer's", 'RegEx', 'test']
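If you want something closer to what split() itself does, the same idea can be adapted. The following is only a sketch, not part of the original answer: the names split_iter_whitespace and split_iter_sep are made up for illustration. The first is meant to behave like split() with no arguments on ordinary text, the second like split(sep) with a literal separator.

```python
import re

def split_iter_whitespace(string):
    """Lazily yield whitespace-delimited tokens (hypothetical helper)."""
    # \S+ matches each maximal run of non-whitespace characters, so
    # leading, trailing, and repeated whitespace produce no empty tokens.
    return (m.group(0) for m in re.finditer(r"\S+", string))

def split_iter_sep(string, sep):
    """Lazily yield the pieces between occurrences of a literal separator
    (hypothetical helper)."""
    # Because this is a generator function, the check below only runs
    # when the first item is requested, not when the function is called.
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = string.find(sep, start)
        if end == -1:
            # No more separators: the rest of the string is the last piece.
            yield string[start:]
            return
        yield string[start:end]
        start = end + len(sep)
```

Each yielded piece is still a new string object, so the saving is in never holding all the pieces at once, not in avoiding the copies.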

edit: I have just confirmed that this takes constant memory in Python 3.2.1, assuming my testing methodology was correct. I created a very large string (1GB or so), then iterated through the iterable with a for loop (NOT a list comprehension, which would have allocated extra memory to hold every result). Memory did not grow noticeably (that is, if there was any growth, it was far, far less than the size of the 1GB string).
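One way to repeat a check like this on a smaller scale is with the standard-library tracemalloc module. This is a sketch under assumptions, not the methodology used above: tracemalloc only exists in Python 3.4 and later, the helper name peak_extra_bytes is made up, and the test string here is about 1MB rather than 1GB.

```python
import re
import tracemalloc

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

def peak_extra_bytes(text):
    """Iterate over split_iter(text) and return (token count, peak bytes
    allocated by Python while iterating). Hypothetical helper."""
    tracemalloc.start()
    count = 0
    # Plain for loop: each token is dropped before the next one is made.
    for _ in split_iter(text):
        count += 1
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return count, peak

# The input is built before tracing starts, so its own size is not counted.
text = "word " * 200_000
count, peak = peak_extra_bytes(text)
print(count, "tokens, peak extra allocation:", peak, "bytes")
```

If the iteration really is lazy, the reported peak should stay small compared with len(text); wrapping the generator in list() inside the traced region should make it grow with the input instead.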