Python: how to read N lines at a time

brokentypewriter · Jun 13, 2011

I am writing code to read an enormous text file (several GB) N lines at a time, process that batch, and move on to the next N lines until I have processed the entire file. (I don't care if the last batch isn't exactly N lines.)

I have been reading about using itertools.islice for this operation. I think I am halfway there:

from itertools import islice
N = 16
infile = open("my_very_large_text_file", "r")
lines_gen = islice(infile, N)

for lines in lines_gen:
     ...process my lines...

The trouble is that I would like to go on to process the next batch of 16 lines, but I am missing something.

Answer

Sven Marnach · Jun 13, 2011

islice() can be used to get the next n items of an iterator. Thus, list(islice(f, n)) will return a list of the next n lines of the file f. Using this inside a loop will give you the file in chunks of n lines. At the end of the file, the list might be shorter, and finally the call will return an empty list.

from itertools import islice
with open(...) as f:
    while True:
        next_n_lines = list(islice(f, n))
        if not next_n_lines:
            break
        # process next_n_lines
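
The loop above can also be wrapped in a small generator so that the batching logic is reusable. The following is only a sketch: the name iter_batches is made up for illustration, and io.StringIO stands in for the real file so that the example runs on its own.

```python
from io import StringIO
from itertools import islice


def iter_batches(lines, n):
    """Yield lists of up to n items taken from the iterator `lines`.

    `iter_batches` is a hypothetical helper name, not part of itertools.
    """
    lines = iter(lines)
    while True:
        batch = list(islice(lines, n))
        if not batch:
            # islice returned nothing, so the input is exhausted
            return
        yield batch


# StringIO stands in for an open text file (an assumption for this demo)
fake_file = StringIO("".join(f"line {i}\n" for i in range(5)))
batches = list(iter_batches(fake_file, 2))
```

With five input lines and n = 2, the last batch holds a single line, which matches the "shorter list at the end of the file" behaviour described above.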

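An alternative is to use the grouper pattern from the itertools recipes. Unlike the islice() loop, it yields tuples, and the last tuple is padded with None if the number of lines is not a multiple of n: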

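from itertools import zip_longest  # named izip_longest on Python 2
with open(...) as f:
    for next_n_lines in zip_longest(*[f] * n):
        # process next_n_lines (a tuple, possibly padded with None)
        pass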
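
To make the padding behaviour of the grouper pattern concrete, here is a small self-contained sketch for Python 3. The grouper function follows the recipe in the itertools documentation. io.StringIO again stands in for a real file, and the None-stripping step is just one assumed way of handling the short final group.

```python
from io import StringIO
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # The same iterator object repeated n times, so each tuple
    # consumes n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# StringIO stands in for an open text file (an assumption for this demo)
fake_file = StringIO("a\nb\nc\nd\ne\n")
groups = list(grouper(fake_file, 2))

# Drop the None padding from each group before processing it
cleaned = [[line for line in group if line is not None] for group in groups]
```

Here the final tuple in groups is ("e\n", None), so code that processes each group has to either tolerate the fill value or strip it as shown.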