I have a very big file (4 GB), and when I try to read it my computer hangs. I want to read it piece by piece, process each piece, write the processed piece to another file, and then read the next piece.
Is there any method to yield these pieces? I would love to have a lazy method.
To write a lazy function, just use yield:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)
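Since you also want to store each processed piece in another file, here is a minimal sketch of how that could look. It assumes binary mode and that your processing returns bytes; process_piece and processed_file.dat are placeholders, not anything from a real library.

def process_piece(piece):
    # Hypothetical placeholder: replace with your real processing.
    return piece

with open('really_big_file.dat', 'rb') as src, open('processed_file.dat', 'wb') as dst:
    for piece in read_in_chunks(src, chunk_size=64 * 1024):
        dst.write(process_piece(piece))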
Another option would be to use iter and a helper function:
f = open('really_big_file.dat')
def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)
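The helper can also be replaced with functools.partial. Note that the end-of-file sentinel is '' in text mode but b'' in binary mode; this sketch uses binary mode:

from functools import partial

with open('really_big_file.dat', 'rb') as f:
    # partial() binds the chunk size; iter() stops when read() returns
    # the b'' sentinel at end of file.
    for piece in iter(partial(f.read, 1024), b''):
        process_data(piece)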
If the file is line-based, the file object is already a lazy generator of lines:
for line in open('really_big_file.dat'):
    process_data(line)
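As with the chunked examples, wrapping this in a with block ensures the file is closed once iteration finishes:

with open('really_big_file.dat') as f:
    for line in f:
        process_data(line)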