When should I ever use file.read() or file.readlines()?

Maverick Meerkat · Jun 29, 2016

I noticed that when I iterate over a file I have opened, it is much faster to iterate over the file object directly than to read it in first with read() or readlines().

For example,

with open('file', 'r') as f:
    for line in f:
        pass  # process each line here

is much faster than

with open('file', 'r') as f:
    # or f.read(); note that iterating over a string yields characters, not lines
    for line in f.readlines():
        pass  # process each line here

The second loop takes around 1.5 times as long (I used timeit on the exact same file, and the results were 0.442 vs. 0.660) and gives the same result.
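A minimal sketch of how such a comparison could be reproduced with timeit. It assumes a generated test file (the name sample.txt and its contents are hypothetical). Absolute timings depend on the machine, file size and Python version, so no numbers are claimed here; the point is that both loops see exactly the same lines.

```python
import os
import tempfile
import timeit

# Hypothetical test file: 50,000 short lines in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    for i in range(50_000):
        out.write(f"line {i}\n")


def iterate_file():
    """Iterate over the file object directly (lazy, line by line)."""
    count = 0
    with open(path, "r") as f:
        for line in f:
            count += 1
    return count


def iterate_readlines():
    """Build the full list of lines first, then iterate over that list."""
    count = 0
    with open(path, "r") as f:
        for line in f.readlines():
            count += 1
    return count


# Both approaches visit the same number of lines.
assert iterate_file() == iterate_readlines() == 50_000

print("for line in f:  ", timeit.timeit(iterate_file, number=5))
print("f.readlines():  ", timeit.timeit(iterate_readlines, number=5))
```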

So when should I ever use .read() or .readlines()?

Since I always need to iterate over the file I'm reading, and having learned the hard way how painfully slow .read() can be on large data, I can't imagine ever using it again.

Answer

Checkmate · Jun 29, 2016

The short answer to your question is that each of these three ways of reading a file has different use cases. As noted above, f.read() reads the whole file into a single string, which makes file-wide manipulations relatively easy, such as a regex search or substitution across the entire file.
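For example, here is a sketch of a substitution whose match spans a line boundary, so no single line seen by a line-by-line loop would contain it. The file name notes.txt, its contents and the pattern are all hypothetical.

```python
import os
import re
import tempfile

# Hypothetical input: the phrase to replace is split across two lines.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as out:
    out.write("Please contact the\nsupport team for help.\n")

with open(path, "r") as f:
    text = f.read()  # the entire file as one string

# \s+ also matches the newline, so the match can cross the line break.
fixed = re.sub(r"the\s+support\s+team", "the help desk", text)

with open(path, "w") as out:
    out.write(fixed)

print(fixed)  # Please contact the help desk for help.
```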

f.readline() reads a single line of the file, so you can parse one line without necessarily reading the entire file. It also makes it easier to apply custom logic while reading than a plain line-by-line iteration does, for example when a file changes format partway through.
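A sketch of the "format changes partway through" case, assuming a hypothetical layout: "key: value" header lines, a blank separator line, then comma-separated data rows. io.StringIO stands in for a real file because it supports the same readline() and iteration methods.

```python
import io

# Hypothetical file: "key: value" header, a blank line, then CSV-style rows.
f = io.StringIO(
    "title: sensor log\n"
    "units: celsius\n"
    "\n"
    "1,20.5\n"
    "2,21.0\n"
)

# Phase 1: read the header one line at a time until the blank separator.
header = {}
while True:
    line = f.readline()
    if not line or line == "\n":  # "" means end of file
        break
    key, value = line.rstrip("\n").split(": ", 1)
    header[key] = value

# Phase 2: the remaining lines have a different format; iterate over them.
rows = []
for line in f:
    index, reading = line.rstrip("\n").split(",")
    rows.append((int(index), float(reading)))

print(header)  # {'title': 'sensor log', 'units': 'celsius'}
print(rows)    # [(1, 20.5), (2, 21.0)]
```

Here readline() is only called before the for loop starts, so the two styles never interleave.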

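Using the syntax for line in f: lets you iterate over the file line by line, as noted in the question. The file object reads lazily, so it never holds the whole file in memory at once. That makes it the natural choice when you simply need every line in order.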
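To illustrate the trade-off, a sketch contrasting lazy iteration with readlines(). The for loop keeps only the current line (plus an internal read buffer) in memory, while readlines() builds a list of every line. That list is mainly worth its memory cost when you need random access, such as looking at the line after a match. The file name data.txt and its contents are hypothetical.

```python
import os
import tempfile

# Hypothetical input file.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as out:
    out.write("alpha\nERROR disk full\nretrying\nbeta\n")

# Lazy iteration: one line at a time, fine for a single forward pass.
with open(path) as f:
    error_count = sum(1 for line in f if line.startswith("ERROR"))

# readlines(): a full list, so lines can be indexed, e.g. the line after a match.
with open(path) as f:
    lines = f.readlines()
followups = [
    lines[i + 1].rstrip("\n")
    for i, line in enumerate(lines)
    if line.startswith("ERROR") and i + 1 < len(lines)
]

print(error_count)  # 1
print(followups)    # ['retrying']
```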

As noted in the other answer, this documentation is a very good read:

https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects

Note: It was previously claimed that f.readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7, and is perhaps a questionable practice, so this claim has been removed.