file.tell() inconsistency

nigp4w rudy · Jan 3, 2013

Does anybody happen to know why when you iterate over a file this way:

Input:

f = open('test.txt', 'r')
for line in f:
    print "f.tell(): ",f.tell()

Output:

f.tell(): 8192
f.tell(): 8192
f.tell(): 8192
f.tell(): 8192

I consistently get the wrong file index from tell(). However, if I use readline(), I get the correct index from tell():

Input:

f = open('test.txt', 'r')
while True:
    line = f.readline()
    if (line == ''):
        break
    print "f.tell(): ",f.tell()

Output:

f.tell(): 103
f.tell(): 107
f.tell(): 115
f.tell(): 124

I'm running Python 2.7.1, BTW.

Answer

Martijn Pieters · Jan 3, 2013

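Iterating over an open file uses a read-ahead buffer to increase efficiency. As a result, the underlying file pointer advances in large steps (8192 bytes at a time in your output) as you loop over the lines, and tell() reports that pointer rather than the end of the line you just received.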

From the File Objects documentation:

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
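The effect is easy to reproduce with a toy model. The sketch below is not how CPython implements file iteration; read_ahead_lines is a hypothetical helper, and the 8192-byte chunk size is an assumption taken from the output in the question. It reads from an in-memory stream in large chunks and hands out one line at a time, so the stream position seen between lines jumps in chunk-sized steps:

```python
import io

def read_ahead_lines(raw, chunk_size=8192):
    """Toy model of a read-ahead iterator: read big chunks, yield single lines."""
    pending = b""
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()  # keep the trailing partial line for the next chunk
        for line in lines:
            yield line + b"\n"
    if pending:
        yield pending  # last line without a trailing newline

# 200 lines of 100 bytes each (99 characters plus the newline).
data = b"".join(b"x" * 99 + b"\n" for _ in range(200))
raw = io.BytesIO(data)

# Record the position of the underlying stream after each line is handed out.
positions = [raw.tell() for _ in read_ahead_lines(raw)]
print(sorted(set(positions)))
```

Although 200 lines are produced, only three distinct positions are ever observed (8192, 16384 and 20000), which is the same pattern as the repeated 8192 in the question.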

If you need to rely on .tell(), don't use the file object as an iterator. You can turn .readline() into an iterator instead (at the price of some performance loss):

for line in iter(f.readline, ''):
    print f.tell()

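This uses the two-argument form of iter(): it calls f.readline repeatedly and stops as soon as the call returns the sentinel '' (which readline() returns only at end of file). That form turns any callable into an iterator.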
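As a slightly fuller sketch of the same idea, the following records the byte offset at which each line starts. The helper name line_offsets and the temporary file are illustrative only. The file is opened in binary mode with a b'' sentinel, which is an assumption made here so that the snippet behaves the same on Python 2 and Python 3:

```python
import os
import tempfile

def line_offsets(path):
    """Return a list of (start_offset, line) pairs for the file at path."""
    results = []
    with open(path, "rb") as f:
        offset = f.tell()  # position before the first line
        # iter(callable, sentinel): call f.readline() until it returns b''.
        for line in iter(f.readline, b""):
            results.append((offset, line))
            offset = f.tell()  # accurate, because no read-ahead iterator is involved
    return results

# Build a small throwaway file to demonstrate.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")

index = line_offsets(path)
os.remove(path)
print(index)
```

Each offset can later be passed to seek() to jump straight back to the start of that line.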