Does anybody happen to know why when you iterate over a file this way:
f = open('test.txt', 'r')
for line in f:
    print "f.tell(): ",f.tell()
f.tell(): 8192
f.tell(): 8192
f.tell(): 8192
f.tell(): 8192
I consistently get the wrong file position from tell(). However, if I use readline(), tell() returns the correct position:
f = open('test.txt', 'r')
while True:
    line = f.readline()
    if line == '':
        break
    print "f.tell(): ",f.tell()
f.tell(): 103
f.tell(): 107
f.tell(): 115
f.tell(): 124
I'm running Python 2.7.1, BTW.
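Iterating over an open file uses a read-ahead buffer to increase efficiency. As a result, the file pointer advances in large steps across the file as you loop over the lines, which is why tell() reports the end of the buffered chunk (8192 in your output) rather than the end of the current line.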
From the File Objects documentation:
In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
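If all you need is the offset at which each line ends, one option is to keep the for loop and maintain a running byte count yourself, so tell() is never combined with next(). This is only a sketch under some assumptions: the file is opened in binary mode so that len(line) is a byte count, and the sample file and the names position and end_offsets are made up for illustration. It is written so that it should run on Python 2.7 as well as Python 3.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b"alpha\nbeta\ngamma\n")

end_offsets = []
position = 0
with open(path, 'rb') as f:
    for line in f:               # normal (buffered) line iteration
        position += len(line)    # bytes consumed so far, newline included
        end_offsets.append(position)

os.remove(path)
print(end_offsets)
```

The running total plays the role that tell() plays in the readline() loop. In text mode on platforms that translate newlines, the line lengths may not match the on-disk byte offsets, which is why binary mode is assumed here.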
If you need to rely on .tell(), don't use the file object as an iterator. You can turn .readline() into an iterator instead (at the price of some performance loss):
for line in iter(f.readline, ''):
    print f.tell()
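This uses the sentinel argument of the iter() function to turn any callable into an iterator: f.readline is called repeatedly until it returns the sentinel, here the empty string that signals end of file.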
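For a self-contained check of the sentinel form, here is a small sketch that writes a throwaway file and records tell() after each line. The file contents and the name offsets are hypothetical, and binary mode with a b'' sentinel is an assumption made so the same code should behave the same on Python 2.7 and Python 3.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b"one\ntwo\nthree\n")

offsets = []
with open(path, 'rb') as f:
    # iter(callable, sentinel) keeps calling f.readline() until it
    # returns b'' (end of file), so the file's own iterator is never used.
    for line in iter(f.readline, b''):
        offsets.append(f.tell())

os.remove(path)
print(offsets)
```

Each recorded value is the position just past the line that was read, which matches what the readline() loop in the question prints.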