After writing to a file, why does os.path.getsize still return the previous size?

Maulin · Jun 18, 2009 · Viewed 25.8k times

I am trying to split up a large XML file into smaller chunks. I write to the output file and then check its size to see whether it has passed a threshold, but I don't think the getsize() method is working as expected.

What would be a good way to get the size of a file that is still being written to?

I've done something like this:

import os

f1 = open('VSERVICE.xml', 'r')
f2 = open('split.xml', 'w')

for line in f1:
  if line == '</Service>\n':
    break
  else:
    f2.write(line)
    size = os.path.getsize('split.xml')
    print('size = ' + str(size))

Running this prints 0 as the file size for about 80 iterations and then 4176. Does Python store the output in a buffer before actually writing it to the file?

Answer

Sri · Apr 28, 2011

File size is different from file position. For example,

os.path.getsize('sample.txt') 

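This returns the size of the file as it currently exists on disk, in bytes. It does not see data that is still sitting in Python's write buffer.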

But

f = open('sample.txt')
print(f.readline())
print(f.tell())

Here f.tell() returns the current position of the file object, i.e. the offset at which the next read or write will happen. On a file opened for writing, that is where the next write will put its data. Since it is aware of the buffering, it should be accurate as long as you are simply appending to the output file.
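To make the difference concrete, here is a minimal sketch that takes three readings after every write: the position from tell(), the size from os.path.getsize() before flushing, and the size again after flush(). The function name write_and_measure and the sample chunks are made up for illustration, and the file is opened in binary mode on the assumption that plain byte counts are what you want to compare against a threshold.

```python
import os
import tempfile


def write_and_measure(path, chunks):
    """Write byte chunks to path, taking three size readings per write.

    Each reading is (tell, getsize_before_flush, getsize_after_flush).
    Hypothetical helper, for illustration only.
    """
    readings = []
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            position = out.tell()          # counts bytes still in the buffer
            stale = os.path.getsize(path)  # may lag: buffer not flushed yet
            out.flush()                    # hand the buffered bytes to the OS
            fresh = os.path.getsize(path)  # now reflects everything written
            readings.append((position, stale, fresh))
    return readings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "split.xml")
        sample = [b"<Service>\n", b"  <Name>a</Name>\n"]
        for row in write_and_measure(target, sample):
            print("tell=%d  getsize before flush=%d  after flush=%d" % row)
```

In the splitting loop from the question, that would mean either comparing f2.tell() against the threshold or calling f2.flush() before os.path.getsize('split.xml').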