Python os.stat(file_name).st_size versus os.path.getsize(file_name)

Valdogg21 · Sep 23, 2013

I've got two pieces of code that are both meant to do the same thing -- sit in a loop until a file is done being written to. They are both mainly used for files coming in via FTP/SCP.

One version of the code does it using os.stat()[stat.ST_SIZE]:

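import os
import stat
import time

# Poll until two size readings taken 300 seconds apart are equal.
size1, size2 = 1, 0
while size1 != size2:
    size1 = os.stat(file_name)[stat.ST_SIZE]
    time.sleep(300)
    size2 = os.stat(file_name)[stat.ST_SIZE]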

Another version does it with os.path.getsize():

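import os
import time

# Poll every 300 seconds; stop once the size matches the previous reading.
# Note: size1 starts at 0, so an empty file ends the loop on the first pass.
size1, size2 = 0, 0
while True:
    size2 = os.path.getsize(file_name)
    if size1 == size2:
        break
    else:
        time.sleep(300)
        size1 = size2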

I've seen multiple instances where the first method reports that the sizes are the same while the file is actually still growing. Is there some underlying reason why os.stat() would report the size incorrectly while os.path.getsize() would not? I'm not seeing any errors or exceptions come back.

Answer

NPE · Sep 23, 2013

In CPython 2.6 and 2.7, os.path.getsize() is implemented as follows:

def getsize(filename):
    """Return the size of a file, reported by os.stat()."""
    return os.stat(filename).st_size

From this, it seems pretty clear that there is no reason to expect the two approaches to behave differently (except perhaps due to the different structures of the loops in your code).
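The equivalence is easy to check directly. The sketch below is illustrative only: sizes_agree and wait_until_stable are hypothetical names, not standard-library functions, and the interval and checks parameters are assumptions added for the example. The first function compares the three ways of reading the size on a file that is not changing. The second is one possible way to write the polling loop so that both approaches share the same structure. Note that an unchanged size is only a heuristic, since a stalled transfer also leaves the size unchanged.

```python
import os
import stat
import time


def sizes_agree(path):
    """Return True if all three ways of reading the size give the same value.

    Meant for a file that is not being written to; os.path.getsize() makes
    its own os.stat() call, so a growing file could change in between.
    """
    st = os.stat(path)
    return st.st_size == st[stat.ST_SIZE] == os.path.getsize(path)


def wait_until_stable(path, interval=300, checks=2):
    """Block until the size is unchanged for `checks` consecutive polls.

    Hypothetical helper: `interval` (seconds between polls) and `checks`
    (number of unchanged readings required) are assumptions for this sketch.
    Returns the last size observed.
    """
    last = os.path.getsize(path)
    stable = 0
    while stable < checks:
        time.sleep(interval)
        current = os.path.getsize(path)
        if current == last:
            stable += 1
        else:
            # The file changed, so start counting unchanged readings again.
            stable = 0
            last = current
    return last
```

Requiring more than one unchanged reading makes the loop less likely to stop during a short pause in the transfer, at the cost of a longer wait.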