Why does using fsync() to flush writes to disk speed up access?

Jay · Nov 12, 2012

We need an app that, as far as possible, guarantees that when it reports a record as persisted, it really was. I understand that to do this you use fsync(fd). However, for some strange reason, using fsync() appears to speed up the code that writes to disk, instead of slowing it down as one would expect.

Some sample test code returns the following results:

no sync() seconds:0.013388   writes per second:0.000001 
   sync() seconds:0.006268   writes per second:0.000002

Below is the code that produces these results:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

void withSync(void) {
    /* O_CREAT needs a third (mode) argument; without it the file permissions are undefined. */
    int f = open( "/tmp/t8" , O_RDWR | O_CREAT, 0644 );
    if (f < 0) { perror("open /tmp/t8"); exit(EXIT_FAILURE); }
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30);
        fsync(f);   /* flush each record to stable storage before writing the next */
    }
    clock_t uend = clock();
    close (f);
    printf("   sync() seconds:%lf   writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}

void withoutSync(void) {
    /* Same as withSync(), but the operating system decides when to flush the writes. */
    int f = open( "/tmp/t10" , O_RDWR | O_CREAT, 0644 );
    if (f < 0) { perror("open /tmp/t10"); exit(EXIT_FAILURE); }
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30 );
    }
    clock_t uend = clock();
    close (f);
    printf("no sync() seconds:%lf   writes per second:%lf \n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}

int main(void)
{
    withoutSync();
    withSync();
    return 0;
}

Answer

Matthew Hall · Nov 12, 2012

The issue is in the way you're attempting to time an I/O write. You want to measure the wall-clock time your I/O record writes take, but you are using the C library function clock, which measures the CPU time used by your process, not the total time elapsed. Use clock_gettime with a clock selection of CLOCK_MONOTONIC or, ideally, CLOCK_MONOTONIC_RAW (the latter being a Linux extension).
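A minimal sketch of the question's benchmark timed with clock_gettime(CLOCK_MONOTONIC), assuming a POSIX system. The helper names time_writes and monotonic_seconds are hypothetical, chosen only for illustration. The sketch also computes the rate as records divided by elapsed seconds; the question's expression divides the tick count by CLOCKS_PER_SEC where it should multiply, which is why its writes-per-second figures come out near zero.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime and fsync under strict C modes */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock seconds from a monotonic clock (unaffected by changes to the system time). */
static double monotonic_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: writes `records` 30-byte records to `path`, optionally
 * calling fsync() after each one. Returns elapsed wall-clock seconds, or -1.0 on error. */
double time_writes(const char *path, int records, int use_fsync) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1.0;
    }
    double start = monotonic_seconds();
    for (int i = 0; i < records; i++) {
        if (write(fd, "012345678901234567890123456789", 30) != 30 ||
            (use_fsync && fsync(fd) != 0)) {
            perror("write/fsync");
            close(fd);
            return -1.0;
        }
    }
    double elapsed = monotonic_seconds() - start;
    close(fd);
    /* Rate = records / elapsed seconds. */
    printf("%s seconds:%f   writes per second:%f\n",
           use_fsync ? "   sync()" : "no sync()", elapsed, records / elapsed);
    return elapsed;
}
```

Calling time_writes("/tmp/t10", 10 * 1000, 0) and then time_writes("/tmp/t8", 10 * 1000, 1) from main mirrors the question's two runs; with wall-clock timing, expect the fsync run to be the slower one on a real disk.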

You are not collecting the total time elapsed between calls to clock; you are collecting an estimate of the CPU time your process consumed. Your disk I/O (specifically, the calls to write and fsync) is blocking: while the kernel waits for the disk to finish, your process is asleep and accrues no CPU time, so clock does not count that wait. Hence, you need to measure the actual difference in wall-clock time, which, as the name suggests, is the total time elapsed in the real world, not just the time your test program spent running on a CPU. Indeed, CPU time is not what you care about with fsync at all: most of an fsync's duration is spent waiting on the disk controller and the storage device, not executing instructions on the CPU.
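A small experiment, assuming a POSIX system, makes the difference visible: a process blocked in nanosleep, like one blocked waiting on the disk, accumulates almost no CPU time even though real time passes. The function cpu_vs_wall is a hypothetical name used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime and nanosleep under strict C modes */
#include <errno.h>
#include <time.h>

/* Hypothetical demo: blocks for `ms` milliseconds and reports both the CPU time
 * (via clock()) and the wall-clock time (via CLOCK_MONOTONIC) that elapsed. */
void cpu_vs_wall(long ms, double *cpu_s, double *wall_s) {
    struct timespec req = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec rem;
    struct timespec w0, w1;

    clock_t c0 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w0);
    /* The process is blocked here, much as it is while waiting on write()/fsync(). */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        req = rem;  /* interrupted by a signal: sleep for the remaining time */
    }
    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_t c1 = clock();

    *cpu_s = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *wall_s = (double)(w1.tv_sec - w0.tv_sec) + (double)(w1.tv_nsec - w0.tv_nsec) / 1e9;
}
```

With ms = 200, expect wall_s to be at least about 0.2 while cpu_s stays close to zero.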

Additionally, small record sizes are fine for this benchmark: they are a common use case for synchronized I/O (e.g., writing metadata for a transaction log). To get the timing stability you would otherwise get from larger records, simply increase the number of loop iterations per timer interval significantly and average the cost over them. This accurately models the cost of writing small records and flushing each one synchronously.
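One way to do that averaging, sketched under the assumption of a POSIX system and an already-open file descriptor: time whole batches of write+fsync pairs and divide the total by the number of records. The function name mean_record_latency_us and its batch parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime and fsync under strict C modes */
#include <fcntl.h>      /* open(), used by callers to obtain fd */
#include <sys/types.h>  /* ssize_t */
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: runs `batches` timed batches of `per_batch` write+fsync
 * pairs on `fd` and returns the mean wall-clock latency of one record in
 * microseconds, or -1.0 on error. Timing a whole batch and dividing by its size
 * amortizes clock-read overhead and per-call jitter across many records. */
double mean_record_latency_us(int fd, int batches, int per_batch) {
    static const char rec[] = "012345678901234567890123456789";
    const size_t rec_len = sizeof rec - 1;  /* 30 bytes, excluding the terminating NUL */
    double total_s = 0.0;

    for (int b = 0; b < batches; b++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < per_batch; i++) {
            if (write(fd, rec, rec_len) != (ssize_t)rec_len || fsync(fd) != 0)
                return -1.0;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total_s += (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    return total_s / ((double)batches * per_batch) * 1e6;
}
```

Raising per_batch lengthens each timed interval, so the clock's resolution and overhead matter less relative to the work being measured.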

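Do consider fdatasync for improved performance: it flushes the file data plus only the metadata needed to read that data back (such as the file size), skipping updates like the modification time, so it can avoid an extra metadata write.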
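A sketch of how the question's requirement (report a record as persisted only when it really was) might be met with fdatasync, assuming a POSIX system that provides it, such as Linux. The helper name persist_record is hypothetical. Unlike the question's loop, it also retries after short writes and EINTR, and it reports success only when both the write and the flush succeed.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fdatasync under strict C modes */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes all `len` bytes of `buf` to `fd`, then calls
 * fdatasync(). Returns 0 only if every byte was written and the flush
 * succeeded; the caller should report the record as persisted only then. */
int persist_record(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted before writing anything: retry */
        if (n <= 0)
            return -1;             /* real error (or no progress): not persisted */
        p += n;                    /* short write: continue with the remainder */
        len -= (size_t)n;
    }
    /* Flush the data plus only the metadata needed to read it back. */
    return fdatasync(fd) == 0 ? 0 : -1;
}
```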