We need an app that guarantees, as far as possible, that when it reports a record persisted, it really was. I understand that to do this you use fsync(fd). However, for some strange reason, it appears that using fsync() speeds up the code that writes to disk instead of slowing it down, as one would expect.
Some sample test code returns the following results:
no sync() seconds:0.013388 writes per second:0.000001
sync() seconds:0.006268 writes per second:0.000002
Below is the code that produces these results:
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>
void withSync() {
    int f = open( "/tmp/t8" , O_RDWR | O_CREAT );
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30);
        fsync(f);
    }
    clock_t uend = clock();
    close (f);
    printf(" sync() seconds:%lf writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}
void withoutSync() {
    int f = open( "/tmp/t10" , O_RDWR | O_CREAT );
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30 );
    }
    clock_t uend = clock();
    close (f);
    printf("no sync() seconds:%lf writes per second:%lf \n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}
int main(int argc, const char * argv[])
{
    withoutSync();
    withSync();
    return 0;
}
The issue is in the way you're attempting to time an I/O write. You want to measure the wall-clock time taken by the I/O record writes, but you are using the C library function clock, which measures CPU execution time and not total time elapsed. Use clock_gettime with a clock selection of CLOCK_MONOTONIC or, ideally, CLOCK_MONOTONIC_RAW (the latter being a Linux extension).
You are not collecting the total time elapsed between calls to clock: you are collecting an estimate of the CPU time your process consumed. Your disk I/O (both the calls to write and to fsync) is blocking, which means that while each system call waits on the device, your process is asleep and accrues essentially no CPU time. Hence, you need to measure the difference in wall-clock time, which, as the name suggests, is the total time elapsed in the real world, regardless of whether your process was running. Indeed, CPU time is not what you care about with fsync at all. Most of the time these I/O operations take is not spent in the kernel or even on the CPU; it is spent waiting on the disk controller.
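To see the difference between the two clocks without involving the disk, here is a minimal sketch that times a sleep with both clock and clock_gettime. The sleep only stands in for a blocking system call, and the helper names elapsed_seconds and compare_clocks are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Seconds elapsed between two clock_gettime readings. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Sleeps for roughly `ms` milliseconds and reports both the wall-clock
   time (CLOCK_MONOTONIC) and the CPU time (clock) that passed.
   The sleep is a stand-in for blocking I/O: the process waits without
   using the CPU, so the two numbers should differ widely. */
static void compare_clocks(long ms, double *wall_out, double *cpu_out) {
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec w0, w1;

    clock_t c0 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w0);
    nanosleep(&req, NULL);               /* process is blocked here */
    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_t c1 = clock();

    *wall_out = elapsed_seconds(&w0, &w1);
    *cpu_out  = (double)(c1 - c0) / CLOCKS_PER_SEC;
}
```

Calling compare_clocks(50, &wall, &cpu) should give a wall value of about 0.05 seconds and a cpu value close to zero, which is the same effect that made the fsync loop look cheap.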
Additionally, small record sizes are OK as a benchmark. It is a common use case for synchronized I/O (e.g., writing metadata for a transaction log). To get the timing stability of larger record sizes, simply increase the number of loop iterations significantly per timer interval and average/amortize. This will accurately model the cost of small blocking records being written and flushed synchronously.
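As a rough sketch of that amortized measurement, the function below writes many small records, optionally calling fsync after each, and returns the average wall-clock cost per record. The name avg_seconds_per_record, the file mode 0644, and the use of O_TRUNC are choices made for this example, not something taken from your code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Writes `records` 30-byte records to `path`, calling fsync after each
   one when `do_sync` is nonzero. Returns the average wall-clock seconds
   per record, or -1.0 on error or if `records` is not positive. */
static double avg_seconds_per_record(const char *path, int records,
                                     int do_sync) {
    static const char record[] = "012345678901234567890123456789";
    const size_t len = sizeof record - 1;   /* 30 bytes, no trailing NUL */

    if (records <= 0) return -1.0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1.0;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < records; i++) {
        if (write(fd, record, len) != (ssize_t)len) { close(fd); return -1.0; }
        if (do_sync && fsync(fd) != 0)              { close(fd); return -1.0; }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    double total = (double)(end.tv_sec - start.tv_sec)
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return total / records;   /* amortized cost of one record */
}
```

Running it once with do_sync set to 0 and once with 1 on the same file system gives two per-record figures that can be compared directly; the more iterations you use, the less timer overhead and noise affect the average.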
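Do consider fdatasync for improved performance: it flushes the file's data but may skip metadata, such as timestamps, that is not needed to read that data back.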
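A minimal sketch of a record writer built on fdatasync might look like the following. The function names write_record_durably and append_record are hypothetical, and the code assumes a platform that declares fdatasync in unistd.h (Linux does).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes all `len` bytes to `fd`, retrying on short writes and EINTR,
   then flushes the file data with fdatasync.
   Returns 0 on success, -1 on error (errno is set by the failing call). */
static int write_record_durably(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted, try again */
            return -1;
        }
        p   += n;
        len -= (size_t)n;
    }
    return fdatasync(fd);
}

/* Convenience wrapper for illustration: appends one record to `path`
   and returns 0 only if the data was written and flushed. A real log
   writer would keep the descriptor open across records instead. */
static int append_record(const char *path, const void *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    int rc = write_record_durably(fd, buf, len);
    if (close(fd) != 0) rc = -1;
    return rc;
}
```

The application should report a record as persisted only after write_record_durably returns 0.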