After doing the following test:
    for( i = 0; i < 3000000; i++ ) {
        printf( "Test string\n" );
    }
    for( i = 0; i < 3000000; i++ ) {
        write( STDOUT_FILENO, "Test string\n", strlen( "Test string\n" ) );
    }
it turns out that the calls to printf take a grand total of 3 seconds, while the calls to write take a whopping 46 seconds. How, with all the fancy formatting magic that printf does, and the fact that printf itself calls write, is this possible? Is there something that I'm missing?
Any and all thoughts and input are appreciated.
How, with ... the fact that printf itself calls write, is this possible? Is there something that I'm missing?
Yes, there is something that you are missing. printf doesn't necessarily call write every time. Rather, printf buffers its output: it stores the formatted result in a memory buffer and calls write only when the buffer is full, or under certain other conditions, such as an explicit fflush, the stream being closed, or a newline on a line-buffered stream.
write is a fairly expensive call, since each one is a system call into the kernel. It is much more expensive than copying data into printf's buffer, so reducing the number of write calls provides a net performance win.
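To make the buffering idea concrete, here is a minimal sketch of a user-space buffered writer. It is not how any particular C library implements printf; the names sketch_writer, sketch_put, sketch_flush and sketch_count_writes are hypothetical, and the 4096-byte buffer and the /dev/null target are assumptions chosen for illustration. It copies each message into a buffer and calls write only when the buffer is full, counting the calls along the way.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_BUF_SIZE 4096 /* assumed buffer size, for illustration only */

/* Hypothetical buffered writer: data accumulates in buf until it is full. */
struct sketch_writer {
    int fd;
    size_t used;
    unsigned long write_calls;
    char buf[SKETCH_BUF_SIZE];
};

/* Push everything currently buffered to the descriptor. */
int sketch_flush(struct sketch_writer *w) {
    size_t off = 0;
    while (off < w->used) {
        ssize_t n = write(w->fd, w->buf + off, w->used - off);
        if (n < 0)
            return -1;
        w->write_calls++;
        off += (size_t)n;
    }
    w->used = 0;
    return 0;
}

/* Copy data into the buffer, flushing only when the buffer becomes full. */
int sketch_put(struct sketch_writer *w, const char *data, size_t len) {
    while (len > 0) {
        size_t room = SKETCH_BUF_SIZE - w->used;
        size_t chunk = len < room ? len : room;
        memcpy(w->buf + w->used, data, chunk);
        w->used += chunk;
        data += chunk;
        len -= chunk;
        if (w->used == SKETCH_BUF_SIZE && sketch_flush(w) != 0)
            return -1;
    }
    return 0;
}

/* Send `lines` copies of the test string to /dev/null through the buffer
   and return how many write calls that took, or -1 on error. */
long sketch_count_writes(unsigned long lines) {
    static const char msg[] = "Test string\n";
    struct sketch_writer w = { .fd = open("/dev/null", O_WRONLY) };
    if (w.fd < 0)
        return -1;
    for (unsigned long i = 0; i < lines; i++) {
        if (sketch_put(&w, msg, sizeof msg - 1) != 0) {
            close(w.fd);
            return -1;
        }
    }
    int rc = sketch_flush(&w); /* final flush of any partial buffer */
    close(w.fd);
    return rc == 0 ? (long)w.write_calls : -1;
}
```

Calling sketch_count_writes(1000) pushes 12000 bytes through the buffer: two full 4096-byte flushes plus one final partial flush, so three write calls instead of 1000.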
If your stdout is directed to a terminal device, then printf typically calls write every time it sees a \n -- in your case, every time it is called. If your stdout is directed to a file (or to /dev/null), then printf calls write only when its internal buffer is full.
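If you would rather not depend on where stdout happens to point, the standard setvbuf function lets you pick the buffering mode yourself. The sketch below is only an illustration: sketch_default_mode and sketch_force_full_buffering are hypothetical helper names, and the terminal-versus-file rule in the first one is the common convention described above, not something every C library guarantees.

```c
#include <stdio.h>
#include <unistd.h>

/* Guess the default stdio buffering mode for a descriptor, following the
   common convention: line buffered for terminals, fully buffered otherwise.
   This is an assumption about typical behavior, not a guarantee. */
int sketch_default_mode(int fd) {
    return isatty(fd) ? _IOLBF : _IOFBF;
}

/* Ask for full buffering with a library-allocated buffer of about `size`
   bytes. setvbuf must be called before any other operation on the stream.
   Returns 0 on success, nonzero on failure. */
int sketch_force_full_buffering(FILE *stream, size_t size) {
    return setvbuf(stream, NULL, _IOFBF, size);
}
```

For example, calling sketch_force_full_buffering(stdout, 1 << 16) at the very start of main asks the library to hold output in memory even when stdout is a terminal, at the cost of output appearing in bursts.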
Supposing that you are redirecting your output, and that printf's internal buffer is 4 KiB, then the first loop invokes write about 3000000 * 12 / 4096 ≈ 8790 times. Your second loop, however, invokes write 3000000 times.
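The count for the buffered case is just the total number of bytes divided by the buffer size, rounded up to account for the final partial buffer. A small sketch of that arithmetic, assuming a buffer that is flushed exactly when full (sketch_expected_writes is a hypothetical helper, not a library function):

```c
#include <stddef.h>

/* Number of write calls needed to push `lines` messages of `line_len` bytes
   through a buffer that is flushed whenever it holds `buf_size` bytes, plus
   one final flush for any remainder: ceil(lines * line_len / buf_size). */
unsigned long sketch_expected_writes(unsigned long lines, size_t line_len,
                                     size_t buf_size) {
    unsigned long long total = (unsigned long long)lines * line_len;
    return (unsigned long)((total + buf_size - 1) / buf_size);
}
```

With 3000000 lines of 12 bytes and a 4096-byte buffer, this gives 36000000 / 4096 rounded up, which is 8790.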
Beyond the effect of fewer calls to write, there is the size of each call to write. The quantum of storage in a hard drive is a sector, often 512 bytes. Writing less than a full sector may involve reading the original data in the sector, modifying it, and writing the result back out. Invoking write with a complete sector, however, may go faster since the original data need not be read first. printf's buffer size is typically chosen to be a multiple of the typical sector size, so the system can write the data to disk efficiently.
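If you are curious what block size the system suggests for a particular file, POSIX exposes it through the st_blksize field that fstat fills in. The sketch below only reads that value; sketch_preferred_block_size is a hypothetical helper, and whether your C library actually sizes its stdio buffer from this field is an assumption you would need to check for your platform.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the preferred I/O block size the system reports for `fd`,
   or -1 if fstat fails (for example, on an invalid descriptor). */
long sketch_preferred_block_size(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}

/* Print the preferred block size for the file behind `stream` to stderr. */
void sketch_report_block_size(FILE *stream) {
    fprintf(stderr, "preferred block size: %ld bytes\n",
            sketch_preferred_block_size(fileno(stream)));
}
```

Calling sketch_report_block_size(stdout) with output redirected to a file prints the value for that file, which you can compare against the sector size discussed above.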
I'd expect your first loop to go much faster than the second.