I need to read (scan) a file sequentially and process its content. File size can be anything from very small (some KB) to very large (some GB).
I tried two techniques using VC10/VS2010 on Windows 7 64-bit: Win32 memory-mapped files (MMF) and the CRT file functions.
I thought that the memory-mapped file technique could be faster than the CRT functions, but my tests showed that the speed is almost the same in both cases.
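For reference, a CRT-based sequential scan could look roughly like the sketch below. This is only an illustration, not the exact test code: the function name `scan_file_crt`, the 64 KB default chunk size and the byte-sum "processing" are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads the file front to back with fopen/fread and
// sums all bytes as a stand-in for the real processing.
// Returns false if the file cannot be opened or a read error occurs.
bool scan_file_crt(const char* path, std::uint64_t& byteSum,
                   std::size_t chunkSize = 64 * 1024)
{
    assert(chunkSize > 0);

    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return false;

    std::vector<unsigned char> buffer(chunkSize);
    byteSum = 0;

    // fread returns fewer bytes than requested only at end of file or on error.
    std::size_t n;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), f)) > 0) {
        for (std::size_t i = 0; i < n; ++i)
            byteSum += buffer[i];
    }

    const bool ok = (std::ferror(f) == 0);
    std::fclose(f);
    return ok;
}
```

The same function compiles unchanged on other platforms, which is the portability advantage of the CRT approach.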
The following C++ statements are used for MMF:
    HANDLE hFile = CreateFile(
        filename,
        GENERIC_READ,
        FILE_SHARE_READ,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_SEQUENTIAL_SCAN,
        NULL
    );
    HANDLE hFileMapping = CreateFileMapping(
        hFile,
        NULL,
        PAGE_READONLY,
        0,
        0,
        NULL
    );
The file is read sequentially, chunk by chunk; each chunk is SYSTEM_INFO.dwAllocationGranularity in size.
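The chunk-by-chunk loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code from the test: `view_size` and `scan_mapping` are hypothetical names, the byte sum stands in for the real processing, and the handles are assumed to come from successful `CreateFile` and `CreateFileMapping` calls. The Win32 part only compiles on Windows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes to map at `offset`. Every view is one
// chunk long except the last one, which covers whatever is left of the file.
std::uint64_t view_size(std::uint64_t fileSize, std::uint64_t offset,
                        std::uint64_t chunkSize)
{
    assert(chunkSize > 0 && offset <= fileSize);
    const std::uint64_t remaining = fileSize - offset;
    return remaining < chunkSize ? remaining : chunkSize;
}

#ifdef _WIN32
#include <windows.h>

// Maps one view at a time, processes it, and unmaps it before moving on.
// Offsets stay multiples of the allocation granularity, as MapViewOfFile requires.
bool scan_mapping(HANDLE hFile, HANDLE hFileMapping, std::uint64_t& byteSum)
{
    LARGE_INTEGER size;
    if (!GetFileSizeEx(hFile, &size))
        return false;

    SYSTEM_INFO si;
    GetSystemInfo(&si);

    const std::uint64_t fileSize = static_cast<std::uint64_t>(size.QuadPart);
    const std::uint64_t chunk = si.dwAllocationGranularity;

    byteSum = 0;
    for (std::uint64_t offset = 0; offset < fileSize; offset += chunk) {
        const std::uint64_t len = view_size(fileSize, offset, chunk);
        const void* view = MapViewOfFile(
            hFileMapping, FILE_MAP_READ,
            static_cast<DWORD>(offset >> 32),          // high 32 bits of offset
            static_cast<DWORD>(offset & 0xFFFFFFFFu),  // low 32 bits of offset
            static_cast<SIZE_T>(len));
        if (!view)
            return false;

        const unsigned char* p = static_cast<const unsigned char*>(view);
        for (std::uint64_t i = 0; i < len; ++i)
            byteSum += p[i];

        UnmapViewOfFile(view);
    }
    return true;
}
#endif
```

Mapping a larger multiple of the allocation granularity per view would reduce the number of map/unmap calls; whether that changes the timings is something only a measurement can tell.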
Considering that the speed is almost the same with MMF and CRT, I'd use the CRT functions because they are simpler and multi-platform. But I'm curious: am I using the MMF technique correctly? Is it normal that MMF performance, when scanning a file sequentially, is the same as that of the CRT functions?
Thanks.
I don't think you'll see much difference if you access the file sequentially, because file I/O is very heavily cached and read-ahead is probably used as well.
It would be different if you had many "jumps" during the file data processing. Then setting a new file pointer and reading a new portion of the file each time would probably kill CRT performance, whereas MMF would give you the maximum possible performance.