Seeking and reading large files in a Linux C++ application

John Bellone picture John Bellone · Jun 24, 2009 · Viewed 31.2k times · Source

I am running into integer overflow using the standard ftell and fseek options inside of G++, but I guess I was mistaken because it seems that ftell64 and fseek64 are not available. I have been searching and many websites seem to reference using lseek with the off64_t datatype, but I have not found any examples referencing something equal to fseek. Right now the files that I am reading in are 16GB+ CSV files with the expectation of at least double that.

Without any external libraries what is the most straightforward method for achieving a similar structure as with the fseek/ftell pair? My application right now works using the standard GCC/G++ libraries for 4.x.

Answer

nos picture nos · Jun 24, 2009

fseek64 is a C function. To make it available you'll have to define _FILE_OFFSET_BITS=64 before including the system headers That will more or less define fseek to be actually fseek64. Or do it in the compiler arguments e.g. gcc -D_FILE_OFFSET_BITS=64 ....

http://www.suse.de/~aj/linux_lfs.html has a great overviw of large file support on linux:

  • Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
  • Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
  • Use the O_LARGEFILE flag with open to operate on large files.