How does mmap work?

Shaobo Wang picture Shaobo Wang · May 4, 2011 · Viewed 16.1k times · Source

I am working on programs in Linux which needs mmap file from harddrive, but i have a question, what can make it fail. Like if all the memories are fragmented, which has only 200M each, but i want to mmap a file to a memory of 1000M, will it succeed??

And another question, are there any tools in linux for recollect memory like some tools in Windows, e.g. the built-in tool for xp.

Thanks.

Answer

Wyzard picture Wyzard · May 4, 2011

mmap() uses addresses outide your program's heap area, so heap fragmentation isn't a problem, except to the extent that it can make the heap take up more space, and reduce the available space for mappings.

If you have lots of mapped files, you could potentially run into problems with fragmentation on a 32-bit system where the address space is relatively constrained. On a 64-bit system, fragmentation is unlikely to be a problem because even if you have only small regions available between existing mappings, there's still lots and lots of available contiguous address space, adjacent to the existing mappings.

The more common problem on a 32-bit system is that the address space is just too small to map large files at all. Of the 4GB address space, typically 2GB is available to userspace, with the other 2GB being reserved by the kernel. Of that available 2GB, your mappings have to share space with the program's code and stacks (typically small) and heap (potentially large).

In short, mmap() can often fail on 32-bit systems if the file is too large, but you're unlikely to ever have a file large enough to cause that problem on a 64-bit system.

If you're creating a private copy-on-write mapping, it can also fail due to lack of swap space. The kernel has to ensure that the sum of available RAM and swap is large enough to hold the size of your mapping, in case you modify all the pages so that the kernel is forced to make private copies of them all. A shared mapping shouldn't have this problem, since changes can be flushed to the file on disk, and then the pages can be discarded if memory is scarce and reloaded from disk later.

Of course, a mapping can also fail if you don't have permission to access the file, or if it's not a type of file that can be mapped (such as a directory or a socket).

It's not clear what you mean about recollecting memory. Remember that the scarce resource that mmap() consumes isn't memory, it's address space. You can potentially map a 1GB file even if the machine actually only has 128MB of RAM, but on a 32-bit system you can't map a 4GB file even if the machine has 16GB of RAM.

The concept of virtual memory is essential to understanding what mmap() does, so read about that if you're not familiar with it already.