How does jemalloc work? What are the benefits?

Albert picture Albert · Oct 26, 2009 · Viewed 39.7k times · Source

Firefox 3 came with a new allocator: jemalloc.

I have heard at several places that this new allocator is better. The top Google results don't gave any further information though and I am interested in how exactly it works.

Answer

paxdiablo picture paxdiablo · Oct 26, 2009

jemalloc first appeared for FreeBSD, the brainchild of one "Jason Evans", hence the "je". I would ridicule him for being egotistical had I not once written an operating system called paxos :-)

See this PDF for full details. It's a white paper describing in detail how the algorithms work.

The main benefit is scalability in multi-processor and multi-threaded systems achieved, in part, by using multiple arenas (the chunks of raw memory from which allocations are made).

In single-threaded situations, there is no real benefit to multiple arenas so a single arena is used.

However, in multi-threaded situations, many arenas are created (four times as many arenas as there are processors), and threads are assigned to these arenas in a round-robin fashion.

This means that lock contention can be reduced since, while multiple threads may call malloc or free concurrently, they'll only contend if they share the same arena. Two threads with different arenas will not affect each other.

In addition, jemalloc tries to optimise for cache locality since the act of fetching data from RAM is much slower than using data already in the CPU caches (no different in concept to the difference between fast fetching from RAM versus slow fetching from disk). To that end, it first tries to minimise memory usage overall since that is more likely to ensure the application's entire working set is in cache.

And, where that can't be achieved, it tries to ensure that allocations are contiguous, since memory allocated together tends to be used together.

From the white paper, these strategies seem to give similar performance to current best algorithms for single threaded use while offering improvements for multi-threaded usage.