What is the recommended way to align memory in C++11

Rajiv picture Rajiv · Dec 26, 2013 · Viewed 33.3k times · Source

I am working on a single producer single consumer ring buffer implementation.I have two requirements:

  1. Align a single heap allocated instance of a ring buffer to a cache line.
  2. Align a field within a ring buffer to a cache line (to prevent false sharing).

My class looks something like:

#define CACHE_LINE_SIZE 64  // To be used later.

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
  ....

private:
  std::atomic<int64_t> publisher_sequence_ ;
  int64_t cached_consumer_sequence_;
  T* events_;
  std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.

};

Let me first tackle point 1 i.e. aligning a single heap allocated instance of the class. There are a few ways:

  1. Use the c++ 11 alignas(..) specifier:

    template<typename T, uint64_t num_events>
    class alignas(CACHE_LINE_SIZE) RingBuffer {
    public:
      ....
    
    private:
      // All the private fields.
    
    };
    
  2. Use posix_memalign(..) + placement new(..) without altering the class definition. This suffers from not being platform independent:

    void* buffer;
    if (posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>)) != 0) {
        perror("posix_memalign did not work!");
        abort();
    }
    // Use placement new on a cache aligned buffer.
    auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();
    
  3. Use the GCC/Clang extension __attribute__ ((aligned(#)))

    template<typename T, uint64_t num_events>
    class RingBuffer {
    public:
      ....
    
    private:
      // All the private fields.
    
    } __attribute__ ((aligned(CACHE_LINE_SIZE)));
    
  4. I tried to use the C++ 11 standardized aligned_alloc(..) function instead of posix_memalign(..) but GCC 4.8.1 on Ubuntu 12.04 could not find the definition in stdlib.h

Are all of these guaranteed to do the same thing? My goal is cache-line alignment so any method that has some limits on alignment (say double word) will not do. Platform independence which would point to using the standardized alignas(..) is a secondary goal.

I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit which could be below the cache line on the machine. I can't reproduce this any more but while printing addresses I think I did not always get 64 byte aligned addresses with alignas(..). On the contrary posix_memalign(..) seemed to always work. Again I cannot reproduce this any more so maybe I was making a mistake.

The second aim is to align a field within a class/struct to a cache line. I am doing this to prevent false sharing. I have tried the following ways:

  1. Use the C++ 11 alignas(..) specifier:

    template<typename T, uint64_t num_events>
    class RingBuffer {  // This needs to be aligned to a cache line.
      public:
      ...
      private:
        std::atomic<int64_t> publisher_sequence_ ;
        int64_t cached_consumer_sequence_;
        T* events_;
        std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
    };
    
  2. Use the GCC/Clang extension __attribute__ ((aligned(#)))

    template<typename T, uint64_t num_events>
    class RingBuffer {  // This needs to be aligned to a cache line.
      public:
      ...
      private:
        std::atomic<int64_t> publisher_sequence_ ;
        int64_t cached_consumer_sequence_;
        T* events_;
        std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
    };
    

Both these methods seem to align consumer_sequence to an address 64 bytes after the beginning of the object so whether consumer_sequence is cache aligned depends on whether the object itself is cache aligned. Here my question is - are there any better ways to do the same?

EDIT:

The reason aligned_alloc did not work on my machine was that I was on eglibc 2.15 (Ubuntu 12.04). It worked on a later version of eglibc.

From the man page: The function aligned_alloc() was added to glibc in version 2.16.

This makes it pretty useless for me since I cannot require such a recent version of eglibc/glibc.

Answer

Glenn Teitelbaum picture Glenn Teitelbaum · Dec 26, 2013

Unfortunately the best I have found is allocating extra space and then using the "aligned" part. So the RingBuffer new can request an extra 64 bytes and then return the first 64 byte aligned part of that. It wastes space but will give the alignment you need. You will likely need to set the memory before what is returned to the actual alloc address to unallocate it.

[Memory returned][ptr to start of memory][aligned memory][extra memory]

(assuming no inheritence from RingBuffer) something like:

void * RingBuffer::operator new(size_t request)
{
     static const size_t ptr_alloc = sizeof(void *);
     static const size_t align_size = 64;
     static const size_t request_size = sizeof(RingBuffer)+align_size;
     static const size_t needed = ptr_alloc+request_size;

     void * alloc = ::operator new(needed);
     void *ptr = std::align(align_size, sizeof(RingBuffer),
                          alloc+ptr_alloc, request_size);

     ((void **)ptr)[-1] = alloc; // save for delete calls to use
     return ptr;  
}

void RingBuffer::operator delete(void * ptr)
{
    if (ptr) // 0 is valid, but a noop, so prevent passing negative memory
    {
           void * alloc = ((void **)ptr)[-1];
           ::operator delete (alloc);
    }
}

For the second requirement of having a data member of RingBuffer also 64 byte aligned, for that if you know that the start of this is aligned, you can pad to force the alignment for data members.