Flush cache to DRAM

Brian Magnuson picture Brian Magnuson · Sep 19, 2013 · Viewed 10.6k times · Source

I'm using a Xilinx Zynq platform with a region of memory shared between the programmable HW and the ARM processor.

I've reserved this memory using memmap on the kernel command line and then exposed it to userspace via mmap/io_remap_pfn_range calls in my driver.

The problem I'm having is that it takes some time for the writes to show up in DRAM and I presume it's stuck in dcache. There's a bunch of flush_cache_* calls defined but none of them are exported, which is a clue to me that I'm barking up the wrong tree...

As a trial I locally exported flush_cache_mm and just to see what would happen and no joy.

In short, how can I be sure that any writes to this mmap'd regions have been committed to DRAM?

Thanks.

Answer

artless noise picture artless noise · Sep 19, 2013

The ARM processors typically have both a I/D cache and a write buffer. The idea of a write buffer is to gang sequential writes together (great for synchronous DRAM) and to not delay the CPU to wait for a write to complete.

To be generic, you can flush the d cache and the write buffer. The following is some inline ARM assembler which should work for many architectures and memory configurations.

 static inline void dcache_clean(void)
 {
     const int zero = 0;
     /* clean entire D cache -> push to external memory. */
     __asm volatile ("1: mrc p15, 0, r15, c7, c10, 3\n"
                     " bne 1b\n" ::: "cc");
     /* drain the write buffer */
    __asm volatile ("mcr 15, 0, %0, c7, c10, 4"::"r" (zero));
 }

You may need more if you have an L2 cache.

To answer in a Linux context, there are different CPU variants and different routines depending on memory/MMU configurations and even CPU errata. See for instance,

These routines are either called directly or looked up in a cpu info structure with function pointers to the appropriate routine for the detected CPU and configuration; depending on whether the kernel is special purpose for a single CPU or multi-purpose like a Ubuntu distribution.

To answer the question specifically for your situation, we need to know L2 cache, write buffered memory, CPU architecture specifics; maybe including silicon revisions for errata. Another tactic is to avoid this completely by using the dma_alloc_XXX() routines which mark memory as un-cacheable and un-bufferable so that the CPU writes are pushed externally immediately. Depending on your memory access pattern, either solution is valid. You may wish to cache if the memory only needs to be synchronized at some checkpoint (vsync/*hsync* for video, etc).