C volatile variables and Cache Memory

Microkernel · Oct 24, 2011

The cache is managed by the cache hardware, transparently to the processor. So if we use volatile variables in a C program, how is it guaranteed that my program reads the data from the actual memory address each time, and not from the cache?

My understanding is:

  1. The volatile keyword tells the compiler that accesses to the variable must not be optimized away and must be performed as written in the code.

  2. The cache is controlled by the cache hardware transparently, so when the processor issues an address, it doesn't know whether the data comes from the cache or from memory.

So, if I need to read a memory address every time, how can I make sure the value comes from that address and not from the cache?

Somehow, these two concepts don't fit together. Please clarify how it's done.

(Assume the cache uses a write-back policy, if that matters for analyzing the problem.)

Thank you, Microkernel :)

Answer

Andrew Cottrell · Oct 24, 2011

Firmware developer here. This is a standard problem in embedded programming, and one that trips up many (even very experienced) developers.

I assume you are trying to access a hardware register whose value can change over time (interrupt status, a timer, GPIO inputs, etc.).
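A common way to describe memory-mapped registers in C is a struct of volatile fields overlaid on the peripheral's base address. The sketch below assumes a hypothetical peripheral: the `periph_regs_t` layout, the `STATUS_RX_READY` bit and the commented-out base address are invented for illustration. So that it runs on a host, `PERIPH` points at an ordinary object instead of real hardware.

```c
#include <stdint.h>

/* Hypothetical peripheral: this register layout is invented for illustration.
 * Each field is volatile because the hardware may change it at any time. */
typedef struct {
    volatile uint32_t status;   /* hypothetical status/interrupt flags */
    volatile uint32_t data;     /* hypothetical data register */
} periph_regs_t;

/* On real hardware the block sits at a datasheet-defined address, e.g.
 *   #define PERIPH ((periph_regs_t *)0x40001000u)   (address is hypothetical)
 * For a host-side sketch, PERIPH points at an ordinary object instead. */
static periph_regs_t simulated_periph;
static periph_regs_t *const PERIPH = &simulated_periph;

#define STATUS_RX_READY (1u << 0)   /* hypothetical bit assignment */

/* Returns nonzero if the hypothetical RX-ready flag is set.
 * Each call performs a fresh load of the status register. */
int periph_rx_ready(void)
{
    return (PERIPH->status & STATUS_RX_READY) != 0;
}
```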

The volatile keyword is only part of the solution, and in many cases may not even be necessary. It makes the compiler emit a fresh load (or store) every time the variable is accessed in the source, instead of optimizing the access away or keeping the value in a processor register across multiple uses. But whether the "memory" being read is the actual hardware register or a cached copy is invisible to your code and unaffected by the volatile keyword. If your function reads the register only once, you can probably leave off volatile, but as a general rule I suggest defining most hardware registers as volatile.
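To make the re-read behaviour concrete, here is a minimal polling sketch. The status register, `READY_BIT` and the `max_polls` timeout are hypothetical, and a plain variable stands in for the hardware register so the code runs anywhere. Note that volatile only forces the load instructions to be emitted; it does nothing to keep those loads out of the data cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define READY_BIT (1u << 0)   /* hypothetical "device ready" flag */

/* Stand-in for a hardware status register so the sketch runs on a host. */
volatile uint32_t fake_status_reg;

/* Poll *status until READY_BIT is set or max_polls reads have been made.
 * Because status points to volatile, every iteration performs a real load.
 * Without volatile, an optimizing compiler may load the value once and
 * reuse it, so a change made by hardware would never be seen.
 * volatile does NOT bypass the data cache; that is a separate problem. */
bool wait_ready(const volatile uint32_t *status, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if (*status & READY_BIT)
            return true;
    }
    return false;
}
```

The bounded loop is a design choice: a poll that can never time out will hang the firmware if the hardware never sets the bit.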

The bigger issue is caching and cache coherency. The easiest approach is to make sure your register is in uncached address space, so that every access is guaranteed to read or write the actual hardware register rather than cache memory. A more complex but potentially better-performing approach is to use cached address space and have your code explicitly maintain the cache in specific situations like this, for example by invalidating the relevant cache lines before reading and flushing (writing back) dirty lines after writing. For both approaches, how this is accomplished is architecture-dependent and beyond the scope of the question. It could involve MTRRs (on x86), MMU configuration, page-table attribute changes, etc.
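Both approaches can be sketched in C under clearly stated assumptions. The first part assumes a classic 32-bit MIPS memory map, where kseg0 and kseg1 are two views of the same low 512 MB of physical memory, kseg0 normally cached and kseg1 uncached; other architectures do this differently. The second part is a generic helper that walks the cache lines covering a buffer. The 64-byte line size, `cache_range_op` and `placeholder_line_op` are hypothetical; the real per-line operation is platform-specific (for example, a wrapper around the `_mm_clflush` intrinsic on x86, or the `CACHE` instruction on MIPS).

```c
#include <stddef.h>
#include <stdint.h>

/* Approach 1: uncached address space (classic MIPS32 assumption).
 * kseg0 (0x80000000-0x9FFFFFFF) and kseg1 (0xA0000000-0xBFFFFFFF) both map
 * physical 0x00000000-0x1FFFFFFF; kseg0 is normally cached, kseg1 is not. */
#define MIPS_PHYS_MASK  0x1FFFFFFFu
#define MIPS_KSEG1_BASE 0xA0000000u

/* Uncached kseg1 alias for a physical address below 512 MB. */
static inline uint32_t mips_kseg1_addr(uint32_t phys)
{
    return (phys & MIPS_PHYS_MASK) | MIPS_KSEG1_BASE;
}

/* Approach 2: cached memory plus explicit cache maintenance.
 * CACHE_LINE_SIZE is an assumption; take it from your CPU's documentation. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical hook: cleans (writes back) or invalidates one cache line. */
typedef void (*cache_line_op)(uintptr_t line_addr);

/* Apply op to every cache line overlapping [addr, addr + len).
 * Returns the number of lines visited. */
size_t cache_range_op(const volatile void *addr, size_t len, cache_line_op op)
{
    if (len == 0)
        return 0;
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    uintptr_t start = (uintptr_t)addr & mask;   /* round down to a line */
    uintptr_t end = ((uintptr_t)addr + len + CACHE_LINE_SIZE - 1u) & mask; /* round up */
    size_t lines = 0;
    for (uintptr_t line = start; line < end; line += CACHE_LINE_SIZE) {
        op(line);
        lines++;
    }
    return lines;
}

/* Placeholder per-line operation so the sketch compiles and runs; a real
 * platform would clean or invalidate the line here. */
void placeholder_line_op(uintptr_t line_addr)
{
    (void)line_addr;
}
```

Typical use: invalidate the lines of a buffer before the CPU reads data a device wrote there (for example by DMA), and clean (write back) the lines after the CPU writes data a device will read. With the write-back policy the question assumes, skipping the clean step can leave the device reading stale memory.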

Hope that helps. If I've missed something, let me know and I'll expand my answer.