From here:
Instructions and data have different access patterns, and access different regions of memory. Thus, having the same cache for both instructions and data may not always work out.
It's therefore rather common to have two caches: an instruction cache that only stores instructions, and a data cache that only stores data.
The distinction between instructions and data seems intuitive, but now I'm not so sure what it means in this context. What counts as data and goes into a data cache, and what counts as instructions and goes into an instruction cache?
I know ARM assembly. Would anything requiring STR, LDR, LDMFD or STMFD use the data cache? But technically speaking STR, LDR, LDMFD and STMFD are all instructions, so this is why I'm confused. Must "data" always exist with an "instruction"? Is data considered anything in the .data section?
For example, with LDR R1, =myVar, would the LDR instruction go into the instruction cache and the contents of myVar go into the data cache? Or does it not work like that?
Also, regarding "Instructions and data have different access patterns": could someone please elaborate?
This comment I made on a helpful post highlights my difficulty understanding:
"The idea is that if an instruction has been loaded from memory, it's likely to be used again soon" - but the only way to know the next instruction is to read it. That means a memory read (you can't say it's already in the cache, because a new instruction is being read). So I still don't see the point. Say an LDR instruction just happened, so now LDR is in the data cache. Maybe another LDR instruction will happen, maybe it won't; we can't be sure, so we have to actually read the next instruction, thus defeating the purpose of the cache.
Instruction fetches can be done in chunks on the assumption that much of the time you are going to run through many instructions in a row, so instruction fetches can be made more efficient. There is likely a handful or more clocks of overhead per transaction, then the delay for the memory to have the data ready, then one clock per width of the bus for the size of the transfer. Fetching 8 words or instructions in one burst might cost, say, 5+n+8 clocks; that is more efficient than fetching one instruction at a time, (5+n+1)*8 clocks.
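To put hypothetical numbers on that (the 5 clocks of overhead and n = 20 clocks of memory latency are just illustrative figures, not from any particular part):

    one 8-word burst:        5 + 20 + 8       =  33 clocks
    eight 1-word transfers:  (5 + 20 + 1) * 8 = 208 clocks

The burst amortizes the fixed overhead and the memory latency across all eight words.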
Data, on the other hand: it is not that good an assumption that data will be read sequentially much of the time, so extra speculative cycles can hurt. Only fetch the data asked for (up to the width of the memory or bus, since that much is a freebie).
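As a sketch of why data is harder to fetch ahead, consider walking a linked list (a contrived snippet; the node layout, with the next pointer at offset 0, is assumed purely for illustration):

    walk:                    @ r0 = pointer to the current node
        ldr  r0, [r0]        @ the next address comes from the data itself
        cmp  r0, #0          @ stop at a null pointer
        bne  walk            @ successive loads can land anywhere in memory

Each load address depends on the previous load's result, so fetching ahead sequentially buys nothing on the data side.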
On the ARMs I know about, the L1 caches for I and D are separate; at L2 they are combined. L1 is not on the AXI/AMBA bus and is likely a more efficient access than L2 and beyond, which are on AMBA/AXI (a few cycles of overhead, plus the memory time, plus one clock per bus width of data, for every transaction).
For address spaces that are marked as cacheable (if the MMU is on), the L1 (and as a result the L2) will fetch a whole cache line instead of the individual item for a data access, and perhaps more than one fetch amount for an instruction fetch.
Each of your LDR and LDM instructions is going to result in data cycles that can, if the address is cacheable, go into the L2 and L1 caches if not already there. The instruction itself, if at a cacheable address, will also go into the L2 and L1 caches if not already there. (Yes, there are lots of knobs to control what is cacheable and what is not; I don't want to get into those nuances, so for the sake of this discussion just assume all of these instruction fetches and data accesses are cacheable.)
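For instance (assuming the addresses involved are cacheable), a single LDM touches both sides:

    ldmia r0, {r4-r7}        @ the instruction fetch goes through the I side,
                             @ the four data words loaded go through the D side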
You would want to keep instructions just executed in the cache in case you have a loop or run that code again. The instructions that follow in the same cache line also benefit from the saved overhead of the more efficient access. But if you only execute a small percentage of the cache line, then overall those cycles are a waste, and if that happens too often, the cache has made things slower.
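For example, this made-up byte-copy loop is four instructions, 16 bytes, so it fits in a single cache line; the first pass pays for the line fill and every later iteration hits the L1 I cache:

    copy_loop:               @ r0 = src, r1 = dst, r2 = byte count
        ldrb r3, [r0], #1    @ data read, D side
        strb r3, [r1], #1    @ data write, D side
        subs r2, r2, #1
        bne  copy_loop       @ whole loop covered by one I-cache line fetch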
Once something is in a cache, the next time it is read (or written, depending on the settings) the cache copy is the one that is used, not the copy in slow memory. Eventually (depending on settings), if the cache copy of some item has been modified by a write (STR, STM) and some new access needs to be saved in the cache, an old line is evicted back to slow memory: a write from the cache to slow memory happens. You don't have this problem with instructions; instructions are basically read-only, so you don't have to write them back to slow memory. In theory the cache copy and the slow-memory copy are the same.
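A sketch of that write-back behavior (assuming a write-back D cache; the exact policy is design- and settings-specific):

    ldr r1, [r2]             @ miss: the line containing [r2] is filled into the D cache
    add r1, r1, #1
    str r1, [r2]             @ hit: only the cache copy is updated, the line is now dirty
                             @ the modified line reaches slow memory later, on eviction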
    ldr r1, =myvar

will result in a pc-relative load:

    ldr r1, something
    ...
something: .word myvar
The LDR instruction will be part of a cache line fetch, an instruction fetch (along with a bunch more instructions). These will be saved in the I part of the L1 cache on an ARM and in the shared L2 (if enabled, etc). When that instruction is finally executed, the address of something experiences a data read, and if caching is enabled for that area, that read will also go into the L2 and the L1 cache (the D part) if not already there. If you loop around and run that instruction again right away, then ideally the instruction will be in the L1 cache and the access time to fetch it is very fast, a handful of clocks total. The data will also be in the L1 cache and will also take a handful of clocks to read.
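Wrapping that instruction in a loop (a contrived sketch) shows both sides warming up:

    loop:
        ldr  r1, =myvar      @ pass 1: I-cache line fill for the instruction,
                             @ D-cache fill for the literal pool word
        ldr  r3, [r1]        @ pass 1: D-cache fill for myvar itself
        subs r4, r4, #1
        bne  loop            @ pass 2 onward: all three accesses hit in L1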
About the 5+n+8 I mentioned above: the 5 is some number of clocks of overhead (just a possibility; it can vary both by the design and by what else is going on in parallel). The n depends on the slower memory's speed, and that n is quite large for DRAM. The L2 and L1 caches are much, much faster, and that is why the cache is there at all: to reduce the large number of clock cycles of every DRAM access, efficient or not.