Cycles/cost for L1 Cache hit vs. Register on x86?

Question 1

performance x86 cpu-architecture cpu-cache micro-optimization

user541686 · Apr 23, 2012 · Viewed 20.7k times · Source

Answer

Here's a great article on the subject:

http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1

To answer your question: yes, an L1 cache hit costs roughly the same as a register access, within a few cycles. A cache miss, of course, is quite costly ;)

PS:

The specifics will vary, but this link has some good ballpark figures:

Approximate cost to access various caches and main memory?

Core i7 Xeon 5500 Series Data Source Latency (approximate)
L1 CACHE hit, ~4 cycles
L2 CACHE hit, ~10 cycles
L3 CACHE hit, line unshared ~40 cycles
L3 CACHE hit, shared line in another core ~65 cycles
L3 CACHE hit, modified in another core ~75 cycles
Remote L3 CACHE ~100-300 cycles
Local DRAM ~30 ns (~120 cycles)
Remote DRAM ~100 ns
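
To put the cycle counts above into time units, here is a minimal sketch of the conversion (cycles = ns × clock in GHz). The function names and the 3.0 GHz clock are illustrative assumptions, not values from the linked source. For reference, "~30 ns (~120 cycles)" works out to 4 cycles per ns, i.e. a clock of about 4 GHz.

```python
# Approximate latencies copied from the list above, in core clock cycles.
LATENCY_CYCLES = {
    "L1 hit": 4,
    "L2 hit": 10,
    "L3 hit, line unshared": 40,
    "L3 hit, shared line in another core": 65,
    "L3 hit, modified in another core": 75,
}


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Convert nanoseconds to a cycle count at the given clock in GHz."""
    return ns * clock_ghz


def relative_to_l1(latencies):
    """Express every latency as a multiple of the L1 hit latency."""
    base = latencies["L1 hit"]
    return {name: cycles / base for name, cycles in latencies.items()}


if __name__ == "__main__":
    ASSUMED_CLOCK_GHZ = 3.0  # assumption for illustration only
    ratios = relative_to_l1(LATENCY_CYCLES)
    for name, cycles in LATENCY_CYCLES.items():
        ns = cycles_to_ns(cycles, ASSUMED_CLOCK_GHZ)
        print(f"{name:38s} {cycles:3d} cycles  ~{ns:5.1f} ns  {ratios[name]:5.2f}x L1")
```

The ratios are the more portable part: by these figures an L2 hit costs about 2.5x an L1 hit and an unshared L3 hit about 10x, whatever the clock speed.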

PPS:

These figures represent much older, slower CPUs, but the ratios basically hold:

http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/2

Level                     Access Time  Typical Size    Technology   Managed By
-----                     -----------  ------------    ----------   ----------
Registers                 1-3 ns       ?1 KB           Custom CMOS  Compiler
Level 1 Cache (on-chip)   2-8 ns       8 KB - 128 KB   SRAM         Hardware
Level 2 Cache (off-chip)  5-12 ns      0.5 MB - 8 MB   SRAM         Hardware
Main Memory               10-60 ns     64 MB - 1 GB    DRAM         Operating System
Hard Disk                 3M - 10M ns  20 - 100 GB     Magnetic     Operating System/User

Question 2

I remember assuming that an L1 cache hit is 1 cycle (i.e. identical to register access time) in my architecture class, but is that actually true on modern x86 processors?

How many cycles does an L1 cache hit take? How does it compare to register access?
