To find the average memory access time, we have the formula:
Tavg = h*Tc + (1-h)*M
where h = hit rate
(1-h) = miss rate
Tc = time to access information from cache
M = miss penalty (time to access main memory)
I have been solving quite a few problems on this concept recently, and at times I find a disturbing inconsistency:
Case 1: M = Tm + Tc
Case 2: M = Tm
That is, the solutions calculate the value of M for some question X as in Case 1 above, while for some other question Y it is calculated as in Case 2. I have tried my best to analyze these questions and find the factor that makes the calculation different, with no luck. I have even encountered situations where X and Y are essentially identical, differing only in their numerical values, yet the calculation is done for X as in Case 1 and for Y as in Case 2.
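To make the difference concrete, here is a quick check with made-up numbers (Tc = 2 ns, Tm = 20 ns, h = 0.9 are purely illustrative):

```python
# Hypothetical values, purely for illustration.
Tc = 2.0    # cache access time (ns)
Tm = 20.0   # main memory access time (ns)
h  = 0.9    # hit rate

# Case 1: a miss costs the memory access *on top of* the cache lookup.
M_case1 = Tm + Tc
Tavg_case1 = h * Tc + (1 - h) * M_case1   # = 0.9*2 + 0.1*22 = 4.0 ns

# Case 2: a miss costs only the memory access.
M_case2 = Tm
Tavg_case2 = h * Tc + (1 - h) * M_case2   # = 0.9*2 + 0.1*20 = 3.8 ns

print(Tavg_case1, Tavg_case2)   # 4.0 vs 3.8
```

So the two conventions give 4.0 ns versus 3.8 ns for the very same inputs, which is exactly the inconsistency I keep running into.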
Is there some other reason that I am not aware of which makes the calculation different? Thanks in advance.
The difference comes down to how the latency of a miss is counted. If the problem states that the time is a miss penalty, it should mean that the time is in addition to the time for a cache hit, so the total miss latency is the latency of a cache hit plus the penalty. (Clearly your formula and variables do not take this approach, labeling M, which is really the total access time on a miss, as the miss penalty.)
Sadly, if a problem says "memory access latency" or "L2 access latency", it is even less clear whether the total access latency is meant (i.e., including the time for an L1 hit) or the extra time added by an L1 miss. The former has some conceptual advantages (e.g., it can hide details like the L2 access beginning before data return would occur on a hit, say by using early miss detection, miss prediction, or even parallel tag lookup for L1 and L2). The latter may make explaining the latency effects of L2 size or associativity simpler (e.g., if doubling the size increases L2-only access latency by 50%, it may be easier to see L2-only latency increasing from 8 cycles to 12 cycles with doubled size and to 18 cycles with quadrupled size than to see total latency increase from 10 cycles [with Tc = 2] to 14 cycles and to 20 cycles).
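As a rough sketch of how the two reporting conventions relate (using the numbers above and assuming the L2 lookup simply follows the L1 lookup serially):

```python
Tc = 2  # L1 hit latency in cycles (from the example above)

# L2 latency reported as extra cycles beyond the L1 hit ("L2-only" latency).
l2_only = {"base": 8, "2x size": 12, "4x size": 18}

# The same configurations reported as total access latency on an L1 miss,
# assuming the L2 access simply follows the L1 lookup (serial access).
total = {cfg: Tc + lat for cfg, lat in l2_only.items()}

print(total)  # {'base': 10, '2x size': 14, '4x size': 20}
```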
(Also, using a miss penalty number allows a slight simplification of the access time formula, Tavg = Tc + (1-h)*Tm, because Tc is always spent.)
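With that interpretation (M = Tc + Tm, where Tm is the pure penalty), your formula and the simplified one are algebraically identical; here is a small sanity check with arbitrary illustrative values:

```python
# Check that the two formulas agree when M (total time on a miss) = Tc + Tm,
# i.e., hit time plus miss penalty.
def tavg_full(h, Tc, Tm):
    """Original form: Tavg = h*Tc + (1-h)*M, with M = Tc + Tm."""
    return h * Tc + (1 - h) * (Tc + Tm)

def tavg_penalty(h, Tc, Tm):
    """Simplified form: Tavg = Tc + (1-h)*Tm."""
    return Tc + (1 - h) * Tm

# A few arbitrary test points (values are hypothetical).
for h, Tc, Tm in [(0.9, 2, 20), (0.95, 1, 100), (0.5, 3, 30)]:
    assert abs(tavg_full(h, Tc, Tm) - tavg_penalty(h, Tc, Tm)) < 1e-12
```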
A similar issue occurs for execution latency. With a scalar pipeline, an instruction that takes one cycle to execute is often said to have zero latency, because there is no delay before a subsequent dependent instruction can execute. However, this use of latency can be confusing when considering a superscalar implementation.