Fortran intrinsic timing routines, which is better? cpu_time or system_clock

zeroth · Jul 29, 2011

When timing a Fortran program I usually just use call cpu_time(t).
Then I stumbled across call system_clock([count,count_rate,count_max]), which seems to do the same thing, though in a more complicated manner. My knowledge of these routines comes from old Intel documentation.
I wasn't able to find it on Intel's homepage. See my markup below.

  1. Which is more accurate, or are they similar?
  2. Does one of them count cache misses (or similar effects) and the other not, or does either of them?
  3. Or is the only difference the marked item in my markup below?

Those are my questions. Below I have supplied code showing some timings and usage. The timings have shown me that the two routines are very similar in output and thus seem to be similar in implementation.
I should note that I will probably always stick with cpu_time, and that I don't really need more precise timings.

In the code below I have tried to compare them. (I have also tried more elaborate things, but will not include them for the sake of brevity.) So basically my result is that:

  • cpu_time
    1. Is easier to use, you don't need the initialization calls
    2. Gives the elapsed time directly as a difference of two calls
    3. Should also be compiler specific, but there is no way to query the precision (the norm is milliseconds)
    4. Is the sum of the thread times, i.e. not recommended for parallel runs.
  • system_clock
    1. Needs pre-initialization.
    2. Needs post-processing in the form of a division by count_rate. (A small thing, but nonetheless a difference.)
    3. Is compiler specific. On my PC the following was found:
      • Intel 12.0.4 uses a count rate of 10000, due to the INTEGER precision.
      • gcc-4.4.5 uses 1000; I do not know why it differs.
    4. Is prone to wraparounds, i.e. c1 > c2, because the counter restarts after count_max
    5. Is time measured from a fixed reference point. Thus it yields the actual elapsed time of one thread and not the sum.
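
Points 3 and 4 of the system_clock list can be illustrated with a minimal sketch. The module name wall_timer, the function elapsed_ticks and the loop workload are made up for illustration. The sketch assumes at most one wraparound between the two readings and a compiler that accepts 64-bit integer arguments (Fortran 2003 and later allow any integer kind). With 64-bit arguments many compilers report a larger count_rate and count_max, but that is processor dependent, so check the printed rate rather than relying on it.

```fortran
module wall_timer
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Ticks elapsed between two system_clock readings, allowing for at most
  ! one wraparound of the counter (it runs from 0 to cmax, then restarts at 0).
  pure function elapsed_ticks(c1, c2, cmax) result(ticks)
    integer(int64), intent(in) :: c1, c2, cmax
    integer(int64) :: ticks
    ticks = c2 - c1
    ! The parentheses keep the intermediate sum below cmax, avoiding overflow.
    if (ticks < 0) ticks = (ticks + cmax) + 1
  end function elapsed_ticks
end module wall_timer

program wrap_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use wall_timer, only: elapsed_ticks
  implicit none
  integer(int64) :: c1, c2, cr, cm
  integer :: i
  real(real64) :: total

  ! One call can return both the rate and the maximum count.
  call system_clock(count_rate=cr, count_max=cm)
  if (cr <= 0) stop "no system clock available"

  call system_clock(c1)
  total = 0.0_real64
  do i = 1, 50000000          ! arbitrary workload to have something to time
     total = total + sqrt(real(i, real64))
  end do
  call system_clock(c2)

  write(*,*) "count_rate   : ", cr
  write(*,*) "elapsed (s)  : ", real(elapsed_ticks(c1, c2, cm), real64) / real(cr, real64)
  write(*,*) "checksum     : ", total   ! printed so the loop is not optimized away
end program wrap_demo
```

As a hand check of the wraparound branch: with cmax = 99, c1 = 95 and c2 = 5 the function returns (-90 + 99) + 1 = 10 ticks.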

Code:

PROGRAM timer
  IMPLICIT NONE
  REAL :: t1,t2,rate 
  INTEGER :: c1,c2,cr,cm,i,j,n,s
  INTEGER , PARAMETER :: x=20000,y=15000,runs=1000
  REAL :: array(x,y),a_diff,diff

  ! First initialize the system_clock
  CALL system_clock(count_rate=cr)
  CALL system_clock(count_max=cm)
  rate = REAL(cr)
  WRITE(*,*) "system_clock rate ",rate

  diff = 0.0
  a_diff = 0.0
  s = 0
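  DO n = 1 , runs
     ! Start both timers just before the work
     CALL CPU_TIME(t1)
     CALL SYSTEM_CLOCK(c1)
     FORALL(i = 1:x,j = 1:y)
        array(i,j) = REAL(i)*REAL(j) + 2
     END FORALL
     ! Stop both timers just after the work
     CALL CPU_TIME(t2)
     CALL SYSTEM_CLOCK(c2)
     ! Use the array so the compiler cannot discard the loop
     array(1,1) = array(1,2)
     ! No wraparound handling: assumes c2 >= c1
     ! Count the runs where system_clock reports less time than cpu_time
     IF ( (c2 - c1)/rate < (t2-t1) ) s = s + 1
     ! Accumulate the signed and the absolute difference between the timers
     diff = (c2 - c1)/rate - (t2-t1) + diff
     a_diff = ABS((c2 - c1)/rate - (t2-t1)) + a_diff
  END DO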

  WRITE(*,*) "system_clock : ",(c2 - c1)/rate
  WRITE(*,*) "cpu_time     : ",(t2-t1)
  WRITE(*,*) "sc < ct      : ",s,"of",runs
  WRITE(*,*) "mean diff    : ",diff/runs
  WRITE(*,*) "abs mean diff: ",a_diff/runs
END PROGRAM timer

For completeness, here is the output from my Intel 12.0.4 and gcc-4.4.5 compilers.

  • Intel 12.0.4 with -O0

    system_clock rate    10000.00    
    system_clock :    2.389600    
    cpu_time     :    2.384033    
    sc < ct      :            1 of        1000
    mean diff    :   4.2409324E-03
    abs mean diff:   4.2409897E-03
    
    real    42m5.340s
    user    41m48.869s
    sys 0m12.233s
    
  • gcc-4.4.5 with -O0

    system_clock rate    1000.0000    
    system_clock :    1.1849999    
    cpu_time     :    1.1840820    
    sc < ct      :          275 of        1000  
    mean diff    :   2.05709646E-03  
    abs mean diff:   2.71424348E-03  
    
    real    19m45.351s  
    user    19m42.954s  
    sys 0m0.348s  
    

Thanks for reading...

Answer

M. S. B. · Jul 30, 2011

These two intrinsics report different types of time. system_clock reports "wall time" or elapsed time. cpu_time reports time used by the CPU. On a multi-tasking machine these could be very different. For example, if your process shared the CPU equally with three other processes, and therefore received 25% of the CPU, and used 10 CPU seconds, it would take about 40 seconds of actual elapsed (wall clock) time.
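
The arithmetic in that example is 10 CPU seconds / 0.25 = 40 seconds of wall time. A minimal sketch for observing the ratio on your own machine follows. The program name cpu_vs_wall and the loop workload are illustrative, and the sketch ignores counter wraparound, which is unlikely over a short run with 64-bit counts. On an idle machine the printed share should be close to 1. Running it alongside other CPU-bound processes should lower it.

```fortran
program cpu_vs_wall
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: c1, c2, cr
  real(real64) :: t1, t2, wall, cpu, total
  integer :: i

  call system_clock(count_rate=cr)
  if (cr <= 0) stop "no system clock available"

  ! Read both timers around the same piece of work.
  call system_clock(c1)
  call cpu_time(t1)

  total = 0.0_real64
  do i = 1, 100000000         ! arbitrary CPU-bound workload
     total = total + sqrt(real(i, real64))
  end do

  call cpu_time(t2)
  call system_clock(c2)

  wall = real(c2 - c1, real64) / real(cr, real64)   ! elapsed (wall clock) seconds
  cpu  = t2 - t1                                    ! seconds of CPU used by this process

  write(*,*) "wall time (s): ", wall
  write(*,*) "cpu time  (s): ", cpu
  ! Fraction of the elapsed time during which this process had the CPU.
  if (wall > 0.0_real64) write(*,*) "cpu share    : ", cpu / wall
  write(*,*) "checksum     : ", total   ! printed so the loop is not optimized away
end program cpu_vs_wall
```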