Tips and tricks on improving Fortran code performance

milancurcic picture milancurcic · Oct 15, 2011 · Viewed 12.9k times · Source

As part of my Ph.D. research, I am working on development of numerical models of atmosphere and ocean circulation. These involve numerically solving systems of PDE's on the order of ~10^6 grid points, over ~10^4 time steps. Thus, a typical model simulation takes hours to a few days to complete when run in MPI on dozens of CPUs. Naturally, improving model efficiency as much as possible is important, while making sure the results are byte-to-byte identical.

While I feel quite comfortable with my Fortran programming, and am aware of quite some tricks to make code more efficient, I feel like there is still space to improve, and tricks that I am not aware of.

Currently, I make sure I use as few divisions as possible, and try not to use literal constants (I was taught to do this from very early on, e.g. use half=0.5 instead of 0.5 in actual computations), use as few transcendental functions as possible etc.

What other performance sensitive factors are there? At the moment, I am wondering about a few:

1) Does the order of mathematical operations matter? For example if I have:

a=1E-7 ; b=2E4 ; c=3E13
d=a*b*c

would d evaluate with different efficiency based on the order of multiplication? Nowadays, this must be compiler specific, but is there a straight answer? I notice d getting (slightly) different value based on the order (precision limit), but will this impact the efficiency or not?

2) Passing lots (e.g. dozens) of arrays as arguments to a subroutine versus accessing these arrays from a module within the subroutine?

3) Fortran 95 constructs (FORALL and WHERE) versus DO and IF? I know that these mattered back in the 90's when code vectorization was a big thing, but is there any difference now with modern compilers being able to vectorize explicit DO loops? (I am using PGI, Intel, and IBM compilers in my work)

4) Raising a number to an integer power versus multiplication? E.g.:

b=a**4

or

b=a*a*a*a

I have been taught to always use the latter where possible. Does this affect efficiency and/or precision? (probably compiler dependent as well)

Please discuss and/or add any tricks and tips that you know about improving Fortran code efficiency. What else is out there? If you know anything specific to what each of the compilers above do related to this question, please include that as well.

Added: Note that I do not have any bottlenecks or performance issues per se. I am asking if there are any general rules for optimizing the code in sense of operations.

Thanks!

Answer

Francois Jacq picture Francois Jacq · Oct 15, 2011

Sorry but all the tricks you mentioned are simply ... ridiculous. More exactly, they have no meaning in practice. For instance:

  • what could be the advantage of using half(=0.5) instead of 0.5?
  • idem for computing a**4 or a*a*a*a. (a*a)** 2 would be another possibility too. My personal taste is a**4 because a good compiler which choose automatically the best way.

For **, the only point which could matter is the difference between a ** 4 and a ** 4., the latter being much more CPU time consuming. But even this point has no sense without a measurement in an actual simulation.

In fact, your approach is wrong. Develop your code as well as possible. After that, measure objectively the cost of the different parts of your code. Optimizing without measuring before is simply non sense.

If a part exhibits a high percentage of the CPU, 50% for instance, don't forget that optimizing that part only cannot divide the cost of the overall code by a factor greater than two. Any way, start the optimization work by the most expensive part (the bottle neck).

Don't forget also that the main improvements are generally coming from better algorithms.