FORTRAN ?: Trying to bypass significant slowdown

  • Context: Fortran
  • Thread starter: blbelson
  • Tags: Fortran

Discussion Overview

The discussion revolves around performance issues in a FORTRAN program, specifically related to a significant slowdown caused by a particular calculation within nested loops. Participants explore potential optimizations, compiler settings, and alternative coding strategies to improve execution time.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant notes a drastic increase in execution time when a specific line of code is included, suggesting a potential compiler optimization issue.
  • Another participant proposes loop unrolling as a method to reduce the number of iterations and improve performance.
  • A different approach involves moving calculations from the inner loop to outer loops and using temporary variables to store intermediate results.
  • Concerns are raised about the overhead of calling a subroutine multiple times, which could negate performance gains.
  • Some participants suggest that reformulating the problem could reduce the number of integrations required.
  • There is a discussion about the potential benefits of using C over FORTRAN, with varying opinions on whether C is inherently faster.
  • One participant emphasizes the importance of the order of DO loops for performance optimization.
  • Another participant mentions that the system does not possess rotational symmetry, which affects the integration approach.

Areas of Agreement / Disagreement

Participants express multiple competing views on optimization strategies, the effectiveness of subroutine calls, and the comparison between FORTRAN and C. The discussion remains unresolved regarding the best approach to achieve the desired performance improvements.

Contextual Notes

Participants highlight limitations such as the potential for compiler optimizations to vary based on the specific compiler used and the nature of the calculations involved. There are also unresolved questions about the impact of loop ordering and the implications of using different programming languages.

Who May Find This Useful

This discussion may be useful for programmers working with FORTRAN who are experiencing performance issues in numerical computations, as well as those interested in optimization techniques and language comparisons in scientific computing.

blbelson
I have experienced a serious program execution slow down and traced its source to one calculation...pseudo code below:

do a=0, 360, 10
  do b=0, 85, 5
    do c=0, 360, 10
      do d=0, 85, 5
        bunch of calculations involving double precision real variables (stored in variable ele)
        tot=tot + ele
      end do
    end do
  end do
end do

tot and ele are both double precision real variables. If I comment out "tot=tot+ele", the program takes 2m 04s to run, otherwise it takes 5m 55s to run.

I am using the ifort compiler with "-ipo -O3 -static -xP -no-prec-div -inline" flags set. Does anyone have an explanation as to why this occurs (compiler optimization issue for example) and if there is a way to prevent it?
 
Each statement in your innermost loop runs 37*18*37*18 times, or 443,556 times (a and c each take 37 values, b and d each take 18). If you can eliminate one or more of your loops, that might speed things up. For example, you can "unroll" the innermost loop by having a separate statement or block of statements for each of the values of your loop counter variable, d. The first one would be with d = 0, the second one with d = 5, and so on, up to 85.
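As a sanity check on the iteration count, here is a minimal sketch in which the loop nest counts its own trips. The module and function names are hypothetical and not taken from the original program. With the bounds exactly as posted, a and c each take 37 values (0, 10, ..., 360) and b and d each take 18 values (0, 5, ..., 85).

```fortran
module trip_count_sketch
  implicit none
contains
  ! Count how many times the innermost body of the four-deep nest runs.
  ! last_deg is the upper bound used for the a and c loops (360 as posted).
  function count_trips(last_deg) result(n)
    integer, intent(in) :: last_deg
    integer :: n
    integer :: a, b, c, d
    n = 0
    do a = 0, last_deg, 10
      do b = 0, 85, 5
        do c = 0, last_deg, 10
          do d = 0, 85, 5
            n = n + 1
          end do
        end do
      end do
    end do
  end function count_trips
end module trip_count_sketch
```

Calling `count_trips(360)` gives 443,556, and `count_trips(350)` gives 419,904, so stopping the two 360-degree loops one step early also removes about 5% of the work.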
 
Thanks, Mark44. I believe the compiler simply skipped the loop when I commented out the line, since the loop served no purpose in that instance. I was afraid I'd have to tackle the problem in the way you suggested above, but I took a hybrid approach and wrote a subroutine since there are quite a few calculations in that block (realizing the performance boost may be less than writing each line of code in the main body). Doing so reduced program execution time to 4m 22s. Thanks for the kickstart...
 
Calling a subroutine once per inner-loop iteration (hundreds of thousands of calls) might be slower than having the calculations inline in the innermost loop. There is some overhead associated with a subroutine call and return. You might compare the execution times with a subroutine call vs. having the code inline in the inner loop.
 
Perhaps you can move some of the calculations from the inner loop to the outward ones, and save intermediate results in temp variables? Although it is most likely already done by the compiler.
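The hoisting idea can be sketched as follows. The integrand used here, sin(a)**2 * cos(b) * sin(c)**2 * cos(d), is a made-up stand-in, since the thread never shows the real calculation; the module, function, and variable names are likewise hypothetical. The point is only that a factor depending on an outer counter is computed once per outer trip instead of once per innermost trip.

```fortran
module hoist_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Conversion factor from degrees to radians.
  real(dp), parameter :: deg = 3.141592653589793_dp / 180.0_dp
contains
  ! Every factor is recomputed in the innermost loop.
  function total_naive() result(tot)
    real(dp) :: tot
    integer :: a, b, c, d
    tot = 0.0_dp
    do a = 0, 350, 10
      do b = 0, 85, 5
        do c = 0, 350, 10
          do d = 0, 85, 5
            tot = tot + sin(a*deg)**2 * cos(b*deg) * sin(c*deg)**2 * cos(d*deg)
          end do
        end do
      end do
    end do
  end function total_naive

  ! Factors that depend only on outer counters are saved in temporaries.
  function total_hoisted() result(tot)
    real(dp) :: tot, fa, fab, fabc
    integer :: a, b, c, d
    tot = 0.0_dp
    do a = 0, 350, 10
      fa = sin(a*deg)**2            ! depends on a only
      do b = 0, 85, 5
        fab = fa * cos(b*deg)       ! depends on a and b only
        do c = 0, 350, 10
          fabc = fab * sin(c*deg)**2
          do d = 0, 85, 5
            tot = tot + fabc * cos(d*deg)
          end do
        end do
      end do
    end do
  end function total_hoisted
end module hoist_sketch
```

Both functions return the same total up to rounding. Whether the hoisted version is measurably faster depends on the compiler, which may already perform this transformation at high optimization levels, so it is worth timing both.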
 
This looks like integration over two spheres. Can you reformulate the problem to eliminate some integrations? For example, if relevant degrees of freedom only depend on the angle between two vectors, you can get away with only one loop instead of 4.

And you shouldn't integrate to 360, you should integrate to 360-step. 0 and 360 degrees are the same direction, so including both counts it twice.
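The 360-step point can be illustrated with a minimal sketch, assuming a simple rectangle rule with a constant integrand of 1 (so the exact answer over a full turn is 360 degrees). The names are hypothetical and the real integrand in the thread is not shown.

```fortran
module periodic_sum_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Rectangle-rule integral of f(theta) = 1 over a full turn, in degrees.
  ! last_deg = 360 visits the direction theta = 0 a second time;
  ! last_deg = 350 (that is, 360 minus the step) does not.
  function full_turn(last_deg) result(total)
    integer, intent(in) :: last_deg
    real(dp) :: total
    integer, parameter :: step = 10
    integer :: theta
    total = 0.0_dp
    do theta = 0, last_deg, step
      total = total + 1.0_dp * real(step, dp)   ! integrand times weight
    end do
  end function full_turn
end module periodic_sum_sketch
```

Here `full_turn(360)` returns 370 while `full_turn(350)` returns the correct 360: the extra sample at 360 degrees adds one full step of weight that does not belong in the sum.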

Also consider doing this in C.
 
blbelson said:
If I comment out "tot=tot+ele", the program takes 2m 04s to run, otherwise it takes 5m 55s to run.
Your program probably ran out of registers to use, and perhaps the compiler didn't prioritize inner variables over outer variables. It should help to declare those loop counters as integer, since integers use a different set of registers.
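The pseudocode does not show how the loop counters are declared, so whether they were real or integer in the original program is unknown. As a minimal sketch with hypothetical names, the pattern below keeps the counter as an integer and derives the double precision angle from it inside the loop.

```fortran
module int_counter_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Sum the angles 0, 10, ..., 350 degrees using an integer counter.
  ! The double precision angle is computed from the counter, so the
  ! loop control itself never involves floating-point arithmetic.
  function sum_angles() result(total)
    real(dp) :: total, theta
    integer :: i
    total = 0.0_dp
    do i = 0, 35
      theta = 10.0_dp * real(i, dp)
      total = total + theta
    end do
  end function sum_angles
end module int_counter_sketch
```

Besides any effect on register use, an integer counter gives an exact, predictable trip count, which a floating-point counter with a fractional step would not guarantee.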
 
> Also consider doing this in C.

Why? C is not intrinsically any faster.
 
Right, C is not faster. As a first thought, make sure that your DO loops are in the right "order". You want your innermost loop to run over the first index of your array, e.g.
Code:
DO k=1,kmax
 DO j=1,jmax
  DO i=1,imax
    array(i,j,k) = something
  END DO
 END DO
END DO
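This is much faster than having the DO loops in the opposite order, because Fortran stores arrays in column-major order, so the first index varies fastest in memory.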
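To make the loop-order comparison concrete, here is a small sketch with both orderings side by side. The subroutine names and the fill value are hypothetical; the two routines produce identical arrays and differ only in how they walk through memory.

```fortran
module loop_order_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! First index innermost: consecutive iterations touch adjacent
  ! memory locations.
  subroutine fill_fast(arr)
    real(dp), intent(out) :: arr(:,:,:)
    integer :: i, j, k
    do k = 1, size(arr, 3)
      do j = 1, size(arr, 2)
        do i = 1, size(arr, 1)
          arr(i,j,k) = real(i + j + k, dp)
        end do
      end do
    end do
  end subroutine fill_fast

  ! Last index innermost: same result, but each inner iteration
  ! jumps across memory by size(arr,1)*size(arr,2) elements.
  subroutine fill_slow(arr)
    real(dp), intent(out) :: arr(:,:,:)
    integer :: i, j, k
    do i = 1, size(arr, 1)
      do j = 1, size(arr, 2)
        do k = 1, size(arr, 3)
          arr(i,j,k) = real(i + j + k, dp)
        end do
      end do
    end do
  end subroutine fill_slow
end module loop_order_sketch
```

To see the difference, one could time both calls with the `cpu_time` intrinsic on an array too large to fit in cache. The gap varies by machine, and an optimizing compiler may interchange the loops on its own, in which case the two timings will be close.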

Also, as a test (really not sure if it will be faster or not), you can try using the SUM array intrinsic, e.g.
Code:
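! elem must be an array with one slot per iteration, not a scalar
! (the index names here are placeholders)
DO l = 1, lmax
 DO k = 1, kmax
  DO j = 1, jmax
   DO i = 1, imax
    elem(i,j,k,l) = whatever
   END DO
  END DO
 END DO
END DO

tot = SUM(elem)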
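A runnable version of this comparison might look like the sketch below. It uses a two-deep nest instead of four to keep it short, and the element formula 1/(i+j) is a made-up stand-in for the real calculation; all names are hypothetical.

```fortran
module sum_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Running accumulation inside the loop, as in the original pseudocode.
  function total_running(n) result(tot)
    integer, intent(in) :: n
    real(dp) :: tot
    integer :: i, j
    tot = 0.0_dp
    do j = 1, n
      do i = 1, n
        tot = tot + 1.0_dp / real(i + j, dp)
      end do
    end do
  end function total_running

  ! Store every element first, then reduce once with the SUM intrinsic.
  function total_intrinsic(n) result(tot)
    integer, intent(in) :: n
    real(dp) :: tot
    real(dp), allocatable :: elem(:,:)
    integer :: i, j
    allocate(elem(n, n))
    do j = 1, n
      do i = 1, n
        elem(i,j) = 1.0_dp / real(i + j, dp)
      end do
    end do
    tot = sum(elem)
  end function total_intrinsic
end module sum_sketch
```

The two totals agree up to rounding. The intrinsic version costs extra memory for the `elem` array, so any speed benefit has to be measured for the case at hand.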
 
  • #10
Appreciate all the responses. The portion of code in question was just one step in a very lengthy problem. I am quite pleased with the improvements you all have helped me achieve - over 50% faster on the one subroutine which was called over 12,000 times. The changes have moved my dissertation one step closer to completion, thanks!

I'll respond to each suggestion below:

1. I had already tried doing a "sum" outside the nest with no noticeable improvement in calculation time.
2. There is no way around four integrations, the integrations are not spheres but incident and reflected angles.
3. I sucked it up further and put the code in-line for comparison. It took 4m 22s just as it did with a subroutine call. I believe the compiler optimizations I chose may include making the subroutine in-line anyway. I returned to using the subroutine.
4. Great catch on the 360-step. That was a mistake that I did not catch. With that change, I am down to 4m 13s...and what's better, the calculation is now right!
5. Not sure what benefit "C" would have in this project. That change is too significant to test.
 
  • #11
> There is no way around four integrations, the integrations are not spheres but incident and reflected angles.

Does the system possess rotational symmetry? If it does, one integration over 0 to 360 can be eliminated.

> Not sure what benefit "C" would have in this project. That change is too significant to test.

Being a somewhat lower-level language, C is considerably faster than Fortran (up to 2-3 times, depending on task) and that makes it more suited for heavy numerical programming.

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=gcc&lang2=ifc&box=1
 
  • #12
hamster143 said:
Being a somewhat lower-level language, C is considerably faster than Fortran.
It really depends on the compiler. Comparing a bad Fortran compiler with a good C compiler isn't fair. On some Cray supercomputers (such as the nearly extinct X1 and X1E series vector processing machines), the Fortran compiler produces faster code than the C compiler, partly because some vendor-specific extensions were made to the Fortran language used on the Cray, and partly because that particular Fortran compiler optimizes very well on the Cray supercomputer. Newer Cray systems are supposed to combine Intel or AMD CPUs with specialized vector math units, but I don't know how many of these have been made.

I don't know what options there are in terms of Fortran compilers for PC-based systems, which of these, if any, do a really good job of optimizing code, or whether in this case the required floating point calculations simply can't be optimized beyond some basic level.
 
  • #13
hamster143 said:
Does the system possess rotational symmetry? If it does, one integration over 0 to 360 can be eliminated.

The system is not rotationally symmetric. The integrations are over all possible incident and reflected angles.
 
