FORTRAN ?: Trying to bypass significant slowdown

  1. Nov 5, 2009 #1
    I have experienced a serious slowdown in program execution and traced its source to one calculation. Pseudocode below:

    do a = 0, 360, 10
      do b = 0, 85, 5
        do c = 0, 360, 10
          do d = 0, 85, 5
            ! bunch of calculations involving double precision real variables (result stored in variable ele)
            tot = tot + ele
          end do
        end do
      end do
    end do

    tot and ele are both double precision real variables. If I comment out "tot=tot+ele", the program takes 2m 04s to run, otherwise it takes 5m 55s to run.

    I am using the ifort compiler with "-ipo -O3 -static -xP -no-prec-div -inline" flags set. Does anyone have an explanation as to why this occurs (compiler optimization issue for example) and if there is a way to prevent it?
     
  3. Nov 5, 2009 #2

    Mark44

    Staff: Mentor

    Each statement in your innermost loop is running 37*18*37*18 times, or 443,556 times, since a DO loop includes both of its end values. If you can eliminate one or more of your loops, that might speed things up. For example, you can "unroll" the innermost loop by having a separate statement or block of statements for each of the values of your loop counter variable, d. The first one would be with d = 0, the second one with d = 5, and so on, up to 85.
     
  4. Nov 5, 2009 #3
    Thanks Mark44. I believe the compiler simply skipped the loop when I commented out the line, since the loop then served no purpose. I was afraid I'd have to tackle the problem the way you suggested, but I took a hybrid approach and moved the block into a subroutine, since it contains quite a few calculations (realizing the performance boost may be smaller than with every line written out in the main body). Doing so reduced program execution time to 4m 22s. Thanks for the kickstart...
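    The effect described here (a loop vanishing once its result is unused) can distort timings. A minimal sketch of a timing harness that keeps the result observable follows; the program name and the sin/cos expression are hypothetical stand-ins for the real calculations, which the thread does not show.

```fortran
program keep_result_live
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: a, b, c, d
  integer :: t0, t1, rate
  real(dp) :: ele, tot

  tot = 0.0_dp
  call system_clock(t0, rate)
  do a = 0, 360, 10
    do b = 0, 85, 5
      do c = 0, 360, 10
        do d = 0, 85, 5
          ! Hypothetical stand-in for the real calculations
          ele = sin(real(a + c, dp)) * cos(real(b + d, dp))
          tot = tot + ele
        end do
      end do
    end do
  end do
  call system_clock(t1)

  ! Printing tot makes the loop's result observable, so an optimizer
  ! has no grounds to discard the loop as dead code.
  print *, 'tot =', tot
  print *, 'elapsed (s) =', real(t1 - t0, dp) / real(rate, dp)
end program keep_result_live
```

    If the print of tot is removed, an optimizing compiler is free to drop the whole nest, so the two timings would no longer measure the same work.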
     
  5. Nov 5, 2009 #4

    Mark44

    Staff: Mentor

    Calling a subroutine once per innermost iteration, several hundred thousand times in all, might be slower than having the calculations inline in the innermost loop. There is some overhead associated with a subroutine call and return. You might compare the execution times with a subroutine call vs. having the code inline in the inner loop.
     
  6. Nov 6, 2009 #5

    Borek

    Staff: Mentor

    Perhaps you can move some of the calculations from the inner loop to the outer ones, and save intermediate results in temp variables? Although the compiler is most likely already doing this.
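    A rough sketch of what moving work outward could look like. The integrand (a product of sines and cosines of the four angles) and the names sa, sb, sc and deg2rad are assumptions made for illustration only; the actual calculations are not given in the thread.

```fortran
program hoist_invariants
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: deg2rad = 3.141592653589793_dp / 180.0_dp
  integer :: a, b, c, d
  real(dp) :: sa, sb, sc, tot

  tot = 0.0_dp
  do a = 0, 360, 10
    sa = sin(a * deg2rad)            ! depends only on a: computed 37 times
    do b = 0, 85, 5
      sb = sa * cos(b * deg2rad)     ! depends only on a and b
      do c = 0, 360, 10
        sc = sb * sin(c * deg2rad)   ! depends only on a, b and c
        do d = 0, 85, 5
          ! Only the factor that depends on d is evaluated here
          tot = tot + sc * cos(d * deg2rad)
        end do
      end do
    end do
  end do

  print *, 'tot =', tot
end program hoist_invariants
```

    Written this way, the innermost loop evaluates one trigonometric function per iteration instead of four. Whether this helps in practice depends on how much of the real calculation is independent of the inner counters.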
     
  7. Nov 6, 2009 #6
    This looks like integration over two spheres. Can you reformulate the problem to eliminate some integrations? For example, if relevant degrees of freedom only depend on the angle between two vectors, you can get away with only one loop instead of 4.

    And you shouldn't integrate to 360; you should integrate to 360-step. The angles 0 and 360 are the same direction, so including both counts that point twice.
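    The endpoint issue can be checked by counting sample points, as in the sketch below. The program and variable names are made up for illustration; the step of 10 degrees is taken from the first post.

```fortran
program periodic_endpoint
  implicit none
  integer, parameter :: step = 10
  integer :: a, n_closed, n_open

  ! Loop that includes both 0 and 360
  n_closed = 0
  do a = 0, 360, step
    n_closed = n_closed + 1
  end do

  ! Loop that stops one step short of 360
  n_open = 0
  do a = 0, 360 - step, step
    n_open = n_open + 1
  end do

  ! Each sample stands for one step-wide slice of the angle range
  print *, 'points, 0..360      :', n_closed, ' covered angle:', n_closed * step
  print *, 'points, 0..360-step :', n_open,   ' covered angle:', n_open * step
end program periodic_endpoint
```

    The first loop takes 37 samples and so covers 370 degrees of a 360-degree range, while the second takes 36 samples and covers exactly 360.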

    Also consider doing this in C.
     
    Last edited: Nov 6, 2009
  8. Nov 6, 2009 #7

    rcgldr

    Homework Helper

    Your program probably ran out of registers to use, and perhaps the compiler didn't prioritize inner variables over outer variables. It should help to declare those loop counters as integer, which use a different set of registers.
     
  9. Nov 6, 2009 #8
    > Also consider doing this in C.

    Why? C is not intrinsically any faster.
     
  10. Nov 6, 2009 #9

    minger

    Science Advisor

    Right, C is not faster. As a first thought, make sure that your DO loops are in the right "order". You want your innermost loop to run over the first index of your array, e.g.
    Code (Text):

    DO k=1,kmax
     DO j=1,jmax
      DO i=1,imax
        array(i,j,k) = something
      END DO
     END DO
    END DO
     
    This is much faster than having the DO loops in the opposite order, because Fortran stores arrays with the first index varying fastest in memory.
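    A small timing sketch for comparing the two loop orders. The array size n = 200 and the program name are arbitrary choices for illustration, and no particular timings are claimed: an optimizing compiler may interchange the loops itself, in which case the two figures can come out nearly equal.

```fortran
program loop_order
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200
  real(dp), allocatable :: array(:,:,:)
  integer :: i, j, k, t0, t1, t2, rate

  allocate(array(n, n, n))

  call system_clock(t0, rate)
  ! First index innermost: consecutive iterations touch adjacent memory
  do k = 1, n
    do j = 1, n
      do i = 1, n
        array(i, j, k) = real(i + j + k, dp)
      end do
    end do
  end do
  call system_clock(t1)

  ! First index outermost: consecutive iterations are n*n elements apart
  do i = 1, n
    do j = 1, n
      do k = 1, n
        array(i, j, k) = array(i, j, k) + 1.0_dp
      end do
    end do
  end do
  call system_clock(t2)

  print *, 'first index innermost (s):', real(t1 - t0, dp) / real(rate, dp)
  print *, 'first index outermost (s):', real(t2 - t1, dp) / real(rate, dp)
  ! Using the array afterwards keeps both loops from being optimized away
  print *, 'checksum:', sum(array)
end program loop_order
```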

    Also, as a test (really not sure if it will be faster or not), you can try using the SUM array intrinsic, e.g.
    Code (Text):

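    ! elem must be a 4-D array with one entry per loop iteration
    DO l=1,lmax
     DO k=1,kmax
      DO j=1,jmax
       DO i=1,imax
        elem(i,j,k,l) = whatever
       END DO
      END DO
     END DO
    END DO

    ! add up all stored elements in one call
    tot = SUM(elem)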
     
     
  11. Nov 6, 2009 #10
    Appreciate all the responses. The portion of code in question was just one step in a very lengthy problem. I am quite pleased with the improvements you all have helped me achieve - over 50% faster on the one subroutine which was called over 12,000 times. The changes have moved my dissertation one step closer to completion, thanks!

    I'll respond to each suggestion below:

    1. I had already tried doing a "sum" outside the nest with no noticeable improvement in calculation time.
    2. There is no way around four integrations; the integrations are not over spheres but over incident and reflected angles.
    3. I sucked it up further and put the code in-line for comparison. It took 4m 22s just as it did with a subroutine call. I believe the compiler optimizations I chose may include making the subroutine in-line anyway. I returned to using the subroutine.
    4. Great catch on the 360-step. That was a mistake that I did not catch. With that change, I am down to 4m 13s...and what's better, the calculation is now right!
    5. Not sure what benefit "C" would have in this project. That change is too significant to test.
     
  12. Nov 6, 2009 #11
    Does the system possess rotational symmetry? If it does, one integration over 0 to 360 can be eliminated.

    Being a somewhat lower-level language, C is considerably faster than Fortran (up to 2-3 times, depending on task) and that makes it more suited for heavy numerical programming.

    http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=gcc&lang2=ifc&box=1 [Broken]
     
    Last edited by a moderator: May 4, 2017
  13. Nov 6, 2009 #12

    rcgldr

    Homework Helper

    It really depends on the compiler. Comparing a bad Fortran compiler versus a good C compiler isn't fair. In the case of some Cray supercomputers (like the nearly extinct X1 and X1E series vector processing machines), the Fortran compiler is faster than C, partly because some vendor-specific extensions were made to the Fortran language used on the Cray, and partly because that particular Fortran compiler optimizes very well on the Cray supercomputer. Newer Cray systems are supposed to combine Intel or AMD CPUs with specialized vector math units, but I don't know how many of these have been made.

    I don't know what options there are in terms of Fortran compilers for PC-based systems, which of these, if any, do a really good job of optimizing code, or whether in this case the required floating-point calculations simply can't be optimized beyond some basic level.
     
    Last edited: Nov 6, 2009
  14. Nov 9, 2009 #13
     