Fortran Why Is My OpenMP FORTRAN Program Slower in Parallel Than in Single Thread?

  • Thread starter: jelanier
  • Tags: Beginner, Fortran
OpenMP parallelization can lead to slower execution times in FORTRAN programs due to overhead associated with managing multiple threads. Users reported that their parallel implementations resulted in longer execution times compared to single-threaded runs, particularly when the workload per thread is minimal. It was suggested that the test program should involve operations on arrays rather than simple loops to better utilize parallel processing capabilities. Additionally, the effectiveness of OpenMP can vary significantly based on the nature of the code and the hardware used, emphasizing the importance of optimizing code for parallel execution. Efficient multithreading requires experience and careful consideration of workload distribution to achieve desired performance improvements.
  • #31
jelanier said:
If the loop is not a function of i, it can't possibly run in parallel.

Close. If you have x = x * i, it is a function of i, but still can't run in parallel.

Probably the easiest way to imagine it is to think about giving half the loop to one thread and half to another. Would you get the same answer? No matter how you split it? I usually ask myself "if I had N CPUs, one for each of the N elements of the loop, would I get the right answer?"
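
One way to try the "would any split give the same answer?" question without using threads at all is to run the loop forwards, then backwards, and compare. The sketch below is only an illustration (the program and array names are made up, not taken from the thread). The first pair of loops has independent iterations; the second pair needs the previous element, so the order matters.

```fortran
program order_test
  implicit none
  integer, parameter :: n = 10
  integer :: i
  double precision :: a(n), b(n), c(n), d(n)

  ! Independent iterations: element i depends only on i.
  do i = 1, n
     a(i) = 2.0d0 * i
  end do
  do i = n, 1, -1            ! same body, reverse order
     b(i) = 2.0d0 * i
  end do

  ! Loop-carried dependence: element i needs element i-1.
  c = 1.0d0
  do i = 2, n
     c(i) = c(i-1) + i
  end do
  d = 1.0d0
  do i = n, 2, -1            ! same body, reverse order
     d(i) = d(i-1) + i
  end do

  print *, 'independent loop, same result in either order: ', all(a == b)
  print *, 'recurrence, same result in either order:       ', all(c == d)
end program order_test
```

This reports T for the independent loop and F for the recurrence. Passing the check doesn't prove a loop is safe to parallelize, but failing it is a clear sign that it isn't. A running product like x = x * i is a special case: OpenMP's reduction clause is meant for that pattern, though a plain parallel do is not.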

jelanier said:
The purpose of this code was to count time more reliably.

Well, you aren't doing that exactly. You're measuring something, that's for sure, but it may be totally unrelated to any real code. In particular, because x is never used at the end, the compiler might optimize away part or all of the code involving x, including the code inside the PARALLEL region. That said, using wall clock time is a good idea: that's what you really care about, after all.
 
  • #32
I understood. I think you took me too literally on the function; I clarified when I said "Each iteration requires knowledge of the previous one."

The counter I'm using should work because it reads the system clock, so the code should not interfere. I think you are saying it may read the system clock while a code process is still running. One of the timing procedures I used in another test shows the timing of each thread, and when multiplied by the number of threads the results were close to the system clock result.

Can you give me an example of wall clock time so I can compare? I thought system clock was wall clock. Quote from another forum: "system_clock reports 'wall time' or elapsed time." Is this not correct? I would think it to be conservative because the computer is doing things other than the code I'm testing.

see https://gcc.gnu.org/onlinedocs/gfortran/SYSTEM_005fCLOCK.html

Thanks
 
  • #33
You are using wall clock time now.
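
For anyone comparing the two kinds of timer, here is a minimal sketch (not the code from this thread; the loop and the names are placeholders) that measures the same work with system_clock, which gives elapsed or "wall" time, and with cpu_time, which gives processor time. It also prints the result of the loop so the compiler has a reason to keep the work.

```fortran
program wall_vs_cpu
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 5000000
  integer(int64) :: t0, t1, rate
  real :: c0, c1
  double precision :: s
  integer :: i

  call system_clock(t0, rate)   ! wall-clock ticks, and ticks per second
  call cpu_time(c0)             ! processor time used so far

  s = 0.0d0
  do i = 1, n
     s = s + sqrt(dble(i))
  end do

  call cpu_time(c1)
  call system_clock(t1)

  print '(a,f8.3,a)', 'wall time ', dble(t1 - t0) / dble(rate), ' seconds'
  print '(a,f8.3,a)', 'CPU time  ', c1 - c0, ' seconds'
  print *, 'sum = ', s          ! using s keeps the loop from being optimized away
end program wall_vs_cpu
```

In a single-threaded run the two numbers should be close. In a multithreaded run, cpu_time with gfortran typically adds up the time of all threads, so it can come out larger than the wall time; that behaviour is worth checking on your own compiler. OpenMP also provides omp_get_wtime() for wall time if you are already compiling with -fopenmp.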
 
  • #34
OK, I thought you had seen my timing post code using system_clock. Anyway, I wrote some code for a parallel loop and the speed increase was verified. It is a little better than twice the speed. The value of Z(100) is printed to confirm there are no errors in the parallel portion. I also compiled with and without optimization to test the results.

See results:

C:\OpenMP\Test Timing>set OMP_NUM_THREADS=1

C:\OpenMP\Test Timing>Test_T
MP time 0.961 seconds
z100 WI MP 406.86585116386414
NO MP time 0.938 seconds
z100 NO MP 406.86585116386414
Percentage Time 102.500%

C:\OpenMP\Test Timing>set OMP_NUM_THREADS=2

C:\OpenMP\Test Timing>Test_T
MP time 0.500 seconds
z100 WI MP 406.86585116386414
NO MP time 1.000 seconds
z100 NO MP 406.86585116386414
Percentage Time 50.000%

C:\OpenMP\Test Timing>set OMP_NUM_THREADS=4

C:\OpenMP\Test Timing>Test_T
MP time 0.391 seconds
z100 WI MP 406.86585116386414
NO MP time 0.949 seconds
z100 NO MP 406.86585116386414
Percentage Time 41.152%

C:\OpenMP\Test Timing>set OMP_NUM_THREADS=8

C:\OpenMP\Test Timing>Test_T
MP time 0.422 seconds
z100 WI MP 406.86585116386414
NO MP time 0.953 seconds
z100 NO MP 406.86585116386414
Percentage Time 44.262%

C:\OpenMP\Test Timing>pause
Press any key to continue . . .
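
For readers who want to try something similar, a test of this general shape can be sketched as follows. This is not the program that produced the output above; the array size, the loop body, and the names are assumptions chosen only to show the pattern: time a parallel loop, time the same loop serially, and compare the results.

```fortran
program mp_sketch
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1000000
  double precision, allocatable :: z(:), zref(:)
  integer(int64) :: t0, t1, t2, rate
  integer :: i

  allocate(z(n), zref(n))       ! allocatable arrays live on the heap

  call system_clock(t0, rate)
  !$omp parallel do
  do i = 1, n                   ! each iteration touches only z(i)
     z(i) = sqrt(dble(i)) + sin(dble(i))
  end do
  !$omp end parallel do
  call system_clock(t1)

  do i = 1, n                   ! same work, no OpenMP
     zref(i) = sqrt(dble(i)) + sin(dble(i))
  end do
  call system_clock(t2)

  print '(a,f8.3,a)', 'MP time    ', dble(t1 - t0) / dble(rate), ' seconds'
  print '(a,f8.3,a)', 'NO MP time ', dble(t2 - t1) / dble(rate), ' seconds'
  print *, 'z(100) WI MP ', z(100)
  print *, 'z(100) NO MP ', zref(100)

  ! Check every element, not just one, within a small tolerance.
  if (maxval(abs(z - zref)) > 1.0d-12) error stop 'parallel and serial results differ'
end program mp_sketch
```

Saved as free-form source (for example mp_sketch.f90, a made-up file name), it can be built with something like gfortran -O2 -fopenmp mp_sketch.f90 -o mp_sketch. Without -fopenmp the !$omp lines are treated as comments and both loops run serially, which is a handy sanity check.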

Thanks again for your help.
 
  • #35
Vanadium 50 said:
That's very surprising. Each array is only 8 MB in size. Wonder why that is.

I figured it out. The stack size is fixed in Windows. You can increase it by passing a linker option at compile time.
Example: gfortran -Wl,--stack,16000000 -O3 -fopenmp -static test.f -o Test

The stack size in bytes should be at least the total number of array elements X 8 (8 bytes per double precision value). So if you had x(i) and y(i) with 1000000 elements each,
then the stack would need 1000000 X 2 X 8 = 16000000 bytes as a minimum (plus other overhead)
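
The arithmetic in that estimate can be checked with a few lines of Fortran. This is just a sketch: the array count and length are the ones from the example in this post, and the variable names are made up.

```fortran
program stack_estimate
  implicit none
  integer, parameter :: n = 1000000     ! elements per array (from the example)
  integer, parameter :: narrays = 2     ! x and y
  double precision :: sample            ! only used to ask for its size
  integer :: bytes_per_elem, total

  bytes_per_elem = storage_size(sample) / 8   ! storage_size returns bits
  total = n * narrays * bytes_per_elem

  print '(a,i0)', 'minimum stack bytes: ', total
end program stack_estimate
```

With 8-byte double precision values this prints 16000000, matching the value passed to --stack in the example, before any allowance for other overhead.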

Jim
 
  • #36
I have written some code that shows the gains from OpenMP and verifies the output by comparing the parallel and serial results. Using my old laptop (2 core i5) I get about a 3X speedup at best. It is probably better on more modern machines. I have attached the code and an msys2 compile script.

http://www.chemroc.com/MISC/OpenMP/MP_Test.f

http://www.chemroc.com/MISC/OpenMP/MP_Test.sh

*************************************************************

This is output using my laptop:

***********************************************************
C:\OpenMP\Test Timing - random>set OMP_NUM_THREADS=1

C:\OpenMP\Test Timing - random>MP_Test
MP time 2.125 seconds
xM( 540) 0.480
xM( 540) 1.127
NO MP time 2.062 seconds
x( 540) 0.480
x( 540) 1.127
Percentage Time 103.030%

C:\OpenMP\Test Timing - random>set OMP_NUM_THREADS=2

C:\OpenMP\Test Timing - random>MP_Test
MP time 1.219 seconds
xM( 109) 0.821
xM( 109) 1.469
NO MP time 2.062 seconds
x( 109) 0.821
x( 109) 1.469
Percentage Time 59.091%

C:\OpenMP\Test Timing - random>set OMP_NUM_THREADS=4

C:\OpenMP\Test Timing - random>MP_Test
MP time 0.750 seconds
xM( 869) 0.151
xM( 869) 0.696
NO MP time 2.094 seconds
x( 869) 0.151
x( 869) 0.696
Percentage Time 35.821%

C:\OpenMP\Test Timing - random>set OMP_NUM_THREADS=8

C:\OpenMP\Test Timing - random>MP_Test
MP time 0.750 seconds
xM( 384) 0.952
xM( 384) 1.669
NO MP time 2.094 seconds
x( 384) 0.952
x( 384) 1.669
Percentage Time 35.821%

C:\OpenMP\Test Timing - random>set OMP_NUM_THREADS=16

C:\OpenMP\Test Timing - random>MP_Test
MP time 0.750 seconds
xM( 186) 0.869
xM( 186) 1.551
NO MP time 2.031 seconds
x( 186) 0.869
x( 186) 1.551
Percentage Time 36.923%

*******************************************************

Later,

Jim
 
  • #37
So you get a 2.8x improvement on a 2C/4T machine? You should be pretty happy about that. :smile:
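
For reference, the 2.8x figure follows from the times in the 4-thread run above: speedup is the serial time divided by the parallel time, which is the inverse of the "Percentage Time" line. A small sketch with those two times typed in by hand (the program name is made up):

```fortran
program speedup_check
  implicit none
  ! Times copied from the OMP_NUM_THREADS=4 run in post #36.
  double precision, parameter :: t_mp   = 0.750d0
  double precision, parameter :: t_nomp = 2.094d0

  print '(a,f8.3,a)', 'Percentage Time ', 100.0d0 * t_mp / t_nomp, '%'
  print '(a,f6.2,a)', 'Speedup         ', t_nomp / t_mp, 'x'
end program speedup_check
```

The percentage comes out slightly different from the posted 35.821% in the last digits because the posted times are already rounded to three decimals; the speedup is roughly 2.8 either way.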
 
  • #38
Vanadium 50 said:
So you get a 2.8x improvement on a 2C/4T machine? You should be pretty happy about that. :smile:

I ran this on a Mac with the same processor. The percentage gains are similar, but overall speed on macOS/Unix is much better.

Results on macOS:

+ export OMP_NUM_THREADS=1
+ OMP_NUM_THREADS=1
+ ./MP_Test
MP time 1.125 seconds
xM( 376) 0.995
xM( 376) 1.605
NO MP time 1.125 seconds
x( 376) 0.995
x( 376) 1.605
Percentage Time 100.000%
+ export OMP_NUM_THREADS=2
+ OMP_NUM_THREADS=2
+ ./MP_Test
MP time 0.625 seconds
xM( 627) 0.348
xM( 627) 0.774
NO MP time 1.125 seconds
x( 627) 0.348
x( 627) 0.774
Percentage Time 55.556%
+ export OMP_NUM_THREADS=4
+ OMP_NUM_THREADS=4
+ ./MP_Test
MP time 0.375 seconds
xM( 677) 0.929
xM( 677) 1.739
NO MP time 1.125 seconds
x( 677) 0.929
x( 677) 1.739
Percentage Time 33.333%
+ export OMP_NUM_THREADS=8
+ OMP_NUM_THREADS=8
+ ./MP_Test
MP time 0.375 seconds
xM( 914) 0.882
xM( 914) 1.578
NO MP time 1.250 seconds
x( 914) 0.882
x( 914) 1.578
Percentage Time 30.000%
+ export OMP_NUM_THREADS=16
+ OMP_NUM_THREADS=16
+ ./MP_Test
MP time 0.375 seconds
xM( 423) 0.360
xM( 423) 1.078
NO MP time 1.125 seconds
x( 423) 0.360
x( 423) 1.078
Percentage Time 33.333%
+ export OMP_NUM_THREADS=32
+ OMP_NUM_THREADS=32
+ ./MP_Test
MP time 0.375 seconds
xM( 832) 0.834
xM( 832) 1.575
NO MP time 1.125 seconds
x( 832) 0.834
x( 832) 1.575
Percentage Time 33.333%

[Process completed]
 
