I ran this on a Mac with the same processor. The percentage gains are similar, but the Mac/UNIX speed is much better overall.
Results on macOS:
+ export OMP_NUM_THREADS=1
+ OMP_NUM_THREADS=1
+ ./MP_Test
MP time 1.125 seconds
xM( 376) 0.995
xM( 376) 1.605
NO MP time 1.125 seconds
x( 376) 0.995...
I have written some code that shows the gains from OpenMP. It shows the speedup and verifies the output by comparison. Using my old laptop (a 2-core i5) I get at best about a 3X improvement. It will probably do better on more modern machines. I have attached the code and an MSYS2 compile shell...
I figured it out. The stack size is fixed on Windows, but you can increase it with a linker option at compile time.
Example: gfortran -Wl,--stack,16000000 -O3 -fopenmp -static test.f -o Test
The stack size should be the number of array elements times 8 bytes. So if you had x(i) dimensioned 1000000 and y(i) dimensioned 1000000,
then...
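To make the arithmetic concrete, a minimal sketch (my own illustration, not the attached code, with hypothetical arrays x and y): two local arrays of 1,000,000 double-precision elements each need 2 × 1,000,000 × 8 = 16,000,000 bytes of stack, which is the value passed to -Wl,--stack in the compile line above.

```fortran
! Hypothetical example: ~16 MB of local (stack) array storage,
! matching the 16000000 passed to -Wl,--stack above.
program stack_demo
   implicit none
   integer, parameter :: n = 1000000
   real(8) :: x(n), y(n)      ! 2 * n * 8 bytes = 16,000,000 bytes
   x = 1.0d0
   y = 2.0d0
   print *, x(1) + y(n)       ! use the arrays so they are not optimized away
end program stack_demo
```

On Windows you would compile it as above, e.g. gfortran -Wl,--stack,16000000 -O3 stack_demo.f90 -o stack_demo; without the option it may crash with a stack overflow. Note that --stack is a Windows (PE) linker option, not a Linux/macOS one.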
OK, I thought you had seen my earlier timing-post code, which used system_clock. Anyway, I wrote in some code for a parallel loop and verified the speed increase; it is a little better than twice the speed. A variable Z(100) is printed to confirm there are no errors in the parallel portion. I also compiled...
I understood. I think you took me too literally on the function; I clarified that when I said "each iteration requires knowledge of the previous one." The counter I'm using should work because it reads the system clock, so the code should not interfere with it. I think you are saying it may look at...
I think it is because of what I commented earlier: if the loop result is not a function of i alone, it can't possibly run in parallel. In my loop example, parallel execution would make a mess, since each iteration requires knowledge of the previous one. I was going to rewrite something in the loop to make it practical. The...
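For what it's worth, here is a minimal sketch (my own illustration, not the code under discussion) of the kind of loop I mean, where each iteration needs the previous one's result, so splitting the iterations across threads would change the answer:

```fortran
program dependent_loop
   implicit none
   integer, parameter :: n = 10
   real(8) :: x(n)
   integer :: i
   x(1) = 1.0d0
   do i = 2, n
      ! x(i) needs x(i-1): a loop-carried dependence, so a plain
      ! !$omp parallel do here would give wrong answers
      x(i) = 0.5d0 * x(i-1) + real(i, 8)
   end do
   print *, x(n)
end program dependent_loop
```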
I am with you on that. I typically experiment with my code to maximize efficiency; back when I first started writing code (1974) it was required. I have revisited my first posted GFORTRAN example and come up with a better way of measuring time. It looks like I can improve the time by better than 2 to 1...
I am OK with it. I guess I am spoiled: when I edited some NEC (Numerical Electromagnetics Code) to use optimized OpenBLAS libraries that I compiled for multiple architectures, I got a 6X improvement.
One question: when I use OpenMP statements in my GFORTRAN code, the sentinel has to be in column 1 or it is...
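On the column-1 question, as far as I know: in fixed-form source (.f), the sentinel (!$omp, c$omp, or *$omp) must begin in column 1; in free-form source (.f90) it may be indented like any other comment. A minimal free-form sketch:

```fortran
program sentinel_demo
   use omp_lib
   implicit none
   integer :: i
   !$omp parallel do      ! an indented sentinel is fine in free form
   do i = 1, 4
      print *, 'thread', omp_get_thread_num(), ' i =', i
   end do
   !$omp end parallel do
end program sentinel_demo
```

Compile with gfortran -fopenmp sentinel_demo.f90; in a fixed-form .f file the same directive would have to start in column 1.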
I ran your program on my laptop. The array was too large for this computer and it crashed, so I reduced N to the point where it would run. This is a dual-core machine, and 4 threads was the most it would handle. At best the MP run took about 61 percent of the non-MP time on my laptop.
Results from batch-changing the number of...
Also, here is how I looked at the thread times (tbegin and wtime are double precision; omp_get_wtime comes from omp_lib):
tbegin = omp_get_wtime()
!$omp parallel do
do i = 1, N
   ! code here
end do
!$omp end parallel do
wtime = omp_get_wtime() - tbegin
print "( 'Computing MP', i10, ' loops ', i2, ' threads took ', f8.3, ' seconds' )", N, omp_get_max_threads(), wtime
You are correct in that the time was the total for all threads. My original program was fine, but when I made timing calls using the OpenMP library, it showed that each thread was faster. Some loops can't be parallelized: if the loop result is not a function of the loop count alone, there is no way to split the loop...
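By contrast, here is a sketch (my own illustration) of a loop that does split cleanly, because each iteration depends only on the loop index i:

```fortran
program independent_loop
   use omp_lib
   implicit none
   integer, parameter :: n = 1000000
   real(8), allocatable :: y(:)
   integer :: i
   allocate (y(n))
   !$omp parallel do      ! safe: no iteration reads another iteration's result
   do i = 1, n
      y(i) = 2.0d0 * sin(real(i, 8))
   end do
   !$omp end parallel do
   print *, y(n)
end program independent_loop
```

The allocatable array also sidesteps the Windows stack-size issue mentioned above.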
I am not sure I understand why OpenMP doesn't work well on loops. The reason I am experimenting is that I have some code I want to parallelize, and it does most of its factoring and iteration in loops (about 370 loops).
I have already edited the code to use optimized LAPACK in some subroutines...