Vanadium 50 said:
Some tips:
(1) Don't get greedy. A 2C/4T CPU is not going to give you a factor of 4, and depending on the code, it might not even give you a factor of 2. You can spend a lot of effort chasing a small incremental improvement.
(1B) If you want to use GPUs, OpenMP is not very efficient at it. You are usually better off using something else.
(2) Profile, profile, profile. You need to know where the code is spending its time, and that may not be where you think it is. Find that spot and try to parallelize it. If the code is serial, there's no point in throwing OpenMP at it. Once you have sped it up, profile the code again and see if you have a new bottleneck. Repeat until exhausted.
(3) Compile, compile, compile. Debugging with OpenMP is hard. At every step, make sure you have not introduced any bugs. If you do 8 hours of work and now the answer is wrong, good luck finding exactly where. One missing barrier can make a mess, and good luck finding it. Similarly, save known-good versions of your code; you may need to revert.
(4) Consider pthreads over OpenMP. OpenMP parallelizes small chunks of code, ideally with no or few branches. pthreads can parallelize large chunks of code with lots of branches. Depending on what you are doing, one may be much better than the other.
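Tip (1) can be made quantitative with Amdahl's law (a standard model; the quoted post does not invoke it by name). If a fraction $p$ of the run time parallelizes perfectly over $N$ threads and the rest stays serial, the best possible speedup is

$$S(N) = \frac{1}{(1-p) + \dfrac{p}{N}}$$

For example, with $p = 0.9$: $N = 2$ gives $S = 1/0.55 \approx 1.82$, and $N = 4$ gives $S = 1/0.325 \approx 3.08$. This is an upper bound. On a 2C/4T CPU, two hardware threads share each core, so the real gain from going from 2 to 4 threads is typically smaller still.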
I am with you on that. I typically experiment with my code to maximize efficiency. Back when I started writing code (1974), it was required. I have revisited the GFORTRAN program I first posted and come up with a better way of measuring time. It looks like I can improve the time by better than 2 to 1.
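[CODE lang="fortran" title="hel"]program hel
! Times the same loop with and without an OpenMP PARALLEL DO directive.
! Free-form source: compile with e.g.  gfortran -O2 -fopenmp hel.f90
implicit none                                ! catches typos such as elapsed_tMP vs elapsed_t_MP
integer :: i, j
integer(8) :: ct_0, ct_1, ct_rate, ct_max    ! 64-bit counts give a finer clock resolution
real*8 :: elapsed_t, elapsed_t_MP, x
j = 250000000                                ! SET NUMBER OF LOOPS
!************************ USE MP ****************************************
x = 1.0000001d0                              ! SET INITIAL X (d0 = double-precision literal)
! Starting time
call system_clock(ct_0, ct_rate, ct_max)
! NOTE: each iteration uses the x left by the previous one, and x is shared
! by default, so the threads race on x. This PARALLEL DO is not a valid
! parallel version of the serial loop; its x is not guaranteed to match.
!$OMP PARALLEL DO
! Alternative split form (disabled):
! !$OMP PARALLEL
! !$OMP DO
do i = 1, j
   x = x**2
   x = ((x - 1)/2) + 1
end do
! !$OMP END DO
! !$OMP END PARALLEL
!$OMP END PARALLEL DO
! Ending time: subtract the integer counts first to avoid rounding error
call system_clock(ct_1, ct_rate, ct_max)
elapsed_t_MP = real(ct_1 - ct_0, 8)/real(ct_rate, 8)
! Printing x also keeps the compiler from discarding the loop as dead code
WRITE(*,10) elapsed_t_MP, x
10 format(" MP time", F8.3, " seconds   x =", ES16.8)
!************************ END USE MP ************************************
!************************ START NO MP ***********************************
x = 1.0000001d0                              ! SET INITIAL X
! Starting time
call system_clock(ct_0, ct_rate, ct_max)
do i = 1, j
   x = x**2
   x = ((x - 1)/2) + 1
end do
! Ending time
call system_clock(ct_1, ct_rate, ct_max)
elapsed_t = real(ct_1 - ct_0, 8)/real(ct_rate, 8)
WRITE(*,11) elapsed_t, x
11 format("No MP time", F8.3, " seconds   x =", ES16.8)
!************************ END NO MP *************************************
WRITE(*,12) elapsed_t_MP*100/elapsed_t
12 format("Percentage Time", F8.3)
end program hel
[/CODE]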
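The loop in the program above is inherently serial: every iteration needs the x left by the previous one, so splitting its iterations among threads changes the answer (tip 2). Below is a minimal sketch, assuming gfortran, of what a loop OpenMP *can* split looks like, together with a wall-clock helper based on 64-bit `system_clock` counts. The module and its names (`timing_sketch`, `wall_seconds`, `independent_sum`) are hypothetical and only illustrate the idea; the summed series is an arbitrary stand-in for real work.

```fortran
module timing_sketch
  ! Hypothetical illustration module; names are not from the thread.
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_seconds, independent_sum
contains

  ! Wall-clock time in seconds. 64-bit arguments to system_clock usually
  ! give a finer resolution in gfortran than default integers, and real64
  ! avoids the rounding of converting a large count to a default real.
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: ticks, ticks_per_sec
    call system_clock(ticks, ticks_per_sec)
    if (ticks_per_sec <= 0) then
      t = 0.0_real64            ! no clock available on this system
    else
      t = real(ticks, real64) / real(ticks_per_sec, real64)
    end if
  end function wall_seconds

  ! Sum of 1/i**2 for i = 1..n. Each term depends only on i, not on the
  ! previous iteration, so the iterations can be shared among threads.
  ! reduction(+:total) gives each thread a private partial sum and adds
  ! the partial sums together at the end of the loop.
  function independent_sum(n) result(s)
    integer, intent(in) :: n
    real(real64) :: s
    real(real64) :: total
    integer :: i
    total = 0.0_real64
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + 1.0_real64 / real(i, real64)**2
    end do
    !$omp end parallel do
    s = total
  end function independent_sum

end module timing_sketch
```

Used from a driver program, `t0 = wall_seconds(); s = independent_sum(n); t1 = wall_seconds()` times the loop. Compiled without `-fopenmp`, the `!$omp` lines are treated as comments and the same code runs serially, which makes it easy to compare the two answers (they may differ in the last few bits, because the parallel reduction adds the terms in a different order).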