- #1
davidfur
- TL;DR Summary
- I'm trying to parallelize a simple DO loop for the first time, but without success.
I need some basic help with a minimal code sample.
Hey guys,
I've started reading about OpenMP programming, and now I'm trying to parallelize a small part of a Fortran code.
The first thing I would like to do is to parallelize the innermost DO loop. It loops over all particles (na of them) and calculates
the distance between a given point in 3D space (pos) and the position of particle i1. At the end, the particle closest to the point should be identified.
Fortran:
!$omp parallel do private(i1,imol2,atomDis) default(shared)
do i1=1,na
   imol1=iag(aid,3+mbond)
   imol2=iag(i1,3+mbond)
   !write(*,*) 'atom ',aid,' belongs to mol: ',imol1
   !write(*,*) 'atom ',i1,' belongs to mol: ',imol2
   ! perform analysis only on same molecule
   if (imol1 .NE. imol2) then
      !write(*,*) 'cycle at atom ',i1
      cycle
   endif
   call dista3(i1,pos,atomDis,dx,dy,dz)
   ! pos is already in angstrom.
   ! convert atomDis back to bohr.
   atomDis=atomDis/bohr2ang
   !write(*,*) 'atomDis=',atomDis
   if (atomDis < shortestDis) then
      !write(*,*) 'closest atom is: ',i1
      closestAtm = i1
      shortestDis = atomDis
   endif
enddo
!$omp end parallel do
When I compile the program and run it on 1 thread, the execution time is 13 seconds (for the whole program). When I run it on 10 threads, the execution time jumps to over a minute. Clearly I have messed up somewhere, but my understanding of OpenMP is still lacking...
Specifically, I would like to see the expected speed-up, and also make sure that all threads are aware of the most up-to-date shortestDis to compare to.
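To make the question concrete, here is a standalone toy version of the pattern I think I need. This is only a sketch and not my real code: the program name, the random coordinates in xyz, the particle count n, and the variables localBest/localClosest are all made up for illustration, and the distance is computed inline instead of calling dista3. The idea, as far as I understand it, is that each thread keeps its own private minimum while looping, and the shared result is updated only once per thread inside a critical section, so no thread writes to the shared shortest distance during the loop.

```fortran
program closest_particle_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000          ! hypothetical particle count
  real(dp), allocatable :: xyz(:,:)         ! hypothetical coordinates, xyz(1:3, particle)
  real(dp) :: point(3), d
  real(dp) :: best, localBest, serialBest
  integer :: i, closest, localClosest, serialClosest

  allocate(xyz(3, n))
  call random_number(xyz)                   ! made-up data in [0,1)
  point = [0.5_dp, 0.5_dp, 0.5_dp]

  ! Serial reference result, used only to check the parallel version.
  serialBest = huge(1.0_dp)
  serialClosest = 0
  do i = 1, n
     d = sqrt(sum((xyz(:, i) - point)**2))
     if (d < serialBest) then
        serialBest = d
        serialClosest = i
     end if
  end do

  ! Parallel version: private running minimum per thread, merged once at the end.
  best = huge(1.0_dp)
  closest = 0
  !$omp parallel default(shared) private(i, d, localBest, localClosest)
  localBest = huge(1.0_dp)
  localClosest = 0
  !$omp do
  do i = 1, n
     d = sqrt(sum((xyz(:, i) - point)**2))
     if (d < localBest) then
        localBest = d
        localClosest = i
     end if
  end do
  !$omp end do
  ! Only one thread at a time may update the shared result.
  !$omp critical
  if (localBest < best .or. &
      (localBest == best .and. localClosest < closest)) then
     best = localBest
     closest = localClosest
  end if
  !$omp end critical
  !$omp end parallel

  if (closest /= serialClosest) error stop 'parallel and serial results differ'
  print *, 'closest particle:', closest, ' distance:', best
end program closest_particle_sketch
```

With gfortran this should build with something like gfortran -fopenmp, and it also compiles as plain serial code without the flag, because the directives are then treated as comments. In my real loop I suspect imol1, dx, dy and dz would also have to be private, since every thread writes to them, but I'm not sure about that.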
Can anybody guide me through this?