Parallel DO loop with OpenMP (fortran)

dy=(c(n1,2)-pos(2)+k2*tm11) dz=(c(n1,3)-pos(3)+k3*tm11) dismin=dismin+k1*dx+k2*dy+k3*dz endif ! end if ! end if ! end do!f
  • #1
18
2
TL;DR Summary
I'm trying to parallelize a simple DO loop for the first time, but without success.
Need some basic help with minimal code sample.
Hey guys,
I've started to read some OpenMP programming and now I'm trying to parallelize small part of a fortran code.
The first thing I would like to do is to parallelize the innermost DO loop. It loops through the number of particles (na) and calculates
the distance between some point in 3D space (pos) and the particle's position (pos). At the end, the particle closest to the point should be identified.

Fortran:
!$omp parallel do private(i1,imol2,atomDis) default(shared)
            do i1=1,na
              imol1=iag(aid,3+mbond)
              imol2=iag(i1,3+mbond)
              !write(*,*) 'atom ',aid,' belongs to mol: ',imol1
              !write(*,*) 'atom ',i1,'  belongs to mol: ',imol2
              ! perform analysis only on same molecule
              if (imol1 .NE. imol2) then
                  !write(*,*) 'cycle at atom ',i1
                 cycle
              endif
              call dista3(i1,pos,atomDis,dx,dy,dz)
              ! pos is already in angstrom.
              ! convert atomDis back to bohr.
              atomDis=atomDis/bohr2ang
              !write(*,*) 'atomDis=',atomDis
              if (atomDis < shortestDis) then
                 !write(*,*) 'closest atom is: ',i1
                 closestAtm = i1
                 shortestDis = atomDis
              endif
            enddo
!$omp end parallel do

When I compile the program and run it on 1 thread, the execution time is 13seconds (for the whole program). Then, running on 10 threads the execution time jumps to over a minute. Clearly, I have messed up somewhere, but my current understanding is still lacking...

Specifically, I would like to see the expected speed-up, and also make sure that all threads are aware of the most up-to-date shortestDis to compare to.

Can anybody guide me through this?
 
  • #2
I'm actually surprised the time changed at all. You can't call dista3 inside the loop you are trying to parallelize.
 
  • #3
Why not, Vanadium?
 
  • #4
Old versions didn't allow this at all, but the basic problem is that that subroutine has a huge number of restrictions placed on it - you do not generate N identical copes of the subroutine each with its own data space. The odds are quiet high that this will break (which is why it wasn't allowed earlier).

And to add, if you knew all this, I would have expected you to post the source of the called routine so we can discuss potential race conditions and such.
 
  • #5
Thanks, but I don't think I understand the problem.

Here is the dista3 subroutine, which calculates the distance:


Fortran:
    subroutine dista3 (n1,pos,dista,dx,dy,dz)
    include 'cbka.blk'
    double precision :: pos(3)
!*********************************************************************
!     Determine distance between atom n1 and point pos               *
!*********************************************************************
    if (iortho == 1) then
        dx=c(n1,1)-pos(1)
        dy=c(n1,2)-pos(2)
        dz=c(n1,3)-pos(3)
        dx=dx-anint(dx/aaxh)*aaxh
        dy=dy-anint(dy/baxh)*baxh
        dz=dz-anint(dz/caxh)*caxh
        dista=dsqrt(dx*dx+dy*dy+dz*dz)
    else
        dismin=1d+30
        do 10 k1=-1,1
            dx=(c(n1,1)-pos(1)+k1*tm11)
            do 10 k2=-1,1
                dy=(c(n1,2)-pos(2)+k1*tm21+k2*tm22)
                do 10 k3=-1,1
                    dz=(c(n1,3)-pos(3)+k1*tm31+k2*tm32+k3*tm33)
                    dis=dsqrt(dx*dx+dy*dy+dz*dz)
                    if ((dis < dismin)) then
                        dxmin=dx
                        dymin=dy
                        dzmin=dz
                        dismin=dis
                    end if
        10 END DO
        dista=dismin
        dx=dxmin
        dy=dymin
        dz=dzmin
    end if
    return
    end subroutine dista3

In fact, a very similar thing was done in a reference openmp fortran code I've found on the web. It's written by Bill Magro, Kuck and Associates, Inc. (KAI), 1998, which is the chief HPC technologist in Intel.

Fortran:
! The computation of forces and energies is fully parallel.
!$omp  parallel do
!$omp& default(shared)
!$omp& private(i,j,k,rij,d)
!$omp& reduction(+ : pot, kin)
      do i=1,np
        ! compute potential energy and forces
        f(1:nd,i) = 0.0
        do j=1,np
             if (i .ne. j) then
               call dist(nd,box,pos(1,i),pos(1,j),rij,d)
               ! attribute half of the potential energy to particle 'j'
               pot = pot + 0.5*v(d)
               do k=1,nd
                 f(k,i) = f(k,i) - rij(k)*dv(d)/d
               enddo
             endif
        enddo
        ! compute kinetic energy
        kin = kin + dotr8(nd,vel(1,i),vel(1,i))
      enddo
!$omp  end parallel do

And his dist subroutine is the following:

Fortran:
! Compute the displacement vector (and its norm) between two particles.
      subroutine dist(nd,box,r1,r2,dr,d)
      implicit none

      integer nd
      real*8 box(nd)
      real*8 r1(nd)
      real*8 r2(nd)
      real*8 dr(nd)
      real*8 d

      integer i

      d = 0.0
      do i=1,nd
        dr(i) = r1(i) - r2(i)
        d = d + dr(i)**2.
      enddo
      d = sqrt(d)

      return
      end

So, I don't see why my subroutine should cause any special problems. I'm just a beginner with OpenMP though so please bear with me
 
  • #6
Thread one sets dx = 10.
Then thread two sets dx = 11.
Then thread one updates dx by subtracting off the anint stuff. It's subtracting off 11, not 10.
Then thread two updates dx by subtracting off the anint stuff. It's subtracting off not 11, not 10, but whatever you got in the previous step.
Then thread three comes along and sets dx to something else, and so on.

There is a huge mess.

Your sample code from Magro solves this with setting variables to private. Your code does not.

Hence the mess.
 
  • Like
Likes sysprog and davidfur
  • #7
If you have complex code and not tight loops, why are you using OpenMP and not pthreads?
 
  • #8
I know even less about pthreads...
 
  • #9
I just realized that not all Fortran compilers support pthreads. XLF for sure does. However, pthreads fits what you are trying to do better than OpenMP.

If you insist on OpenMP, you have two icky options. One is to move all the subroutines into the main block of code. Then you can protect the variables. But your code will get very spaghetti-ish. You may even find that you need to duplicate a lot of code.

The other possibility is change the calling parameters so that your subroutine declares no variables at all. Not a one. Then you can protect what needs to be private.

What you will want to do and cannot is declare a variable private inside the subroutine. That's not in the right spot - it needs to be part of the omp parallel instruction block.
 

Suggested for: Parallel DO loop with OpenMP (fortran)

Replies
8
Views
549
2
Replies
37
Views
2K
2
Replies
60
Views
2K
Replies
2
Views
883
Replies
12
Views
736
Replies
2
Views
1K
Replies
2
Views
601
Back
Top