Parallel DO loop with OpenMP (fortran)

davidfur · Jan 18, 2021

Hey guys,
I've started to read some OpenMP programming and now I'm trying to parallelize small part of a fortran code.
The first thing I would like to do is to parallelize the innermost DO loop. It loops through the number of particles (na) and calculates
the distance between some point in 3D space (pos) and the particle's position (pos). At the end, the particle closest to the point should be identified.

Fortran:

!$omp parallel do private(i1,imol2,atomDis) default(shared)
            do i1=1,na
              imol1=iag(aid,3+mbond)
              imol2=iag(i1,3+mbond)
              !write(*,*) 'atom ',aid,' belongs to mol: ',imol1
              !write(*,*) 'atom ',i1,'  belongs to mol: ',imol2
              ! perform analysis only on same molecule
              if (imol1 .NE. imol2) then
                  !write(*,*) 'cycle at atom ',i1
                 cycle
              endif
              call dista3(i1,pos,atomDis,dx,dy,dz)
              ! pos is already in angstrom.
              ! convert atomDis back to bohr.
              atomDis=atomDis/bohr2ang
              !write(*,*) 'atomDis=',atomDis
              if (atomDis < shortestDis) then
                 !write(*,*) 'closest atom is: ',i1
                 closestAtm = i1
                 shortestDis = atomDis
              endif
            enddo
!$omp end parallel do

When I compile the program and run it on 1 thread, the execution time is 13seconds (for the whole program). Then, running on 10 threads the execution time jumps to over a minute. Clearly, I have messed up somewhere, but my current understanding is still lacking...

Specifically, I would like to see the expected speed-up, and also make sure that all threads are aware of the most up-to-date shortestDis to compare to.

Can anybody guide me through this?

Vanadium 50 · Jan 18, 2021

I'm actually surprised the time changed at all. You can't call dista3 inside the loop you are trying to parallelize.

davidfur · Jan 18, 2021

Why not, Vanadium?

Vanadium 50 · Jan 18, 2021

Old versions didn't allow this at all, but the basic problem is that that subroutine has a huge number of restrictions placed on it - you do not generate N identical copes of the subroutine each with its own data space. The odds are quiet high that this will break (which is why it wasn't allowed earlier).

And to add, if you knew all this, I would have expected you to post the source of the called routine so we can discuss potential race conditions and such.

davidfur · Jan 18, 2021

Thanks, but I don't think I understand the problem.

Here is the dista3 subroutine, which calculates the distance:

Fortran:

    subroutine dista3 (n1,pos,dista,dx,dy,dz)
    include 'cbka.blk'
    double precision :: pos(3)
!*********************************************************************
!     Determine distance between atom n1 and point pos               *
!*********************************************************************
    if (iortho == 1) then
        dx=c(n1,1)-pos(1)
        dy=c(n1,2)-pos(2)
        dz=c(n1,3)-pos(3)
        dx=dx-anint(dx/aaxh)*aaxh
        dy=dy-anint(dy/baxh)*baxh
        dz=dz-anint(dz/caxh)*caxh
        dista=dsqrt(dx*dx+dy*dy+dz*dz)
    else
        dismin=1d+30
        do 10 k1=-1,1
            dx=(c(n1,1)-pos(1)+k1*tm11)
            do 10 k2=-1,1
                dy=(c(n1,2)-pos(2)+k1*tm21+k2*tm22)
                do 10 k3=-1,1
                    dz=(c(n1,3)-pos(3)+k1*tm31+k2*tm32+k3*tm33)
                    dis=dsqrt(dx*dx+dy*dy+dz*dz)
                    if ((dis < dismin)) then
                        dxmin=dx
                        dymin=dy
                        dzmin=dz
                        dismin=dis
                    end if
        10 END DO
        dista=dismin
        dx=dxmin
        dy=dymin
        dz=dzmin
    end if
    return
    end subroutine dista3

In fact, a very similar thing was done in a reference openmp fortran code I've found on the web. It's written by Bill Magro, Kuck and Associates, Inc. (KAI), 1998, which is the chief HPC technologist in Intel.

Fortran:

! The computation of forces and energies is fully parallel.
!$omp  parallel do
!$omp& default(shared)
!$omp& private(i,j,k,rij,d)
!$omp& reduction(+ : pot, kin)
      do i=1,np
        ! compute potential energy and forces
        f(1:nd,i) = 0.0
        do j=1,np
             if (i .ne. j) then
               call dist(nd,box,pos(1,i),pos(1,j),rij,d)
               ! attribute half of the potential energy to particle 'j'
               pot = pot + 0.5*v(d)
               do k=1,nd
                 f(k,i) = f(k,i) - rij(k)*dv(d)/d
               enddo
             endif
        enddo
        ! compute kinetic energy
        kin = kin + dotr8(nd,vel(1,i),vel(1,i))
      enddo
!$omp  end parallel do

And his dist subroutine is the following:

Fortran:

! Compute the displacement vector (and its norm) between two particles.
      subroutine dist(nd,box,r1,r2,dr,d)
      implicit none

      integer nd
      real*8 box(nd)
      real*8 r1(nd)
      real*8 r2(nd)
      real*8 dr(nd)
      real*8 d

      integer i

      d = 0.0
      do i=1,nd
        dr(i) = r1(i) - r2(i)
        d = d + dr(i)**2.
      enddo
      d = sqrt(d)

      return
      end

So, I don't see why my subroutine should cause any special problems. I'm just a beginner with OpenMP though so please bear with me

Vanadium 50 · Jan 18, 2021

Thread one sets dx = 10.
Then thread two sets dx = 11.
Then thread one updates dx by subtracting off the anint stuff. It's subtracting off 11, not 10.
Then thread two updates dx by subtracting off the anint stuff. It's subtracting off not 11, not 10, but whatever you got in the previous step.
Then thread three comes along and sets dx to something else, and so on.

There is a huge mess.

Your sample code from Magro solves this with setting variables to private. Your code does not.

Hence the mess.

Vanadium 50 · Jan 18, 2021

If you have complex code and not tight loops, why are you using OpenMP and not pthreads?

davidfur · Jan 18, 2021

I know even less about pthreads...

Vanadium 50 · Jan 18, 2021

I just realized that not all Fortran compilers support pthreads. XLF for sure does. However, pthreads fits what you are trying to do better than OpenMP.

If you insist on OpenMP, you have two icky options. One is to move all the subroutines into the main block of code. Then you can protect the variables. But your code will get very spaghetti-ish. You may even find that you need to duplicate a lot of code.

The other possibility is change the calling parameters so that your subroutine declares no variables at all. Not a one. Then you can protect what needs to be private.

What you will want to do and cannot is declare a variable private inside the subroutine. That's not in the right spot - it needs to be part of the omp parallel instruction block.

Parallel DO loop with OpenMP (fortran)

Similar threads

Is A.I. more than the sum of its parts?

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect