Parallel DO loop with OpenMP (fortran)

  • #1
davidfur
TL;DR Summary
I'm trying to parallelize a simple DO loop for the first time, but without success.
Need some basic help with a minimal code sample.
Hey guys,
I've started reading up on OpenMP programming and now I'm trying to parallelize a small part of a Fortran code.
The first thing I would like to do is parallelize the innermost DO loop. It loops through the number of particles (na) and calculates
the distance between a point in 3D space (pos) and each particle's position. At the end, the particle closest to the point should be identified.

Fortran:
!$omp parallel do private(i1,imol2,atomDis) default(shared)
            do i1=1,na
              imol1=iag(aid,3+mbond)
              imol2=iag(i1,3+mbond)
              !write(*,*) 'atom ',aid,' belongs to mol: ',imol1
              !write(*,*) 'atom ',i1,'  belongs to mol: ',imol2
              ! perform analysis only on same molecule
              if (imol1 .NE. imol2) then
                  !write(*,*) 'cycle at atom ',i1
                 cycle
              endif
              call dista3(i1,pos,atomDis,dx,dy,dz)
              ! pos is already in angstrom.
              ! convert atomDis back to bohr.
              atomDis=atomDis/bohr2ang
              !write(*,*) 'atomDis=',atomDis
              if (atomDis < shortestDis) then
                 !write(*,*) 'closest atom is: ',i1
                 closestAtm = i1
                 shortestDis = atomDis
              endif
            enddo
!$omp end parallel do

When I compile the program and run it on 1 thread, the execution time is 13 seconds (for the whole program). Then, running on 10 threads, the execution time jumps to over a minute. Clearly, I have messed up somewhere, but my current understanding is still lacking...

Specifically, I would like to see the expected speed-up, and also make sure that all threads compare against the most up-to-date shortestDis.

Can anybody guide me through this?
 
  • #2
I'm actually surprised the time changed at all. You can't call dista3 inside the loop you are trying to parallelize.
 
  • #3
Why not, Vanadium?
 
  • #4
Old versions didn't allow this at all, but the basic problem is that a subroutine called inside a parallelized loop has a huge number of restrictions placed on it: you do not automatically get N identical copies of the subroutine, each with its own data space. The odds are quite high that this will break (which is why it wasn't allowed earlier).

And to add: if you knew all this, I would have expected you to post the source of the called routine so we can discuss potential race conditions and such.
 
  • #5
Thanks, but I don't think I understand the problem.

Here is the dista3 subroutine, which calculates the distance:
Fortran:
    subroutine dista3 (n1,pos,dista,dx,dy,dz)
    include 'cbka.blk'
    double precision :: pos(3)
!*********************************************************************
!     Determine distance between atom n1 and point pos               *
!*********************************************************************
    if (iortho == 1) then
        dx=c(n1,1)-pos(1)
        dy=c(n1,2)-pos(2)
        dz=c(n1,3)-pos(3)
        dx=dx-anint(dx/aaxh)*aaxh
        dy=dy-anint(dy/baxh)*baxh
        dz=dz-anint(dz/caxh)*caxh
        dista=dsqrt(dx*dx+dy*dy+dz*dz)
    else
        dismin=1d+30
        do k1=-1,1
            dx=(c(n1,1)-pos(1)+k1*tm11)
            do k2=-1,1
                dy=(c(n1,2)-pos(2)+k1*tm21+k2*tm22)
                do k3=-1,1
                    dz=(c(n1,3)-pos(3)+k1*tm31+k2*tm32+k3*tm33)
                    dis=dsqrt(dx*dx+dy*dy+dz*dz)
                    if (dis < dismin) then
                        dxmin=dx
                        dymin=dy
                        dzmin=dz
                        dismin=dis
                    end if
                end do
            end do
        end do
        dista=dismin
        dx=dxmin
        dy=dymin
        dz=dzmin
    end if
    return
    end subroutine dista3

In fact, a very similar thing was done in a reference OpenMP Fortran code I found on the web. It was written in 1998 by Bill Magro of Kuck and Associates, Inc. (KAI); he is now the chief HPC technologist at Intel.

Fortran:
! The computation of forces and energies is fully parallel.
!$omp  parallel do
!$omp& default(shared)
!$omp& private(i,j,k,rij,d)
!$omp& reduction(+ : pot, kin)
      do i=1,np
        ! compute potential energy and forces
        f(1:nd,i) = 0.0
        do j=1,np
             if (i .ne. j) then
               call dist(nd,box,pos(1,i),pos(1,j),rij,d)
               ! attribute half of the potential energy to particle 'j'
               pot = pot + 0.5*v(d)
               do k=1,nd
                 f(k,i) = f(k,i) - rij(k)*dv(d)/d
               enddo
             endif
        enddo
        ! compute kinetic energy
        kin = kin + dotr8(nd,vel(1,i),vel(1,i))
      enddo
!$omp  end parallel do

And his dist subroutine is the following:

Fortran:
! Compute the displacement vector (and its norm) between two particles.
      subroutine dist(nd,box,r1,r2,dr,d)
      implicit none

      integer nd
      real*8 box(nd)
      real*8 r1(nd)
      real*8 r2(nd)
      real*8 dr(nd)
      real*8 d

      integer i

      d = 0.0
      do i=1,nd
        dr(i) = r1(i) - r2(i)
        d = d + dr(i)**2.
      enddo
      d = sqrt(d)

      return
      end

So, I don't see why my subroutine should cause any special problems. I'm just a beginner with OpenMP, though, so please bear with me.
 
  • #6
Thread one sets dx = 10.
Then thread two sets dx = 11.
Then thread one updates dx by subtracting off the anint stuff. It's subtracting off 11, not 10.
Then thread two updates dx by subtracting off the anint stuff, starting from whatever dx ended up being after the previous step, not from 11 or 10.
Then thread three comes along and sets dx to something else, and so on.

There is a huge mess.

Your sample code from Magro avoids this by declaring those variables private. Your code does not.

Hence the mess.
 
  • #7
If you have complex code and not tight loops, why are you using OpenMP and not pthreads?
 
  • #8
I know even less about pthreads...
 
  • #9
I just realized that not all Fortran compilers support pthreads. XLF for sure does. However, pthreads fits what you are trying to do better than OpenMP.

If you insist on OpenMP, you have two icky options. One is to move all the subroutines into the main block of code. Then you can protect the variables. But your code will get very spaghetti-ish. You may even find that you need to duplicate a lot of code.

The other possibility is change the calling parameters so that your subroutine declares no variables at all. Not a one. Then you can protect what needs to be private.

What you will want to do, and cannot, is declare a variable private inside the subroutine. That's not the right spot: it needs to be part of the !$omp parallel directive.
 

FAQ: Parallel DO loop with OpenMP (fortran)

What is a Parallel DO loop with OpenMP?

A Parallel DO loop with OpenMP is a method for parallelizing code in Fortran. It allows for multiple threads to execute the same loop simultaneously, improving the performance and efficiency of the code.

How does a Parallel DO loop with OpenMP work?

A Parallel DO loop with OpenMP works by dividing the iterations of a DO loop among multiple threads. Each thread is assigned a portion of the loop to execute, allowing for multiple iterations to be processed simultaneously.
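As a small illustration of that division of work (shown in C; the directive is the same idea as Fortran's !$omp parallel do): with a static schedule, 12 iterations on 4 threads gives each thread a contiguous chunk of 3, and the reduction clause combines each thread's partial sum at the end. Compiled without OpenMP, the pragma is ignored and the loop runs serially with the same result.

```c
/* Sum 1..n in parallel: each thread sums its own chunk of iterations,
   and reduction(+:sum) merges the per-thread partial sums. */
long parallel_sum(int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```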

What are the benefits of using a Parallel DO loop with OpenMP?

There are several benefits to using a Parallel DO loop with OpenMP, including improved performance and efficiency of the code, reduced execution time, and the ability to take advantage of multiple CPU cores on a single machine.

Are there any limitations to using a Parallel DO loop with OpenMP?

One potential limitation of using a Parallel DO loop with OpenMP is that it may not be suitable for all types of algorithms. For example, algorithms with large amounts of branching or dependencies may not see a significant improvement in performance from parallelization.

How do I implement a Parallel DO loop with OpenMP in my Fortran code?

To implement a Parallel DO loop with OpenMP in Fortran, you will need to use the appropriate OpenMP directives and clauses within your code. These can be placed around the DO loop that you want to parallelize, and they will control the number of threads and how the iterations are divided among them.
