# MPI: Need help

1. Jan 11, 2019

### Telemachus

Hi. I'm trying to parallelize my code. I am new at MPI, I'm learning the basics.

I want to use a domain decomposition strategy in my code. I've been reading a bit, and wanted to use this subroutine to exchange points between neighbors. I've started by trying to modify a code presented by Gropp in his book. I'll give the entire code, which is publicly available at his webpage, and then I explain the modification I wanna do, and the problems I'm facing.

This code implements a domain decomposition strategy to solve the Poisson equation in 2D. It uses a 1D domain decomposition, which means that the domain is decomposed into "slices", the original array a(maxn,maxn) is decomposed in different processes which handles the subarrays a(maxn,s:e).

This is the code:

Code (Fortran):

!*******************************************************************
!   oned.f90 - a solution to the Poisson problem using Jacobi
!   interation on a 1-d decomposition
!
!   The size of the domain is read by processor 0 and broadcast to
!   all other processors.  The Jacobi iteration is run until the
!   change in successive elements is small or a maximum number of
!   iterations is reached.  The difference is printed out at each
!   step.
!*******************************************************************
program main
!
include "mpif.h"
integer maxn
parameter (maxn = 1024+2)
double precision  a(maxn,maxn), b(maxn,maxn), f(maxn,maxn)
integer nx, ny
integer myid, numprocs, ierr
integer comm1d, nbrbottom, nbrtop, s, e, it
double precision diff, diffnorm, dwork
double precision t1, t2
external diff

call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
!
if (myid .eq. 0) then
!
!         Get the size of the problem
!
!          print *, 'Enter nx'
nx = 1024
endif
call MPI_BCAST(nx,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
ny   = nx
!
! Get a new communicator for a decomposition of the domain
!
call MPI_CART_CREATE(MPI_COMM_WORLD, 1, numprocs, .false., &
.true., comm1d, ierr )
!
! Get my position in this communicator, and my neighbors
!
call MPI_COMM_RANK( comm1d, myid, ierr )
call MPI_Cart_shift( comm1d, 0,  1, nbrbottom, nbrtop, ierr )
!
! Compute the actual decomposition
!
call MPE_DECOMP1D( ny, numprocs, myid, s, e )
!
! Initialize the right-hand-side (f) and the initial solution guess (a)
!
call onedinit( a, b, f, nx, s, e )
!
! Actually do the computation.  Note the use of a collective operation to
! check for convergence, and a do-loop to bound the number of iterations.
!
call MPI_BARRIER( MPI_COMM_WORLD, ierr )
t1 = MPI_WTIME()
do it=1, 10000
!      do it=1, 2
call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
call sweep1d( a, f, nx, s, e, b )
call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop )
call sweep1d( b, f, nx, s, e, a )
dwork = diff( a, b, nx, s, e )
call MPI_Allreduce( dwork, diffnorm, 1, MPI_DOUBLE_PRECISION, &
MPI_SUM, comm1d, ierr )
if (diffnorm .lt. 1.0e-5) exit
!        if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
enddo
if (myid .eq. 0 .and. it .gt. 100) print *, 'Failed to converge'
t2 = MPI_WTIME()
if (myid .eq. 0) then
print *, 'Converged after ', 2*it, ' Iterations in ', t2 - t1, &
' secs '
print *,' Difference is ', diffnorm
endif
!
call MPI_FINALIZE(ierr)
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine exchng1(a, nx, s, e, comm1d, nbrbottom, nbrtop)
use mpi
integer nx, s, e, comm1d, nbrbottom, nbrtop
double precision a(0:nx+1,s-1:e+1)
integer ierr, req(4)
!
call MPI_IRECV(&
a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
call MPI_IRECV(&
a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)
call MPI_ISEND(&
a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(3), ierr)
call MPI_ISEND(&
a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(4), ierr)
!
call MPI_WAITALL(4, req, MPI_STATUSES_IGNORE, ierr)
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
! This routine show how to determine the neighbors in a 2-d decomposition of
! the domain. This assumes that MPI_Cart_create has already been called
!
subroutine fnd2dnbrs( comm2d, &
nbrleft, nbrright, nbrtop, nbrbottom )
integer comm2d, nbrleft, nbrright, nbrtop, nbrbottom
!
integer ierr
!
call MPI_Cart_shift( comm2d, 0,  1, nbrleft,   nbrright, ierr )
call MPI_Cart_shift( comm2d, 1,  1, nbrbottom, nbrtop,   ierr )
!
return
end
!
subroutine fnd2ddecomp( comm2d, n, sx, ex, sy, ey )
integer comm2d
integer n, sx, ex, sy, ey
integer dims(2), coords(2), ierr
logical periods(2)
!
call MPI_Cart_get( comm2d, 2, dims, periods, coords, ierr )

call MPE_DECOMP1D( n, dims(1), coords(1), sx, ex )
call MPE_DECOMP1D( n, dims(2), coords(2), sy, ey )
!
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine onedinit( a, b, f, nx, s, e )
integer nx, s, e
double precision a(0:nx+1, s-1:e+1), b(0:nx+1, s-1:e+1), &
f(0:nx+1, s-1:e+1)
!
integer i, j
!
forall(j=s-1:e+1)
forall( i=0:nx+1)
a(i,j) = 0.0d0
b(i,j) = 0.0d0
f(i,j) = 0.0d0
end forall
end forall
!
!    Handle boundary conditions
!
forall(j=s:e)
a(0,j) = 1.0d0
b(0,j) = 1.0d0
a(nx+1,j) = 0.0d0
b(nx+1,j) = 0.0d0
end forall
if (s .eq. 1) then
forall( i=1:nx)
a(i,0) = 1.0d0
b(i,0) = 1.0d0
end forall
endif
!
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
!  This file contains a routine for producing a decomposition of a 1-d array
!  when given a number of processors.  It may be used in "direct" product
!  decomposition.  The values returned assume a "global" domain in [1:n]
!
subroutine MPE_DECOMP1D( n, numprocs, myid, s, e )
integer n, numprocs, myid, s, e
integer nlocal
integer deficit
!
nlocal  = n / numprocs
s         = myid * nlocal + 1
deficit = mod(n,numprocs)
s         = s + min(myid,deficit)
if (myid .lt. deficit) then
nlocal = nlocal + 1
endif
e = s + nlocal - 1
if (e .gt. n .or. myid .eq. numprocs-1) e = n
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
! Perform a Jacobi sweep for a 1-d decomposition.
! Sweep from a into b
!
subroutine sweep1d( a, f, nx, s, e, b )
integer nx, s, e
double precision a(0:nx+1,s-1:e+1), f(0:nx+1,s-1:e+1), &
b(0:nx+1,s-1:e+1)
!
integer i, j
double precision h
!
h = 1.0d0 / dble(nx+1)
do j=s, e
do i=1, nx
b(i,j) = 0.25 * (a(i-1,j)+a(i,j+1)+a(i,j-1)+a(i+1,j)) - &
h * h * f(i,j)
enddo
enddo
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
!  The rest of the 1-d program
!
double precision function diff( a, b, nx, s, e )
integer nx, s, e
double precision a(0:nx+1, s-1:e+1), b(0:nx+1, s-1:e+1)
!
double precision sum
integer i, j
!
sum = 0.0d0
do j=s,e
do i=1,nx
sum = sum + (a(i,j) - b(i,j)) ** 2
enddo
enddo
!
diff = sum
return
end

So now I would like to modify the subroutine that makes the exchange between neighbours. Each sweep is carried on by independent processes, and the exchange of "ghost points" is used to give the information in one process about the neighbor processes.

Now there is only one ghost point that is being exchanged between neighbors at each boundary of the subdomains. The corresponding subroutine is:

Code (Fortran):

subroutine exchng1(a, nx, s, e, comm1d, nbrbottom, nbrtop)
use mpi
integer nx, s, e, comm1d, nbrbottom, nbrtop
double precision a(0:nx+1,s-1:e+1)
integer ierr, req(4)
!
call MPI_IRECV(&
a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
call MPI_IRECV(&
a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)
call MPI_ISEND(&
a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(3), ierr)
call MPI_ISEND(&
a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(4), ierr)
!
call MPI_WAITALL(4, req, MPI_STATUSES_IGNORE, ierr)
return
end
I would like to change this subroutine to exchange more than a single point, let's say four points, for example. However, when I try to do this, I obtain error messages. This is my modified subroutine:

Code (Fortran):

subroutine exchng1(a, nx, s, e, comm1d, nbrbottom, nbrtop)
use mpi
integer nx, s, e, comm1d, nbrbottom, nbrtop
double precision a(0:nx+1,s-3:e+3)
integer ierr, req(4)
!
call MPI_IRECV(&
a(1,s-3:s), 4*nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
call MPI_IRECV(&
a(1,e:e+3), 4*nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)
call MPI_ISEND(&
a(1,e:e+3), 4*nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(3), ierr)
call MPI_ISEND(&
a(1,s-3:s), 4*nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(4), ierr)
!
call MPI_WAITALL(4, req, MPI_STATUSES_IGNORE, ierr)
return
end

However, when the program calls this subroutine, the error appears:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

I know I'm doing something silly, probably I'm not handling correctly the size of the arrays. Maybe it has to do with something more subtle, like how I am tagging the processes, I'm not sure. I thought that perhaps somebody here could help me.

I don't know if it is clear what the idea is. I would like to extend each of the subdomains in such a way that they overlap by four points at each boundary, and then exchange those points between neighbors.

There are many things that I'm not understanding. When the send and receive is performed, I don't understand how each process knows in which part of the array it has to put the received information. This are just overwritten over the overlapping part of the arrays? I mean, the shared elements have the same indices, so when received, it overlaps those points in the part of the array corresponding to that process?

Last edited: Jan 11, 2019
2. Jan 11, 2019

### Klystron

The subroutines appear to extend indices before and past the array boundaries defined in main(); e.g., array "a" and index "s".

3. Jan 11, 2019

### Staff: Mentor

Here's a line from your code that works:
Code (Fortran):
call MPI_IRECV&
a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
And one from the code the produces what appears to be a runtime error:
Code (Fortran):
call MPI_IRECV&
a(1,s-3:s), 4*nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
I believe that the problem is the first parameter.
According to some documentation I found here (http://mpi.deino.net/mpi_functions/MPI_Irecv.html), the first parameter has to be a pointer; i.e., the address of some memory. In the first example, the expression a(1, s-1) represents a data value in your array, but Fortran passes array-type parameters by reference; i.e., by their addresses.

In the code that throws an error, the expression a(1, s-3:s) doesn't correspond to an address, thereby causing the error. What might work is to pass a(1, s-3) as the parameter, which corresponds to the first element of the four you want to work with. The second parameter informs the subroutine how many elements to work with.

4. Jan 14, 2019

### Telemachus

I have defined the extended arrays when I call the other subroutine, and the problem persists:

Code (Fortran):

program main
!
include "mpif.h"
integer maxn
parameter (maxn = 1024+6)
double precision  a(maxn,maxn), b(maxn,maxn), f(maxn,maxn)

Why it doesn't correspond to an adress if I have defined the array in the main containing all of those elements?

This is how the entire modified code looks like:

Code (Fortran):

!*******************************************************************
!   oned.f90 - a solution to the Poisson problem using Jacobi
!   interation on a 1-d decomposition
!
!   The size of the domain is read by processor 0 and broadcast to
!   all other processors.  The Jacobi iteration is run until the
!   change in successive elements is small or a maximum number of
!   iterations is reached.  The difference is printed out at each
!   step.
!*******************************************************************
program main
!
include "mpif.h"
integer maxn
parameter (maxn = 1024+6)
double precision  a(maxn,maxn), b(maxn,maxn), f(maxn,maxn)
integer nx, ny
integer myid, numprocs, ierr
integer comm1d, nbrbottom, nbrtop, s, e, it
double precision diff,diffb, diffnorm, dwork
double precision t1, t2
external diff
external diffb

call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
!
if (myid .eq. 0) then
!
!         Get the size of the problem
!
!          print *, 'Enter nx'
nx = 1024
endif
call MPI_BCAST(nx,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
ny   = nx
!
! Get a new communicator for a decomposition of the domain
!
call MPI_CART_CREATE(MPI_COMM_WORLD, 1, numprocs, .false., &
.true., comm1d, ierr )
!
! Get my position in this communicator, and my neighbors
!
call MPI_COMM_RANK( comm1d, myid, ierr )
call MPI_Cart_shift( comm1d, 0,  1, nbrbottom, nbrtop, ierr )
!
! Compute the actual decomposition
!
call MPE_DECOMP1D( ny, numprocs, myid, s, e )
!
! Initialize the right-hand-side (f) and the initial solution guess (a)

!      print*,'Hi. I am proc',myid
!

call onedinit( a, b, f, nx, s, e)

print*,'I am proc',myid,'s=',s,'e=',e

!
! Actually do the computation.  Note the use of a collective operation to
! check for convergence, and a do-loop to bound the number of iterations.
!
call MPI_BARRIER( MPI_COMM_WORLD, ierr )
t1 = MPI_WTIME()
do it=1, 100

call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
call sweep1d( a, f, nx, s, e, b )

call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop )
call sweep1d( b, f, nx, s, e, a )
dwork = diff( a, b, nx, s, e )

call MPI_Allreduce( dwork, diffnorm, 1, MPI_DOUBLE_PRECISION, &
MPI_SUM, comm1d, ierr )
if (diffnorm .lt. 1.0e-5) exit
!        if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
enddo
if (myid .eq. 0 .and. it .gt. 100) print *, 'Failed to converge'
t2 = MPI_WTIME()
if (myid .eq. 0) then
print *, 'Converged after ', 2*it, ' Iterations in ', t2 - t1, &
' secs '
print *,' Difference is ', diffnorm
endif
!
call MPI_FINALIZE(ierr)
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine exchng1(a, nx, s, e, comm1d, nbrbottom, nbrtop)
use mpi
integer nx, s, e, comm1d, nbrbottom, nbrtop
double precision a(0:nx+1,s-3:e+3)
integer ierr, req(4)
!
call MPI_IRECV(&
a(1,s-3:s), 4*nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
call MPI_IRECV(&
a(1,e:e+3), 4*nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)
call MPI_ISEND(&
a(1,e:e+3), 4*nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(3), ierr)
call MPI_ISEND(&
a(1,s-3:s), 4*nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(4), ierr)
!
call MPI_WAITALL(4, req, MPI_STATUSES_IGNORE, ierr)
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine onedinit( a, b, f, nx, s, e )
integer nx, s, e
double precision a(0:nx+1, s-3:e+3), b(0:nx+1, s-3:e+3), &
f(0:nx+1, s-3:e+3)
!
integer i, j
!
forall(j=s-3:e+3)
forall( i=0:nx+1)
a(i,j) = 0.0d0
b(i,j) = 0.0d0
f(i,j) = 0.0d0
end forall
end forall
!
!    Handle boundary conditions
!
forall(j=s-3:e+3)
a(0,j) = 1.0d0
b(0,j) = 1.0d0
a(nx+1,j) = 0.0d0
b(nx+1,j) = 0.0d0
end forall
if (s .eq. 1) then
forall( i=1:nx)
a(i,0) = 1.0d0
b(i,0) = 1.0d0
end forall
endif
!
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
!  This file contains a routine for producing a decomposition of a 1-d array
!  when given a number of processors.  It may be used in "direct" product
!  decomposition.  The values returned assume a "global" domain in [1:n]
!
subroutine MPE_DECOMP1D( n, numprocs, myid, s, e )
integer n, numprocs, myid, s, e
integer nlocal
integer deficit
!
nlocal  = n / numprocs
s         = myid * nlocal + 1
deficit = mod(n,numprocs)
s         = s + min(myid,deficit)
if (myid .lt. deficit) then
nlocal = nlocal + 1
endif
e = s + nlocal - 1
if (e .gt. n .or. myid .eq. numprocs-1) e = n
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
! Perform a Jacobi sweep for a 1-d decomposition.
! Sweep from a into b
!
subroutine sweep1d( a, f, nx, s, e, b )
integer nx, s, e
double precision a(0:nx+1,s-3:e+3), f(0:nx+1,s-3:e+3), &
b(0:nx+1,s-3:e+3)
!
integer i, j
double precision h
!
h = 1.0d0 / dble(nx+1)
do j=s-3, e+3
do i=1, nx
b(i,j) = 0.25 * (a(i-1,j)+a(i,j+1)+a(i,j-1)+a(i+1,j)) - &
h * h * f(i,j)
enddo
enddo
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
!  The rest of the 1-d program
!
double precision function diff( a, b, nx, s, e )
integer nx, s, e
double precision a(0:nx+1, s-3:e+3), b(0:nx+1, s-3:e+3)
!
double precision sum
integer i, j
!
sum = 0.0d0
do j=s,e
do i=1,nx
sum = sum + (a(i,j) - b(i,j)) ** 2
enddo
enddo
!
diff = sum
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I'll take a look to the documentation you've posted, and check if it works passing element by element.

Thanks!

PD: I have tried to change the subroutine to:
Code (Fortran):

subroutine exchng1(a, nx, s, e, comm1d, nbrbottom, nbrtop)
use mpi
integer nx, s, e, comm1d, nbrbottom, nbrtop
double precision a(0:nx+1,s-3:e+3)
integer ierr, req(4)
!
call MPI_IRECV(&
a(1,s-3), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)
call MPI_IRECV(&
a(1,e+2), nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)
call MPI_ISEND(&
a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(3), ierr)
call MPI_ISEND(&
a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(4), ierr)
!
call MPI_WAITALL(4, req, MPI_STATUSES_IGNORE, ierr)
return
end
And stills giving the same error. I think there is a problem with the extension of the arrays in the subroutines, but I don't understand the reason why is causing it.

Last edited: Jan 14, 2019
5. Jan 14, 2019

### Klystron

I do not have your compiler but in my mind I initialized variable "s" then the subroutine accesses memory before the first instance of array A (s-3,...). I also noticed a possible violation using (routine) names but need to read the language extensions for the versions before commenting on that,

If this diagnosis is correct, the simple fix is to use the same constants that define the arrays (hint) to limit the array indices during data manipulation.

6. Jan 14, 2019

### Telemachus

You say that instead of calling "s" in the subroutine I should call h=s-3, for example? the original program uses "s" and "e" as the argument in the subroutine call, and then sends and receives s-1, and e+1. I have tried to use exactly the same idea, but it is not working.

I really don't understand what is happening, because I think the arrays are correctly defined, but still having this memory problem.

\$ mpiexec -n 2 oned2.x
I am proc 1 s= 513 e= 1024
I am proc 0 s= 1 e= 512

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F4A929AEE08
#2 0x7F4A925FE4AF
#3 0x7F4A84E55304
#4 0x7F4A9235CCDA
#5 0x7F4A92D05AC6
#6 0x401817 in exchng1_
#7 0x4021CF in MAIN__ at onedv0.0.2.f90:?
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 18241 on node odiseo exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

However, I have to admit that I'm not sure if the mpi call is being done properly. Because the send and receive specifies the array a(1,e:e+1), for example. At first I thought that the 1 in the first argument stood for the fact that the array that was being passed and received contained only one element for the first index. But I think that it should specify the piece of the array that is being sent and received, which shouldn't be just 1, and should be process dependent.

Code (Fortran):

call MPI_IRECV(&
a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)
call MPI_ISEND(&
a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(3), ierr)

Last edited: Jan 14, 2019
7. Jan 14, 2019

### Telemachus

Ok. I modified the code a little bit. The idea is more or less the same. However, soemthing is happening. Now the error comes from the subroutine sweep1d. What is baffling is that if I comment the first call to sweep1d in the iteration loop, the program runs without any errors. However, if the it makes the first call to sweep1d, namely:
Code (Fortran):
call sweep1d( a, f, nx, s, e, b )

The program gives this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F9015FDBE08
#1 0x7F9015FDAF90
#2 0x7F9015C2B4AF
#0 0x7F3609488E08
#1 0x7F3609487F90
#2 0x7F36090D84AF
#3 0x401316 in sweep1d_
#3 0x401316 in sweep1d_
#4 0x40206E in MAIN__ at onedv0.0.5.f90:?
#4 0x40206E in MAIN__ at onedv0.0.5.f90:?
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 19594 on node odiseo exited on signal 11 (Segmentation fault).

The iteration loop is:

Code (Fortran):

do it=1, 100

call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop,myid )
!     print*,'Hi'
call sweep1d( a, f, nx, s, e, b )
!         print*,'Hey'

call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop,myid )
call sweep1d( b, f, nx, s, e, a )
dwork = diff( a, b, nx, s, e )

call MPI_Allreduce( dwork, diffnorm, 1, MPI_DOUBLE_PRECISION, &
MPI_SUM, comm1d, ierr )
if (diffnorm .lt. 1.0e-5) exit
!        if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
enddo

I say that it is baffling because the same call to the sweep1d subroutine is made, with the array b as the argument, after the call with the array a as the argument. All the same is donde for the array b than for the array a, there are two iterations per loop actually, and the array b is used to compute the difference between two succesive iterations. But why when the call is made with array b as the argument there are no errors, but if I leave the first call to sweep1D it doesn't work?

Here is the complete code now:

Code (Fortran):

!*******************************************************************
!   oned.f90 - a solution to the Poisson problem using Jacobi
!   interation on a 1-d decomposition
!
!   The size of the domain is read by processor 0 and broadcast to
!   all other processors.  The Jacobi iteration is run until the
!   change in successive elements is small or a maximum number of
!   iterations is reached.  The difference is printed out at each
!   step.
!*******************************************************************
program main
!
include "mpif.h"
integer maxn
parameter (maxn = 1030)
double precision  a(-2:1027,-2:1027), b(-2:1027,-2:1027), &
f(-2:1027,-2:1027)

!      double precision a(-2:1027,-2:1027),b(-2:1027,-2:1027),&
!                      f(-2:1027,-2:1027)
integer nx, ny
integer myid, numprocs, ierr
integer comm1d, nbrbottom, nbrtop, s, e, it
double precision diff,diffb, diffnorm, dwork
double precision t1, t2
external diff

call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
!
if (myid .eq. 0) then
!
!         Get the size of the problem
!
!          print *, 'Enter nx'
nx = 1024
endif
call MPI_BCAST(nx,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
ny   = nx
!
! Get a new communicator for a decomposition of the domain
!
call MPI_CART_CREATE(MPI_COMM_WORLD, 1, numprocs, .false., &
.true., comm1d, ierr )
!
! Get my position in this communicator, and my neighbors
!
call MPI_COMM_RANK( comm1d, myid, ierr )
call MPI_Cart_shift( comm1d, 0,  1, nbrbottom, nbrtop, ierr )
!
! Compute the actual decomposition
!
call MPE_DECOMP1D( ny, numprocs, myid, s, e )
!
! Initialize the right-hand-side (f) and the initial solution guess (a)

!      print*,'Hi. I am proc',myid
!

call onedinit( a, b, f, nx, s, e)

print*,'I am proc',myid,'s=',s,'e=',e

!
! Actually do the computation.  Note the use of a collective operation to
! check for convergence, and a do-loop to bound the number of iterations.
!
call MPI_BARRIER( MPI_COMM_WORLD, ierr )
t1 = MPI_WTIME()
do it=1, 100

call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop,myid )
!     print*,'Hi'
!   call sweep1d( a, f, nx, s, e, b )
!         print*,'Hey'

call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop,myid )
call sweep1d( b, f, nx, s, e, a )
dwork = diff( a, b, nx, s, e )

call MPI_Allreduce( dwork, diffnorm, 1, MPI_DOUBLE_PRECISION, &
MPI_SUM, comm1d, ierr )
if (diffnorm .lt. 1.0e-5) exit
!        if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
enddo
if (myid .eq. 0 .and. it .gt. 100) print *, 'Failed to converge'
t2 = MPI_WTIME()
if (myid .eq. 0) then
print *, 'Converged after ', 2*it, ' Iterations in ', t2 - t1, &
' secs '
print *,' Difference is ', diffnorm
endif
!
call MPI_FINALIZE(ierr)
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine exchng1(a, nx, s, e, comm1d, nbrbottom, nbrtop, myid)
use mpi
integer nx, s, e, comm1d, nbrbottom, nbrtop
double precision a(-2:1027,s-3:e+3)
integer ierr, req(8),myid
!

call MPI_IRECV(&
a(1,s-3), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(1), ierr)

call MPI_IRECV(&
a(1,e+3), nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(2), ierr)

call MPI_IRECV(&
a(1,s-2), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0, &
comm1d, req(3), ierr)

call MPI_IRECV(&
a(1,e+2), nx, MPI_DOUBLE_PRECISION, nbrtop, 1, &
comm1d, req(4), ierr)

call MPI_ISEND(&
a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(5), ierr)

call MPI_ISEND(&
a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(6), ierr)

call MPI_ISEND(&
a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0, &
comm1d, req(7), ierr)
call MPI_ISEND(&
a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1, &
comm1d, req(8), ierr)
!

call MPI_WAITALL(8, req, MPI_STATUSES_IGNORE, ierr)
print*,'ierr=',ierr,'my id=',myid

return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine onedinit( a, b, f, nx, s, e )
integer nx, s, e
double precision a(-2:1027,s-3:e+3), b(-2:1027,s-3:e+3), &
f(-2:1027,s-3:e+3)
!
integer i, j
!
forall(j=s-3:e+3)
forall( i=0:nx+1)
a(i,j) = 0.0d0
b(i,j) = 0.0d0
f(i,j) = 0.0d0
end forall
end forall
!
!    Handle boundary conditions
!
forall(j=s-3:e+3)
a(0,j) = 1.0d0
b(0,j) = 1.0d0
a(nx+1,j) = 0.0d0
b(nx+1,j) = 0.0d0
end forall
if (s .eq. 1) then
forall( i=1:nx)
a(i,0) = 1.0d0
b(i,0) = 1.0d0
end forall
endif
!
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
!  This file contains a routine for producing a decomposition of a 1-d array
!  when given a number of processors.  It may be used in "direct" product
!  decomposition.  The values returned assume a "global" domain in [1:n]
!
subroutine MPE_DECOMP1D( n, numprocs, myid, s, e )
integer n, numprocs, myid, s, e
integer nlocal
integer deficit
!
nlocal  = n / numprocs
s         = myid * nlocal + 1
deficit = mod(n,numprocs)
s         = s + min(myid,deficit)
if (myid .lt. deficit) then
nlocal = nlocal + 1
endif
e = s + nlocal - 1
if (e .gt. n .or. myid .eq. numprocs-1) e = n
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
! Perform a Jacobi sweep for a 1-d decomposition.
! Sweep from a into b
!
subroutine sweep1d( a, f, nx, s, e, b )
integer nx, s, e
double precision a(-2:1027,s-3:e+3), b(-2:1027,s-3:e+3), &
f(-2:1027,s-3:e+3)
!
integer i, j
double precision h
!
h = 1.0d0 / dble(nx+1)
forall(j=s-3:e+3)
forall( i=1:nx)
b(i,j) = 0.25 * (a(i-1,j)+a(i,j+1)+a(i,j-1)+a(i+1,j)) - &
h * h * f(i,j)
end forall
end forall
return
end

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

!
!  The rest of the 1-d program
!
double precision function diff( a, b, nx, s, e )
integer nx, s, e
double precision a(-2:1027,s-3:e+3), b(-2:1027,s-3:e+3), &
f(-2:1027,s-3:e+3)
!
double precision sum
integer i, j
!
sum = 0.0d0
do j=s,e
do i=1,nx
sum = sum + (a(i,j) - b(i,j)) ** 2
enddo
enddo
!
diff = sum
return
end

[/quode]

8. Jan 14, 2019

### Staff: Mentor

I don't see how we can diagnose without the documentation on the MPI routines.

I also note that you never check or print out the ierr code after each call. That could be giving clues.

9. Jan 14, 2019

### Telemachus

I actually have checked (the last code prints those values), and gives ierr=0 in all calls, which is ok. However, the problem is now in the first call to the subroutine sweep1d, which doesn't have any mpi call.

10. Jan 15, 2019

### Telemachus

I have found the bug. It is a very silly thing, the sweep was calling array elements outside of bounds.

11. Jan 20, 2019 at 3:54 PM

### Staff: Mentor

I've been away for about a week, so haven't been able to reply. In post #7, you have this code for the sweep1d subroutine.
Code (Fortran):
! Perform a Jacobi sweep for a 1-d decomposition.
! Sweep from a into b
!
subroutine sweep1d( a, f, nx, s, e, b )
integer nx, s, e
double precision a(-2:1027,s-3:e+3), b(-2:1027,s-3:e+3), &
f(-2:1027,s-3:e+3)
! <snip>
I'm reasonably certain that your declaration for the arrays a, b, and f are not allowed as you have done them. That is, you are declaring the sizes of the three arrays using values known only at run time. The dimensions of the arrays need to be known at compile time, while the values of s and e are known only at run time, which is always after compilation has been done. The compiler allocates space for arrays, so the arrays need to be declared with literals (constants such as 10 or 100) or with variables that have been declared in PARAMETER declaration statements, such as you have done with maxn.

As a side note, you really should put more thought into your variable names. For example, just for the sweep1d subroutine, your parameters are a, f, nx, s, e, and b, none of which convey any idea of what they will be used for. At the very least you should have brief descriptions of all parameters in any subroutine you write. This is something I insist on in any programming class I teach. Although you have some understanding of what the parameters are being used for, without some documentation of the subroutine interface, it is very difficult for others to understand what you're trying to do.

12. Jan 20, 2019 at 3:59 PM

### Telemachus

Ok. Only one thing just to be clear, it is not my code, I've found it on the internet.

13. Jan 20, 2019 at 4:07 PM

### Staff: Mentor

Which is no guarantee that it will even compile, let alone run. I don't have a Fortran compiler any more, so I can't check your code. I think I'm correct, though, in saying that you can't declare an array with a size that is a variable, as was the case with this declaration:
Code (Fortran):
double precision a(-2:1027,s-3:e+3)
as well as the other two arrays in that declaration.
This would work, however, if s and e were declared in PARAMETER statements.