Improving Performance with MPI in Fortran90

Summary
Improving performance with MPI in Fortran90 can be achieved by considering alternatives to multiple MPI_SEND calls, such as using MPI_BCAST for data distribution, which simplifies the code and may be faster. Non-blocking sends can also help, but they require matching MPI_WAIT calls to ensure proper synchronization. Combining data items with MPI_PACK into a single send can significantly reduce per-message overhead compared to sending each item individually. Additionally, placing a barrier before the send loop can help identify where the delay actually occurs. Overall, choosing the right data distribution method is crucial for MPI performance in Fortran90 applications.
SFA10
I'm just starting to use MPI in Fortran90 and have some questions on improving performance.

At the start of a calculation, I run a stage to distribute all required data to the slave nodes. This is a one-off step, but seems to be taking a long time.

The code running on the master node looks something like:

DO i_node = 1, nnodes
   CALL MPI_SEND(...)
   ...
   CALL MPI_SEND(...)
END DO

The number of data items, and hence calls to MPI_SEND, is of order 50.

Would it be better to use MPI_BCAST instead of putting the MPI_SENDs in a loop over i_node? If so, do the slaves need to use MPI_RECV or MPI_BCAST to receive the data?
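For concreteness, my understanding is the broadcast version would look like the sketch below, with every rank, master and slaves alike, making the same call (the array data and its size n are made up):

DOUBLE PRECISION :: data(n)
INTEGER :: ierr

! All ranks call MPI_BCAST; the slaves do not call MPI_RECV.
! Rank 0 (the root) supplies the data, the other ranks receive it.
CALL MPI_BCAST(data, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)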

Would using non-blocking sends be better? I am assuming I'd then need to include several MPI_WAIT statements after the loop. If I were to use non-blocking sends, would each call in the loop need a different request handle, to be referred to later in its own call to MPI_WAIT?
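If I've got that right, it would be something along these lines, with one request handle per MPI_ISEND collected into an array (the buffer names and counts here are placeholders):

INTEGER :: requests(2*nnodes), statuses(MPI_STATUS_SIZE, 2*nnodes)
INTEGER :: i_node, nreq, ierr

nreq = 0
DO i_node = 1, nnodes
   nreq = nreq + 1
   CALL MPI_ISEND(buf1, n1, MPI_DOUBLE_PRECISION, i_node, 1, &
                  MPI_COMM_WORLD, requests(nreq), ierr)
   nreq = nreq + 1
   CALL MPI_ISEND(buf2, n2, MPI_DOUBLE_PRECISION, i_node, 2, &
                  MPI_COMM_WORLD, requests(nreq), ierr)
END DO
! Each MPI_ISEND returns its own request handle (not a process id),
! and the buffers must not be modified until the waits complete.
CALL MPI_WAITALL(nreq, requests, statuses, ierr)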

Are there any benefits to using MPI_PACK to combine all the data items and then transmitting them with a single send (unpacking them on reaching the slave nodes)?
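As far as I can tell, the pack/unpack version would be roughly this (iarray, rarray, the counts ni and nr, and bufsize are all placeholders):

CHARACTER(LEN=1) :: buffer(bufsize)
INTEGER :: position, ierr, status(MPI_STATUS_SIZE)

! Master: pack everything into one buffer, then one send per slave.
position = 0
CALL MPI_PACK(iarray, ni, MPI_INTEGER, buffer, bufsize, position, &
              MPI_COMM_WORLD, ierr)
CALL MPI_PACK(rarray, nr, MPI_DOUBLE_PRECISION, buffer, bufsize, position, &
              MPI_COMM_WORLD, ierr)
CALL MPI_SEND(buffer, position, MPI_PACKED, i_node, 0, MPI_COMM_WORLD, ierr)

! Slave: one receive, then unpack in the same order as the packing.
CALL MPI_RECV(buffer, bufsize, MPI_PACKED, 0, 0, MPI_COMM_WORLD, status, ierr)
position = 0
CALL MPI_UNPACK(buffer, bufsize, position, iarray, ni, MPI_INTEGER, &
                MPI_COMM_WORLD, ierr)
CALL MPI_UNPACK(buffer, bufsize, position, rarray, nr, MPI_DOUBLE_PRECISION, &
                MPI_COMM_WORLD, ierr)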

Apologies if the above seems like a lot of questions! Any help very much appreciated!
 
Are you sure this is what is taking a long time? I use MPI in C, not Fortran, but what is usually slow in an MPI program is starting all the processes. This can sometimes take up to a minute when there are a lot of them.

Try adding a Barrier before the Send loop, and after the barrier print something on the screen.
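Something like this, for example (my_rank and ierr are assumed to be declared already; MPI_WTIME is just there for timing):

DOUBLE PRECISION :: t0, t1

CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! everyone has started up by here
t0 = MPI_WTIME()
IF (my_rank == 0) PRINT *, 'past barrier, starting distribution'

! ... the send loop ...

t1 = MPI_WTIME()
IF (my_rank == 0) PRINT *, 'distribution took', t1 - t0, ' seconds'

If the delay happens before the 'past barrier' message appears, it's process startup, not your sends.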

Also, if the number of tasks you are distributing isn't equal to the number of slave processes, this isn't a very good way to do it. It's better to send each process one task, then use Iprobe in a loop to check when a process has finished, Recv the result, and send it another task, repeating until you have all the results (I'm assuming the tasks are independent of each other); see the sketch below.
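A rough sketch of that master loop (ntasks, nworkers, and the task/result buffers are all placeholders, and the initial one-task-per-worker priming loop is left out):

INTEGER :: status(MPI_STATUS_SIZE), worker, next_task, ndone, ierr
LOGICAL :: flag

next_task = nworkers + 1     ! tasks 1..nworkers were handed out up front
ndone = 0
DO WHILE (ndone < ntasks)
   ! Poll for a finished worker without blocking.
   CALL MPI_IPROBE(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, flag, &
                   status, ierr)
   IF (flag) THEN
      worker = status(MPI_SOURCE)
      CALL MPI_RECV(result, nres, MPI_DOUBLE_PRECISION, worker, &
                    MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
      ndone = ndone + 1
      IF (next_task <= ntasks) THEN
         CALL MPI_SEND(task(1,next_task), ntd, MPI_DOUBLE_PRECISION, &
                       worker, 0, MPI_COMM_WORLD, ierr)
         next_task = next_task + 1
      END IF
   END IF
END DO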

I just hope those 50 sends aren't different pieces of the same task that you are sending 1 by 1 instead of putting them in a single package...
 
I believe our in-house CFD code uses a BCAST at the start of the run. I'm not sure exactly how you're sending your messages, but make sure they're in 1D packed arrays. MUCH faster.
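A row of a 2D array, for example, is strided in memory in Fortran, so copy it into a contiguous 1D buffer before sending (a, buf, and the sizes here are placeholders):

! a(n,m) is stored column-major, so the row a(i,:) is strided.
buf(1:m) = a(i, 1:m)          ! copy into a contiguous 1D buffer first
CALL MPI_BCAST(buf, m, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)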
 
