Improving Performance with MPI in Fortran90

Summary
Improving performance with MPI in Fortran90 can be achieved by considering alternatives to multiple MPI_SEND calls, such as using MPI_BCAST for data distribution, which simplifies the code and may be faster. Non-blocking sends can also help, but they require matching MPI_WAIT calls to ensure proper synchronization. Combining data items with MPI_PACK into a single send can significantly reduce per-message overhead compared to sending each item individually. Additionally, placing a barrier before the send loop can help identify where the delay actually occurs. Overall, choosing the right data distribution method is crucial for MPI performance in Fortran90 applications.
SFA10
I'm just starting to use MPI in Fortran90 and have some questions on improving performance.

At the start of a calculation, I run a stage to distribute all required data to the slave nodes. This is a one-off step, but seems to be taking a long time.

The code running on the master node looks something like:

DO i_node = 1, nnodes
   CALL MPI_SEND(...)
   ...
   CALL MPI_SEND(...)
END DO

The number of data items, and hence calls to MPI_SEND, is of order 50.

Would it be better to use MPI_BCAST instead of putting the MPI_SENDs in a loop over i_node? If so, do the slaves need to use MPI_RECV or MPI_BCAST to receive the data?
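For concreteness, my understanding is the broadcast version would look like the sketch below, with every rank, master and slaves alike, making the same call (the array data and its size n are made up):

DOUBLE PRECISION :: data(n)
INTEGER :: ierr

! All ranks call MPI_BCAST; the slaves do not call MPI_RECV.
! Rank 0 (the root) supplies the data, the other ranks receive it.
CALL MPI_BCAST(data, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)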

Would using non-blocking sends be better? I am assuming I'd then need to include several MPI_WAIT statements after the loop. If I were to use non-blocking sends, would each call in the loop need a different request handle, to be referred to later in its own call to MPI_WAIT?
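If I've got that right, it would be something along these lines, with one request handle per MPI_ISEND collected into an array (the buffer names and counts here are placeholders):

INTEGER :: requests(2*nnodes), statuses(MPI_STATUS_SIZE, 2*nnodes)
INTEGER :: i_node, nreq, ierr

nreq = 0
DO i_node = 1, nnodes
   nreq = nreq + 1
   CALL MPI_ISEND(buf1, n1, MPI_DOUBLE_PRECISION, i_node, 1, &
                  MPI_COMM_WORLD, requests(nreq), ierr)
   nreq = nreq + 1
   CALL MPI_ISEND(buf2, n2, MPI_DOUBLE_PRECISION, i_node, 2, &
                  MPI_COMM_WORLD, requests(nreq), ierr)
END DO
! Each MPI_ISEND returns its own request handle (not a process id),
! and the buffers must not be modified until the waits complete.
CALL MPI_WAITALL(nreq, requests, statuses, ierr)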

Are there any benefits to using MPI_PACK to combine all the data items and then transmitting them with a single send (unpacking them on reaching the slave nodes)?
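As far as I can tell, the pack/unpack version would be roughly this (iarray, rarray, the counts ni and nr, and bufsize are all placeholders):

CHARACTER(LEN=1) :: buffer(bufsize)
INTEGER :: position, ierr, status(MPI_STATUS_SIZE)

! Master: pack everything into one buffer, then one send per slave.
position = 0
CALL MPI_PACK(iarray, ni, MPI_INTEGER, buffer, bufsize, position, &
              MPI_COMM_WORLD, ierr)
CALL MPI_PACK(rarray, nr, MPI_DOUBLE_PRECISION, buffer, bufsize, position, &
              MPI_COMM_WORLD, ierr)
CALL MPI_SEND(buffer, position, MPI_PACKED, i_node, 0, MPI_COMM_WORLD, ierr)

! Slave: one receive, then unpack in the same order as the packing.
CALL MPI_RECV(buffer, bufsize, MPI_PACKED, 0, 0, MPI_COMM_WORLD, status, ierr)
position = 0
CALL MPI_UNPACK(buffer, bufsize, position, iarray, ni, MPI_INTEGER, &
                MPI_COMM_WORLD, ierr)
CALL MPI_UNPACK(buffer, bufsize, position, rarray, nr, MPI_DOUBLE_PRECISION, &
                MPI_COMM_WORLD, ierr)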

Apologies if the above seems like a lot of questions! Any help very much appreciated!
 
Are you sure this is what is taking a long time? I use MPI in C, not Fortran, but what is usually slow in an MPI program is starting all the processes. This can sometimes take up to a minute when there are a lot of them.

Try adding a Barrier before the Send loop, and after the barrier print something on the screen.
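Something like this, for example (my_rank and ierr are assumed to be declared already; MPI_WTIME is just there for timing):

DOUBLE PRECISION :: t0, t1

CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! everyone has started up by here
t0 = MPI_WTIME()
IF (my_rank == 0) PRINT *, 'past barrier, starting distribution'

! ... the send loop ...

t1 = MPI_WTIME()
IF (my_rank == 0) PRINT *, 'distribution took', t1 - t0, ' seconds'

If the delay happens before the 'past barrier' message appears, it's process startup, not your sends.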

Also, if the number of tasks you are distributing isn't equal to the number of slave processes, this isn't a very good way to do it. It's better to send each process one task, then use Iprobe in a loop to check when a process has finished, Recv the result, and send it another task, repeating until you have all the results (I'm assuming the tasks are independent of each other); see the sketch below.
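A rough sketch of that master loop (ntasks, nworkers, and the task/result buffers are all placeholders, and the initial one-task-per-worker priming loop is left out):

INTEGER :: status(MPI_STATUS_SIZE), worker, next_task, ndone, ierr
LOGICAL :: flag

next_task = nworkers + 1     ! tasks 1..nworkers were handed out up front
ndone = 0
DO WHILE (ndone < ntasks)
   ! Poll for a finished worker without blocking.
   CALL MPI_IPROBE(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, flag, &
                   status, ierr)
   IF (flag) THEN
      worker = status(MPI_SOURCE)
      CALL MPI_RECV(result, nres, MPI_DOUBLE_PRECISION, worker, &
                    MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
      ndone = ndone + 1
      IF (next_task <= ntasks) THEN
         CALL MPI_SEND(task(1,next_task), ntd, MPI_DOUBLE_PRECISION, &
                       worker, 0, MPI_COMM_WORLD, ierr)
         next_task = next_task + 1
      END IF
   END IF
END DO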

I just hope those 50 sends aren't different pieces of the same task that you are sending 1 by 1 instead of putting them in a single package...
 
I believe our in-house CFD code uses a BCAST at the start of the run. I'm not sure exactly how you're sending your messages, but make sure they're in 1D packed arrays. MUCH faster.
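A row of a 2D array, for example, is strided in memory in Fortran, so copy it into a contiguous 1D buffer before sending (a, buf, and the sizes here are placeholders):

! a(n,m) is stored column-major, so the row a(i,:) is strided.
buf(1:m) = a(i, 1:m)          ! copy into a contiguous 1D buffer first
CALL MPI_BCAST(buf, m, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)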
 
