I'm just starting to use MPI in Fortran 90 and have some questions about improving performance.

At the start of a calculation, I run a stage that distributes all required data to the slave nodes. This is a one-off step, but it seems to be taking a long time. The code running on the master node looks something like:

```fortran
DO i_node = 1, nnodes
   CALL MPI_SEND(.....)
   .....
   CALL MPI_SEND(.....)
END DO
```

The number of data items, and hence the number of calls to MPI_SEND, is of order 50. My questions:

1. Would it be better to use MPI_BCAST instead of putting the MPI_SENDs in a loop over i_node? If so, do the slaves need to call MPI_RECV or MPI_BCAST to receive the data?

2. Would using non-blocking sends (MPI_ISEND) be better? I am assuming I'd then need several MPI_WAIT calls after the loop. If so, would each MPI_ISEND in the loop need its own request handle, to be passed later to a matching MPI_WAIT?

3. Are there any benefits to using MPI_PACK to combine all the data items into one buffer, sending that with a single MPI_SEND, and unpacking it when it reaches the slave nodes?

Apologies if the above seems like a lot of questions! Any help is very much appreciated!!
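For question 1, here is a sketch of what I gather the broadcast version would look like, in case I've misunderstood it (buf, n and ierr are just placeholder names, and I've assumed rank 0 is the master):

```fortran
! My understanding from the docs is that MPI_BCAST is collective:
! every rank in the communicator, master and slaves alike, makes the
! same call, with the root argument (here 0) naming the sender.
CALL MPI_BCAST(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
```

Is that right, or do the slaves still post an MPI_RECV somewhere?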
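For question 2, this is roughly what I imagine the non-blocking version would look like; the array sizes and variable names (requests, statuses, data1, n1, tag1, max_reqs) are just placeholders I've made up:

```fortran
! Sketch of the non-blocking idea, as I understand it: each MPI_ISEND
! gets its own entry in a requests array, and the loop is followed by
! one MPI_WAITALL instead of many individual MPI_WAITs.
INTEGER :: requests(max_reqs)
INTEGER :: statuses(MPI_STATUS_SIZE, max_reqs)
INTEGER :: ireq, i_node, ierr

ireq = 0
DO i_node = 1, nnodes
   ireq = ireq + 1
   CALL MPI_ISEND(data1, n1, MPI_DOUBLE_PRECISION, i_node, tag1, &
                  MPI_COMM_WORLD, requests(ireq), ierr)
   ! ... one MPI_ISEND per data item, each storing its own request ...
END DO
CALL MPI_WAITALL(ireq, requests, statuses, ierr)
```

Is a separate request handle per MPI_ISEND the correct way to do this, and is MPI_WAITALL preferable to a loop of MPI_WAITs?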
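For question 3, here is how I picture the packed version working; again, the buffer name, sizes and tags are placeholders:

```fortran
! Sketch of the MPI_PACK idea: pack each item into one buffer,
! advancing the position counter, then send the whole thing as
! MPI_PACKED in a single MPI_SEND.
pos = 0
CALL MPI_PACK(a, na, MPI_DOUBLE_PRECISION, buffer, bufsize, pos, &
              MPI_COMM_WORLD, ierr)
CALL MPI_PACK(b, nb, MPI_INTEGER, buffer, bufsize, pos, &
              MPI_COMM_WORLD, ierr)
CALL MPI_SEND(buffer, pos, MPI_PACKED, i_node, tag, MPI_COMM_WORLD, ierr)
! On the slave side I assume it's MPI_RECV with MPI_PACKED, followed
! by MPI_UNPACK calls in the same order the items were packed.
```

Would the saving from one large message outweigh the cost of the packing and unpacking?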