Using MPI_Bcast for MPI communication

David picture David · Oct 23, 2011 · Viewed 64.8k times · Source

I'm trying to broadcast a message from the root node to all other nodes using MPI_Bcast. However, whenever I run this program it always hangs at the beginning. Does anybody know what's wrong with it?

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
        int rank;
        int buf;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if(rank == 0) {
                buf = 777;
                MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
        }
        else {
                MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
                printf("rank %d receiving received %d\n", rank, buf);
        }

        MPI_Finalize();
        return 0;
}

Answer

Jonathan Dursi picture Jonathan Dursi · Oct 23, 2011

This is a common source of confusion for people new to MPI. You don't use MPI_Recv() to receive data sent by a broadcast; you use MPI_Bcast().

Eg, what you want is this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
        int rank;
        int buf;
        const int root=0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if(rank == root) {
           buf = 777;
        }

        printf("[%d]: Before Bcast, buf is %d\n", rank, buf);

        /* everyone calls bcast, data is taken from root and ends up in everyone's buf */
        MPI_Bcast(&buf, 1, MPI_INT, root, MPI_COMM_WORLD);

        printf("[%d]: After Bcast, buf is %d\n", rank, buf);

        MPI_Finalize();
        return 0;
}

For MPI collective communications, everyone has to particpate; everyone has to call the Bcast, or the Allreduce, or what have you. (That's why the Bcast routine has a parameter that specifies the "root", or who is doing the sending; if only the sender called bcast, you wouldn't need this.) Everyone calls the broadcast, including the receivers; the receviers don't just post a receive.

The reason for this is that the collective operations can involve everyone in the communication, so that you state what you want to happen (everyone gets one processes' data) rather than how it happens (eg, root processor loops over all other ranks and does a send), so that there is scope for optimizing the communication patterns (eg, a tree-based hierarchical communication that takes log(P) steps rather than P steps for P processes).