MPI_ERR_TRUNCATE: On Broadcast

Jiew Meng · Nov 8, 2012

I have an int that I intend to broadcast from the root (rank == FIELD, where FIELD = 0).

int winner;

if (rank == FIELD) {
    winner = something;
}

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&winner, 1, MPI_INT, FIELD, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
if (rank != FIELD) {
    cout << rank << " informed that winner is " << winner << endl;
}

But I get the following error:

[JM:6892] *** An error occurred in MPI_Bcast
[JM:6892] *** on communicator MPI_COMM_WORLD
[JM:6892] *** MPI_ERR_TRUNCATE: message truncated
[JM:6892] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

I found that I can increase the buffer size in the Bcast:

MPI_Bcast(&winner, NUMPROCS, MPI_INT, FIELD, MPI_COMM_WORLD);

where NUMPROCS is the number of running processes (actually it seems I only need it to be 2). Then it runs, but gives unexpected output ...

1 informed that winner is 103
2 informed that winner is 103
3 informed that winner is 103
5 informed that winner is 103
4 informed that winner is 103

When I cout the winner, it should be -1.

Answer

Hristo 'away' Iliev · Nov 8, 2012

There is an error early in your code:

if (rank == FIELD) {
   // randomly place ball, then broadcast to players
   ballPos[0] = rand() % 128;
   ballPos[1] = rand() % 64;
   MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);
}

This is a very common mistake. MPI_Bcast is a collective operation and must be called by all processes in order to complete. What happens in your case is that this broadcast is not called by all processes in MPI_COMM_WORLD (but only by the root) and hence interferes with the next broadcast operation, namely the one inside the loop. The second broadcast actually receives the message sent by the first one (two int elements) into a buffer sized for just one int, hence the truncation error message. In Open MPI each broadcast internally uses the same message tag values, so different broadcasts can interfere with each other if not issued in the same sequence on all processes. This is compliant with the (old) MPI standard - one cannot have more than one outstanding collective operation per communicator in MPI-2.2 (in MPI-3.0 one can have several outstanding non-blocking collective operations). You should rewrite the code as:

if (rank == FIELD) {
   // randomly place ball, then broadcast to players
   ballPos[0] = rand() % 128;
   ballPos[1] = rand() % 64;
}
MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);
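
For reference, here is a minimal self-contained sketch of the corrected pattern, combining the snippets above. The seed value, field dimensions, and the value assigned to winner are placeholders standing in for whatever your program actually computes:

#include <cstdlib>
#include <iostream>
#include <mpi.h>

using namespace std;

const int FIELD = 0;  // rank of the root (field) process

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Only the root fills in the data ...
    int ballPos[2];
    if (rank == FIELD) {
        srand(12345);              // placeholder seed
        ballPos[0] = rand() % 128;
        ballPos[1] = rand() % 64;
    }
    // ... but EVERY rank calls the broadcast, with matching count and type
    MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);

    int winner = -1;
    if (rank == FIELD) {
        winner = 3;                // placeholder for "something"
    }
    MPI_Bcast(&winner, 1, MPI_INT, FIELD, MPI_COMM_WORLD);

    if (rank != FIELD) {
        cout << rank << " informed that winner is " << winner << endl;
    }

    MPI_Finalize();
    return 0;
}

Because every rank now executes both MPI_Bcast calls in the same order with the same count, the messages match up and no barrier is needed around the broadcast.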