Computer Science 320

Computer Science 320 Broadcasting

Floyd’s Algorithm on SMP

    for i = 0 to n - 1
        parallel for r = 0 to n - 1
            for c = 0 to n - 1
                d[r][c] = min(d[r][c], d[r][i] + d[i][c])
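The pseudocode above can be sketched in plain Java. This is a minimal illustration, not the course's own program: the class name `FloydSmp` is hypothetical, and a parallel stream stands in for the `parallel for` construct. Row i is copied before each pass so that worker threads only read the copy and write their own rows.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical class name; a sketch of the SMP loop structure only.
class FloydSmp {
    // Runs Floyd's algorithm in place on an n x n distance matrix.
    static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; i++) {
            final int pivot = i;
            // Snapshot of row i: threads read this copy, never each other's rows.
            final double[] di = d[pivot].clone();
            // The "parallel for r" loop: each row r is updated by one thread.
            IntStream.range(0, n).parallel().forEach(r -> {
                double[] dr = d[r];
                for (int c = 0; c < n; c++) {
                    dr[c] = Math.min(dr[c], dr[pivot] + di[c]);
                }
            });
        }
    }

    public static void main(String[] args) {
        double INF = Double.POSITIVE_INFINITY; // no direct edge
        double[][] d = {
            { 0,   3,   INF, 7   },
            { 8,   0,   2,   INF },
            { 5,   INF, 0,   1   },
            { 2,   INF, INF, 0   }
        };
        floyd(d);
        System.out.println(Arrays.deepToString(d));
    }
}
```

The outer loop over i stays sequential because pass i + 1 depends on the results of pass i; only the row loop is split among threads.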

Floyd’s Algorithm on Cluster
• Root node reads distance matrix from input file and scatters row slices to other nodes
• Other nodes compute distances and update their slices
• The slices are gathered back to the root node for output
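One way to divide the rows into slices, sketched in plain Java. The class `RowSlices` and the method `sliceBounds` are hypothetical names used for illustration; the actual program may partition the rows differently. Each process gets a contiguous block of rows, and any leftover rows go to the lowest-ranked processes.

```java
// Hypothetical helper; a sketch of one possible row partitioning.
class RowSlices {
    // Returns the inclusive {lb, ub} row bounds owned by `rank`
    // when n rows are divided among `size` processes.
    static int[] sliceBounds(int n, int size, int rank) {
        int base = n / size;   // rows every process gets
        int extra = n % size;  // first `extra` ranks get one more row
        int lb = rank * base + Math.min(rank, extra);
        int len = base + (rank < extra ? 1 : 0);
        return new int[] { lb, lb + len - 1 };
    }

    public static void main(String[] args) {
        int n = 10, size = 4;
        for (int rank = 0; rank < size; rank++) {
            int[] b = sliceBounds(n, size, rank);
            System.out.println("rank " + rank + ": rows " + b[0] + ".." + b[1]);
        }
    }
}
```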

Parallel I/O File Pattern
• Eliminate the gather of data by having each node write its slice to a separate file
• Eliminate the scatter of data by having each node read its slice from the input file

Execution Timeline

Sharing Data in Computation
• On each pass through the outer loop, row i must be available to all of the processes (they all execute the same line of code in the inner loop)
• They can do this in SMP because they share the entire matrix
• They can’t do this in a cluster setup, because they don’t share memory

    for i = 0 to n - 1
        parallel for r = 0 to n - 1
            for c = 0 to n - 1
                d[r][c] = min(d[r][c], d[r][i] + d[i][c])

Share Row via a Broadcast Message
• The process that owns a row broadcasts it before the parallel loop is run, on each pass through the outer loop
• Process that owns the row acts as the root for the broadcast, setting up the source buffer
• The other processes set up a destination buffer
• Broadcast also enforces synchronization; they all wait for the broadcast

    for i = 0 to n - 1
        broadcast row i of d
        parallel for r = 0 to n - 1
            for c = 0 to n - 1
                d[r][c] = min(d[r][c], d[r][i] + d[i][c])

    // Allocate storage for row broadcast from another process.
    row_i = new double[n];
    row_i_buf = DoubleBuf.buffer(row_i);

    int i_root = 0;
    for (int i = 0; i < n; ++i) {
        double[] d_i = d[i];

        // Determine which process owns row i.
        if (!ranges[i_root].contains(i)) ++i_root;

        // Broadcast row i from owner process to all processes.
        if (rank == i_root) {
            // Owner: row i of my slice is the source buffer.
            world.broadcast(i_root, DoubleBuf.buffer(d_i));
        } else {
            // Non-owner: receive row i into the destination buffer.
            world.broadcast(i_root, row_i_buf);
            d_i = row_i;
        }

        // Inner loops over rows in my slice and over all columns.
        for (int r = mylb; r <= myub; ++r) {
            double[] d_r = d[r];
            for (int c = 0; c < n; ++c)
                d_r[c] = Math.min(d_r[c], d_r[i] + d_i[c]);
        }
    }
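The broadcast pattern can be checked without a cluster by simulating it in one JVM. The sketch below is an illustration under stated assumptions, not the program above: `BroadcastFloydSim` is a hypothetical class, each simulated process holds only its own row slice, and the "broadcast" is an array copy of row i taken from the owner. It uses `while` rather than `if` to find the owner so that empty slices are skipped.

```java
import java.util.Arrays;

// Hypothetical single-JVM simulation of the broadcast pattern.
class BroadcastFloydSim {
    static double[][] run(double[][] input, int size) {
        int n = input.length;
        int[] lb = new int[size], ub = new int[size];
        double[][][] slices = new double[size][][];
        // "Scatter": process p keeps only rows lb[p]..ub[p].
        for (int p = 0; p < size; p++) {
            lb[p] = p * n / size;
            ub[p] = (p + 1) * n / size - 1;
            slices[p] = new double[ub[p] - lb[p] + 1][];
            for (int r = lb[p]; r <= ub[p]; r++) {
                slices[p][r - lb[p]] = input[r].clone();
            }
        }
        int owner = 0;
        for (int i = 0; i < n; i++) {
            // Determine which simulated process owns row i.
            while (i > ub[owner]) owner++;
            // "Broadcast": every process sees a copy of the owner's row i.
            double[] rowI = slices[owner][i - lb[owner]].clone();
            // Each process updates only the rows in its own slice.
            for (int p = 0; p < size; p++) {
                for (double[] dr : slices[p]) {
                    for (int c = 0; c < n; c++) {
                        dr[c] = Math.min(dr[c], dr[i] + rowI[c]);
                    }
                }
            }
        }
        // "Gather": reassemble the slices into one matrix.
        double[][] out = new double[n][];
        for (int p = 0; p < size; p++) {
            for (int r = lb[p]; r <= ub[p]; r++) out[r] = slices[p][r - lb[p]];
        }
        return out;
    }

    public static void main(String[] args) {
        double INF = Double.POSITIVE_INFINITY;
        double[][] d = {
            { 0, 3, INF, 7 }, { 8, 0, 2, INF }, { 5, INF, 0, 1 }, { 2, INF, INF, 0 }
        };
        System.out.println(Arrays.deepToString(run(d, 3)));
    }
}
```

Because every process receives the same row i before it starts its inner loops, the result does not depend on how many processes the rows are split among.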

Problem: Too Many Messages
• One broadcast is sent on every pass through the outer loop, n broadcasts in all
• The amount of time spent in communication is too high when compared to the time spent in computation
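A rough count makes the imbalance concrete. The sketch below only counts operations; it does not measure time, and the class `BroadcastCost` and its method names are hypothetical. It assumes one broadcast of one row of n doubles (8 bytes each) per outer-loop pass and a row slice of at most ceil(n / K) rows per process.

```java
// Hypothetical counting helper; counts only, no timing model.
class BroadcastCost {
    // One broadcast per pass through the outer loop.
    static long broadcasts(int n) {
        return n;
    }

    // Each broadcast carries one row of n doubles.
    static long bytesPerBroadcast(int n) {
        return 8L * n;
    }

    // Min-updates one process performs between two broadcasts:
    // (rows in the largest slice) x (n columns).
    static long updatesPerPassPerProcess(int n, int size) {
        long rows = (n + size - 1) / size;
        return rows * n;
    }

    public static void main(String[] args) {
        int n = 1000, size = 10;
        System.out.println("broadcasts:          " + broadcasts(n));
        System.out.println("bytes per broadcast: " + bytesPerBroadcast(n));
        System.out.println("updates per pass:    " + updatesPerPassPerProcess(n, size));
    }
}
```

As K grows, the work between broadcasts shrinks while the number of broadcasts stays at n, so the per-message overhead takes up a larger share of each pass.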
