Computer Science 320: Reduction
Estimating π
Throw N darts at the unit square, and let C be the number of darts that land within the quadrant of a unit circle inscribed in that square. Then C / N should be about the same as the ratio of the circle quadrant's area to the square's area. A circle's area is π·R², so with R = 1 the quadrant's area is π / 4, and the square's area is 1. Then C / N ≈ π / 4, so π ≈ 4 · C / N.
Sequential Program PiSeq

  // Generate N random points in the unit square; count how many are in
  // the unit circle.
  count = 0;
  for (long i = 0; i < N; ++ i)
     {
     double x = prng.nextDouble();
     double y = prng.nextDouble();
     if (x * x + y * y <= 1.0) ++ count;
     }

  // Stop timing.
  time += System.currentTimeMillis();

  // Print results.
  System.out.println
     ("pi = 4 * " + count + " / " + N + " = " + (4.0 * count / N));
Parallel Program PiSmp3

  new ParallelTeam().execute (new ParallelRegion()
     {
     public void run() throws Exception
        {
        execute (0, N-1, new LongForLoop()
           {
           // Set up per-thread PRNG and counter.
           Random prng_thread = Random.getInstance (seed);
           long count_thread = 0;

           // Extra padding to avert cache interference.
           long pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7;
           long pad8, pad9, pada, padb, padc, padd, pade, padf;

           // Parallel loop body.
           public void run (long first, long last)
              {
              // Skip PRNG ahead to index <first>.
              prng_thread.setSeed (seed);
              prng_thread.skip (2 * first);

              // Generate random points.
              for (long i = first; i <= last; ++ i)
                 {
                 double x = prng_thread.nextDouble();
                 double y = prng_thread.nextDouble();
                 if (x * x + y * y <= 1.0) ++ count_thread;
                 }
              }
Reduction Step, SMP-Style

  static SharedLong count;
  . . .
  public void finish()
     {
     // Reduce per-thread counts into the shared count.
     count.addAndGet (count_thread);
     }
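The same per-thread-count-then-reduce pattern can be sketched in plain Java, using java.util.concurrent.atomic.AtomicLong to stand in for PJ's SharedLong. The class name PiSmpSketch, the seed, and the strided loop partition are illustrative assumptions, not part of the PJ program above.

```java
import java.util.SplittableRandom;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: each thread accumulates a private count, then performs one
// atomic add at the end -- the SMP-style reduction step.
public class PiSmpSketch {
    static final AtomicLong count = new AtomicLong();
    static final long N = 1_000_000;

    public static void main(String[] args) throws InterruptedException {
        final int K = 4;                        // number of threads (assumed)
        Thread[] threads = new Thread[K];
        for (int t = 0; t < K; ++t) {
            final int rank = t;
            threads[t] = new Thread(() -> {
                SplittableRandom prng = new SplittableRandom(1234 + rank);
                long countThread = 0;           // per-thread counter
                for (long i = rank; i < N; i += K) {
                    double x = prng.nextDouble();
                    double y = prng.nextDouble();
                    if (x * x + y * y <= 1.0) ++countThread;
                }
                count.addAndGet(countThread);   // reduction step
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println("pi = " + (4.0 * count.get() / N));
    }
}
```

Note that each thread touches the shared counter exactly once, so there is no contention inside the loop.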
Monte Carlo Design for a Cluster
• Could keep a global counter in process 0, but that would involve too many messages
• Use reduction instead, so message passing is minimal
• Each process has its own PRNG, with its own split of the random sequence
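The "split sequence" idea can be sketched as follows: every process conceptually shares one random sequence, but each takes a disjoint slice of it by skipping ahead. PJ's Random.skip() jumps efficiently; java.util.Random has no skip method, so this sketch (class name SplitSequence is assumed) simply draws and discards.

```java
import java.util.Random;

// Sketch: give each process a disjoint slice of one logical random
// sequence by skipping ahead to its starting index.
public class SplitSequence {
    static double[] draw(long seed, long first, int count) {
        Random prng = new Random(seed);
        for (long i = 0; i < first; ++i) prng.nextDouble(); // skip ahead
        double[] out = new double[count];
        for (int i = 0; i < count; ++i) out[i] = prng.nextDouble();
        return out;
    }

    public static void main(String[] args) {
        // Process 0 takes indices 0..3; process 1 takes indices 4..7.
        double[] p0 = draw(42L, 0, 4);
        double[] p1 = draw(42L, 4, 4);
        // p1[0] is the 5th draw of the shared sequence: no overlap with p0.
        System.out.println(p0[0] + " " + p1[0]);
    }
}
```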
Reduction vs. Gather
• Could allocate an array of K cells for results, where the ith process's result goes in the ith cell; gather these into process 0 and let process 0 compute the end result from them
• Instead, the reduce method employs all processes in computing the reduction
Reduction in a Cluster
• Concentrate data into fewer and fewer processes
• When K = 8:
• processes 4-7 send their data to processes 0-3
• processes 2-3 send their results to processes 0-1
• process 1 sends its results to process 0
• Only log2(K) rounds of messages (K - 1 messages in total)!
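This halving scheme can be simulated in one JVM: at each round, the upper half of the surviving ranks "sends" to the lower half, which adds the received value into its own. The class name TreeReduce is assumed, and K is taken to be a power of two as in the K = 8 example.

```java
// Sketch: simulate the tree reduction on K processes (K a power of two).
public class TreeReduce {
    static long reduce(long[] values) {
        int rounds = 0;
        for (int half = values.length / 2; half >= 1; half /= 2) {
            for (int dst = 0; dst < half; ++dst) {
                // "Message": process dst+half sends its value to process dst.
                values[dst] += values[dst + half];
            }
            ++rounds;
        }
        System.out.println("rounds = " + rounds);   // log2(K)
        return values[0];                           // result lands in process 0
    }

    public static void main(String[] args) {
        long[] counts = {3, 1, 4, 1, 5, 9, 2, 6};   // K = 8 per-process counts
        System.out.println("sum = " + reduce(counts)); // prints rounds = 3, sum = 31
    }
}
```

For K = 8 the loop runs three rounds (half = 4, 2, 1), matching the three levels of messages in the bullets above.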
Reduction Tree for K = 8
Messages at each level of the tree are sent in parallel, starting at the bottom. Once a level's results have been computed, messages are sent up to the next level.
Example: Add the Results
(Figures: the per-process values in the initial state, and after the first, second, and third sets of messages)
It's Automatic: reduce

  // Compute the count in each process.
  ...
  // Perform the reduction step.
  LongItemBuf buf = new LongItemBuf();
  buf.item = count;
  world.reduce (0, buf, LongOp.SUM);
  count = buf.item;
  ...
  if (rank == 0)
     // Output the count and the estimate of pi.
Reduction in Mandelbrot Histogram

  int[] histogram = new int [maxiter + 1];
  ...
  world.reduce (0, IntegerBuf.buffer (histogram), IntegerOp.SUM);
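When the buffer wraps an array, the SUM reduction combines per-process histograms elementwise. A minimal plain-Java sketch of that combining step (the class name HistogramReduce and the sample data are assumptions):

```java
import java.util.Arrays;

// Sketch: elementwise SUM reduction over per-process histogram arrays,
// i.e. what reducing an array buffer with SUM computes at the root.
public class HistogramReduce {
    static int[] reduceSum(int[][] perProcess) {
        int[] total = new int[perProcess[0].length];
        for (int[] h : perProcess)
            for (int i = 0; i < h.length; ++i)
                total[i] += h[i];               // elementwise SUM
        return total;
    }

    public static void main(String[] args) {
        int[][] hists = {{1, 0, 2}, {0, 3, 1}}; // two processes' histograms
        System.out.println(Arrays.toString(reduceSum(hists))); // [1, 3, 3]
    }
}
```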