Computer Science 320 Reduction
Estimating π
Throw N darts at random into the unit square, and let C be the number of darts that land within the quadrant of the unit circle that lies inside that square.
Then C / N should be about the same as the ratio quadrant area / square area.
The circle's area is π * R^2, so with R = 1 the quadrant's area is π / 4; the unit square's area is 1.
Then C / N ≈ π / 4, and π ≈ 4 * C / N.
Monte Carlo Methods
Throw N darts, and let C be the number of darts that land within the circle quadrant of a unit circle: π ≈ 4 * C / N.
Monte Carlo methods use random numbers to solve a problem.
The more points we generate, the more accurate the estimate tends to be: the expected error shrinks roughly in proportion to 1 / sqrt(N).
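To make the effect of N concrete, here is a minimal sketch that runs the dart-throwing estimate for increasing N and prints the absolute error. The class name PiAccuracySketch, the method estimatePi, and the seed value are assumptions chosen for illustration; they are not part of the course code.

```java
import java.util.Random;

public class PiAccuracySketch {
    // Estimate pi from n pseudorandom points in the unit square.
    static double estimatePi(long seed, long n) {
        Random prng = new Random(seed);
        long count = 0;
        for (long i = 0; i < n; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            // The point lies inside the circle quadrant if x^2 + y^2 <= 1.
            if (x * x + y * y <= 1.0) ++count;
        }
        return 4.0 * count / n;
    }

    public static void main(String[] args) {
        long seed = 142857L; // arbitrary seed, chosen only for illustration
        // Multiply N by 100 each round; the error should tend to drop
        // by about a factor of 10, though any single run can deviate.
        for (long n = 100; n <= 10_000_000L; n *= 100) {
            double estimate = estimatePi(seed, n);
            System.out.printf("N = %,d  pi ~ %.6f  |error| = %.6f%n",
                    n, estimate, Math.abs(estimate - Math.PI));
        }
    }
}
```

Because the error falls only as 1 / sqrt(N), each extra decimal digit of accuracy costs about 100 times more points, which is why the loop can run for billions of iterations.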
Sequential Program PiSeq
• Inputs:
  • the seed for the random number generator
  • the number of points to generate
• Output: the estimate of π
• Resources: java.util.Random
Sequential Program PiSeq

    import java.util.Random;

    public class PiSeq {
        // Command line arguments.
        static long seed;
        static long N;

        // Pseudorandom number generator.
        static Random prng;

        // Number of points within the unit circle.
        static long count;

        public static void main(String[] args) {
            // Start timing.
            long time = -System.currentTimeMillis();

            // Validate command line arguments.
            if (args.length != 2) usage();
            seed = Long.parseLong(args[0]);
            N = Long.parseLong(args[1]);

            // Set up PRNG.
            prng = new Random(seed);

            // Generate N random points in the unit square, count how many
            // are in the unit circle.
            count = 0;
            for (long i = 0; i < N; ++i) {
                double x = prng.nextDouble();
                double y = prng.nextDouble();
                if (x * x + y * y <= 1.0) ++count;
            }

            // Stop timing.
            time += System.currentTimeMillis();

            // Print results and the elapsed time.
            System.out.println("pi = 4 * " + count + " / " + N + " = " + (4.0 * count / N));
            System.out.println(time + " msec");
        }

        // Print a usage message and exit.
        private static void usage() {
            System.err.println("Usage: java PiSeq <seed> <N>");
            System.exit(1);
        }
    }
Parallelize!
• Multiple threads generate and throw darts
• Shared variables: prng and count
• Both are written and read by multiple threads (WMRM), so access to them must be synchronized
• java.util.Random is safe for use by multiple threads, because it updates its seed with an atomic compare-and-set (CAS) operation
• edu.rit.pj.reduction.SharedLong also employs CAS and thus is also safe for use by multiple threads
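To illustrate what a CAS-based update looks like, here is a minimal sketch that uses the standard java.util.concurrent.atomic.AtomicLong as a stand-in for SharedLong. The class and method names (CasCounterSketch, casIncrement, countWithThreads) are hypothetical, and the explicit retry loop is shown only to expose the pattern; it is not taken from the source of either library.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounterSketch {
    // Increment the counter with an explicit compare-and-set retry loop.
    static long casIncrement(AtomicLong counter) {
        while (true) {
            long current = counter.get();
            long next = current + 1;
            // Succeeds only if no other thread changed the value in between.
            if (counter.compareAndSet(current, next)) return next;
            // Otherwise another thread won the race; read again and retry.
        }
    }

    // Have several threads increment one shared counter and return the total.
    static long countWithThreads(int threads, long perThread) {
        AtomicLong counter = new AtomicLong(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            workers[t] = new Thread(() -> {
                for (long i = 0; i < perThread; ++i) casIncrement(counter);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // No increment is lost, so this prints 400000.
        System.out.println(countWithThreads(4, 100_000L));
    }
}
```

The result is always correct, but every failed compareAndSet is wasted work, and the number of failures grows with the number of threads competing for the same variable.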
Parallel Program PiSmp

    // Generate N random points in the unit square, count how many are in
    // the unit circle.
    count = new SharedLong(0);
    new ParallelTeam().execute(new ParallelRegion() {
        public void run() throws Exception {
            execute(0, N - 1, new LongForLoop() {
                public void run(long first, long last) {
                    for (long i = first; i <= last; ++i) {
                        double x = prng.nextDouble();
                        double y = prng.nextDouble();
                        if (x * x + y * y <= 1.0) count.incrementAndGet();
                    }
                }
            });
        }
    });
Problem
Synchronization on prng and count means the threads contend for both on every iteration; the more threads there are, the more time is lost to waiting.
There might be billions of iterations!
Solution
Each thread gets its own prng and count, and the counts are combined at the end: reduction!
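The reduction pattern can be sketched without the Parallel Java library. The example below is an assumption-laden illustration, not the course's implementation: it uses plain java.lang.Thread in place of ParallelTeam and AtomicLong in place of SharedLong, and the names PiReductionSketch and countHits are hypothetical. Every thread builds its PRNG from the same seed, which is one simple way to give each thread its own generator.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class PiReductionSketch {
    // Count hits inside the circle quadrant using per-thread state,
    // then reduce the per-thread counts into one shared total.
    static long countHits(long seed, long n, int threads) {
        AtomicLong total = new AtomicLong(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; ++t) {
            // Split the n iterations as evenly as possible.
            final long share = n / threads + (t < n % threads ? 1 : 0);
            workers[t] = new Thread(() -> {
                Random prng = new Random(seed); // per-thread PRNG, same seed in every thread
                long local = 0;                 // per-thread counter, no sharing
                for (long i = 0; i < share; ++i) {
                    double x = prng.nextDouble();
                    double y = prng.nextDouble();
                    if (x * x + y * y <= 1.0) ++local;
                }
                // Reduction step: one shared update per thread,
                // instead of one per iteration.
                total.addAndGet(local);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        long n = 4_000_000L;
        long hits = countHits(142857L, n, 4); // arbitrary seed and thread count
        System.out.println("pi = 4 * " + hits + " / " + n + " = " + (4.0 * hits / n));
    }
}
```

The inner loop touches only thread-local variables, so the only synchronized operation left is the single addAndGet at the end of each thread.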
Parallel Program PiSmp2

    new ParallelTeam().execute(new ParallelRegion() {
        public void run() throws Exception {
            execute(0, N - 1, new LongForLoop() {
                // Set up per-thread PRNG and counter.
                Random prng_thread = new Random(seed);
                long count_thread = 0;

                // Extra padding to avert cache interference.
                long pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7;
                long pad8, pad9, pada, padb, padc, padd, pade, padf;

                // Parallel loop body.
                public void run(long first, long last) {
                    // Generate random points.
                    for (long i = first; i <= last; ++i) {
                        double x = prng_thread.nextDouble();
                        double y = prng_thread.nextDouble();
                        if (x * x + y * y <= 1.0) ++count_thread;
                    }
                }

                public void finish() {
                    // Reduce per-thread counts into shared count.
                    count.addAndGet(count_thread);
                }
            });
        }
    });
Another Problem
• The parallel version with one thread produces the same value of π as the sequential version
• With multiple threads, we get a different value of π for the same seed and N
• Each thread is generating the same points, because every per-thread prng starts from the same seed!
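The duplicated points follow directly from how java.util.Random works: two generators constructed with the same seed return the same sequence of values. A minimal sketch of that fact, with a hypothetical class name SameSeedSketch and an arbitrary seed:

```java
import java.util.Random;

public class SameSeedSketch {
    // Returns true if two generators built from the same seed
    // agree on their first n doubles.
    static boolean sameSequence(long seed, int n) {
        Random a = new Random(seed);
        Random b = new Random(seed);
        for (int i = 0; i < n; ++i) {
            if (a.nextDouble() != b.nextDouble()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Prints true: both generators produce identical values.
        System.out.println(sameSequence(142857L, 1000));
    }
}
```

With K threads each drawing from an identical sequence, the program examines only about N / K distinct points, so the estimate differs from the sequential one and is no more accurate than a run with N / K darts.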