Explore the use of MPI to parallelize Monte Carlo integration algorithms for better accuracy and speed. The slides cover performance monitoring, parallelization techniques, test functions, and the code used in each approach. Predictions and performance monitoring results are discussed.
Monte Carlo Integration Using MPI • Barry L. Kurtz • Chris Mitchell • Appalachian State University
Monitoring MPI Performance • Goals • We will use MPI • We will parallelize the algorithm to increase accuracy • We will parallelize the algorithm to increase speed • We will vary the number of processors from 1 to 8 under these conditions • Node performance monitoring • Graphical plot of CPU usage on each node • Separates out types of CPU tasks
Integration Using Monte Carlo • Main idea • Similar to the PI program demonstrated with MATLAB • Place random points in a rectangular area and find the percentage of points that satisfy the given criteria • Our functions will be in the first quadrant only • Variables • Number of processors used • The function being integrated • The number of histories in the sample space • The low and high range for the interval
Example: f(x) = 2x^2 • Given the range 0 to 5 • The analytic solution is (2/3)x^3 evaluated from 0 to 5, giving 83 1/3 • Sample Calculation: • # Hits = 3 • Total pts = 10 • Area of rectangle = 50 * 5 = 250 (height f(5) = 50, width 5) • Estimate of Integral = 250 * 3/10 = 75
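The sample calculation above is one line of arithmetic, and a small sketch in C makes the pieces explicit. The helper names `estimate_area` and `rect_area_2x2` are assumptions made for illustration and do not come from the slides.

```c
/* Hypothetical helper (not from the slides): hit-or-miss estimate.
   The fraction of points under the curve scales the rectangle's area. */
double estimate_area(long hits, long total, double rect_area)
{
    if (total <= 0)
        return 0.0;               /* no samples, so no estimate */
    return rect_area * (double)hits / (double)total;
}

/* Area of the enclosing rectangle for f(x) = 2x^2 on [low, high],
   assuming low >= 0. f is increasing there, so its maximum is f(high). */
double rect_area_2x2(double low, double high)
{
    double max = 2.0 * high * high;
    return max * (high - low);
}
```

Calling `estimate_area(3, 10, rect_area_2x2(0.0, 5.0))` reproduces the slide's value of 75, compared with the analytic 83 1/3; ten points are far too few for a close estimate.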
Parallelization Techniques • Increase the number of points by giving each processor the specified number of points • As number of processors increases we expect accuracy to increase due to the larger number of total points • Computation time should not change dramatically • Divide a specified number of points “equally” between the processors • As number of processors increases we expect accuracy to stay the same • Total computation time should decrease
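The two techniques differ only in how many histories each process generates. The sketch below is one way to express that choice; the enum, the function names, and the rule for handing out leftover points are assumptions for illustration, not the authors' code.

```c
/* Hypothetical names, introduced only for this sketch. */
enum strategy { PER_PROCESS, SPLIT_TOTAL };

/* Number of histories that process `rank` (0 .. size-1) should generate.
   PER_PROCESS: every process runs numHist points, so the total grows
                with the number of processes (accuracy run).
   SPLIT_TOTAL: numHist points are shared; the first numHist % size
                ranks take one extra point so none are dropped (speed run). */
long histories_for_rank(long numHist, int size, int rank, enum strategy s)
{
    if (s == PER_PROCESS)
        return numHist;
    return numHist / size + (rank < numHist % size ? 1 : 0);
}

/* Total histories across all processes under the chosen strategy. */
long total_histories(long numHist, int size, enum strategy s)
{
    long sum = 0;
    for (int rank = 0; rank < size; rank++)
        sum += histories_for_rank(numHist, size, rank, s);
    return sum;
}
```

With 10,000,000 histories and 8 processes, `PER_PROCESS` yields 80,000,000 points in total while `SPLIT_TOTAL` keeps the total at 10,000,000. Handing the remainder to the low ranks is one reading of the quotation marks around "equally": a plain integer division would silently drop up to size - 1 points.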
Three Test Functions • f(x) = 2x^2 – Strictly increasing function • g(x) = e^(-x) – Strictly decreasing function • h(x) = 2 + sin(x) – Oscillating function • How will we find the area of the enclosing rectangle? • Issues arise with finding maximum value of the given function on the given interval • Think of a solution that could apply to all three functions given above
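MPI Code for Finding Max
    double findMax(double low, double high, double (*fp)(double))
    {
        int k;              /* index of the current test point */
        double x,           /* current test point */
               result,      /* function return */
               max = 0;     /* holds max value thus far found;
                               starting at 0 assumes fp is non-negative */

        /* Test 101 evenly spaced points. Stepping by integer index avoids
           accumulated rounding error and guarantees that high is tested,
           which matters for an increasing function such as 2x^2. */
        for (k = 0; k <= 100; k++) {
            x = low + (high - low) * k / 100;
            result = fp(x);
            if (result > max)
                max = result;
        }
        return max;
    }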
MPI Initialization for Accuracy
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    :::
    MPI_Bcast(&numHist, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
    MPI_Bcast(&low, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
    MPI_Bcast(&high, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
MPI Code for Accuracy
    /* history calculation loop */
    /* random() returns values in [0, 2^31 - 1]; dividing by RAND_MAX + 1
       assumes RAND_MAX is also 2^31 - 1, as it is in glibc */
    for (i = 0; i < numHist; i++) {
        x = ((double)random() / ((double)(RAND_MAX) + (double)(1)));
        x *= (high - low);
        x += low;
        y = ((double)random() / ((double)(RAND_MAX) + (double)(1))) * max;
        /* if point is below the function value, it's a hit */
        if (y < fp(x)) {    /* fp is the function to be integrated */
            hits++;
        }
        total++;
    }
    /* calculate this process' estimate of function's area */
    subArea = ((double)(hits) / (double)(total)) * (max * (high - low));
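The loop above does not show how the generator is seeded. If every process starts from the same seed, all of them replay the same points and the extra histories add nothing. The sketch below shows one possible arrangement in plain C, with no MPI calls so it can be run on its own. Everything in it is an assumption for illustration: the generator is a swapped-in 64-bit linear congruential generator rather than the slides' `random()`, and `seed_for_rank`, `next_uniform`, and `count_hits` are hypothetical names.

```c
#include <stdint.h>

/* Stand-in generator for this sketch (NOT the slides' random()):
   a 64-bit linear congruential generator with Knuth's MMIX constants.
   Returns a double in [0, 1) built from the top 53 bits of the state. */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Hypothetical per-process seed: offset a base seed by the rank so that
   processes do not start from the same generator state. */
uint64_t seed_for_rank(uint64_t base_seed, int rank)
{
    return base_seed + 0x9E3779B97F4A7C15ULL * (uint64_t)(rank + 1);
}

/* Hit-or-miss loop: count the points that land below fp inside the
   rectangle [low, high] x [0, max]. */
long count_hits(double (*fp)(double), double low, double high,
                double max, long numHist, uint64_t seed)
{
    uint64_t state = seed;
    long hits = 0;
    for (long i = 0; i < numHist; i++) {
        double x = low + next_uniform(&state) * (high - low);
        double y = next_uniform(&state) * max;
        if (y < fp(x))
            hits++;
    }
    return hits;
}

/* f(x) = 2x^2, the function from the worked example. */
double two_x_squared(double x) { return 2.0 * x * x; }
```

In an MPI program, each process would pass its own `rank` to `seed_for_rank` before entering the loop. A fixed base seed also makes runs repeatable, which helps when comparing timings across processor counts.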
Gather the Data and Calculate the Result
    /* calculate total hits and histories generated by all processes */
    MPI_Reduce(&hits, &allHits, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
    MPI_Reduce(&total, &allTotal, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
    if (rank == MASTER) {
        area = ((double)(allHits) / (double)(allTotal)) * (max * (high - low));
        printf("\nArea of function between %5.3f and %5.3f is: %f\n",
               low, high, area);
    }
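The reduction step pools the raw counts from every process and forms a single ratio on the master. A serial stand-in, sketched below, shows the same arithmetic without MPI. The function name `combined_area` and the array-based interface are assumptions for illustration; the `long long` accumulators are a precaution in case the combined number of histories ever exceeds the range of `int`, which the 10,000,000-history runs in these slides do not.

```c
/* Hypothetical serial stand-in for the two MPI_Reduce calls: sum the
   per-process hits and totals, then form one estimate from the pooled
   counts. */
double combined_area(const int *hits, const int *totals, int nprocs,
                     double rect_area)
{
    long long allHits = 0, allTotal = 0;   /* headroom beyond int range */
    for (int p = 0; p < nprocs; p++) {
        allHits += hits[p];
        allTotal += totals[p];
    }
    if (allTotal == 0)
        return 0.0;                        /* no histories, no estimate */
    return rect_area * (double)allHits / (double)allTotal;
}
```

For two processes reporting 3 and 4 hits out of 10 points each, with a rectangle of area 250, the pooled estimate is 250 * 7/20 = 87.5.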
MPI Initialization for Speed
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    :::
    numHist = numHist/size;
    :::
    MPI_Bcast(&numHist, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
    MPI_Bcast(&low, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
    MPI_Bcast(&high, 1, MPI_DOUBLE, MASTER, MPI_COMM_WORLD);
What Are Your Predictions? • Will accuracy increase linearly with the number of processors? • Will the execution time decrease linearly with the number of processors? • How important is the random number generation? • Would you expect occasional anomalies?
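For the first question, a standard result for hit-or-miss sampling gives a useful baseline. It is stated here as background and is not taken from the slides. Assume N independent points, a rectangle of area A, and a hit probability p equal to the true integral divided by A. The estimate and its standard error are then:

```latex
\[
\hat{I} = A \cdot \frac{\text{hits}}{N},
\qquad
\sigma_{\hat{I}} = A \sqrt{\frac{p\,(1-p)}{N}}
\]
```

Under these assumptions the error shrinks like 1/sqrt(N), so going from 1 to 8 processors in the accuracy run would reduce the expected error by a factor of about sqrt(8) ≈ 2.8, not 8. For f(x) = 2x^2 on 0 to 5, p = 1/3 and A = 250, which gives a standard error of roughly 0.037 at N = 10,000,000 and roughly 0.013 at N = 80,000,000. Any single run can land above or below these figures, so occasional anomalies are to be expected.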
The Performance Monitor • Monitors Performance on a Local Cluster • Separates the following types of CPU usage • User % • System % • Easy % • Total % • Provides a quick, intuitive view of the load balancing for the algorithm distribution • Developed at Appalachian State by Keith Woodie and Michael Economy
Results for Increasing Accuracy • Number of Histories per processor = 10,000,000
Results for Increasing Speed • Total Number of Histories = 10,000,000