Parallel Strategies
• Partitioning consists of the following steps:
  • Divide the problem into parts
  • Compute each part separately
  • Merge the results
• Divide and Conquer
  • Divide the problem recursively into sub-problems of the same type
  • Assign sub-problems to individual processors (e.g., Save and hold)
• Domain (Data) Decomposition
  • Assign parts of the data to separate processors
• Functional Decomposition
  • Assign application functions to separate processors
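A minimal Python sketch contrasting the two decomposition styles, assuming a thread pool stands in for the processors. The helpers `block_ranges`, `domain_decomposition` and `functional_decomposition` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def domain_decomposition(data, p, func):
    """Domain decomposition: each processor applies the SAME function to its own block."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = pool.map(lambda r: [func(v) for v in data[r[0]:r[1]]],
                         block_ranges(len(data), p))
        return [v for part in parts for v in part]


def functional_decomposition(data, funcs):
    """Functional decomposition: each processor applies a DIFFERENT function to the data."""
    with ThreadPoolExecutor(max_workers=len(funcs)) as pool:
        return list(pool.map(lambda f: f(data), funcs))
```

For example, `domain_decomposition([1, 2, 3, 4, 5], 2, lambda v: v * v)` gives `[1, 4, 9, 16, 25]`, while `functional_decomposition([3, 1, 2], [min, max, sum])` gives `[1, 3, 6]`.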
Partitioning and Divide and Conquer
• Partition
  • Divide, compute and merge
• Example Applications
  • Summing Numbers
  • Sorting Algorithms
  • Numerical Integration
  • The N-body problem
  • Bucket Sort
  • Adaptive Quadrature
[Figure: summation trees for Partitioning, Divide and Conquer, and Divide and Hold Half]
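A short sketch of summing numbers both ways, assuming threads stand in for processors. `partition_sum` follows divide, compute and merge with a single flat merge; `tree_merge` shows the divide-and-conquer style of combining partial results pairwise. Both names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_sum(numbers, p):
    """Partitioning: divide into at most p parts, sum each part separately, merge."""
    if not numbers:
        return 0
    size = -(-len(numbers) // p)  # ceiling division so every element lands in a part
    parts = [numbers[i:i + size] for i in range(0, len(numbers), size)]  # divide
    with ThreadPoolExecutor(max_workers=p) as pool:
        partial_sums = list(pool.map(sum, parts))  # compute each part separately
    return sum(partial_sums)  # merge: one process adds all the partial sums


def tree_merge(partial_sums):
    """Divide-and-conquer style merge: combine partial sums pairwise, level by level."""
    level = list(partial_sums)
    while len(level) > 1:
        merged = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last element moves up unchanged
            merged.append(level[-1])
        level = merged
    return level[0] if level else 0
```

For instance, `tree_merge([1, 2, 3, 4, 5])` passes through the levels `[3, 7, 5]` and `[10, 5]` before returning `15`.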
Bucket Sort
• Communication can be reduced if each processor sends one small bucket to each of the other processors
• Bucket sort works well if the numbers are uniformly distributed across a known interval (e.g., 0 to 1)
[Figure: Sequential Bucket Sort and Parallel Bucket Sort; unsorted numbers are partitioned into buckets, which are sorted to give the sorted result]
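A sequential simulation of both variants, assuming every number lies in [0, 1) so that a number x falls into bucket `int(x * m)`. Lists stand in for each processor's memory, and step 3 of `parallel_bucket_sort` is the exchange of small buckets described above. The function names are hypothetical.

```python
def sequential_bucket_sort(numbers, m):
    """Place each number in [0, 1) into one of m buckets, sort each bucket, concatenate."""
    buckets = [[] for _ in range(m)]
    for x in numbers:
        buckets[min(int(x * m), m - 1)].append(x)  # clamp guards against x == 1.0
    return [x for b in buckets for x in sorted(b)]


def parallel_bucket_sort(numbers, p):
    """Simulated parallel bucket sort with p processors, one large bucket per processor."""
    # 1. Partition the unsorted numbers among the processors.
    chunk = -(-len(numbers) // p) if numbers else 1
    local = [numbers[i * chunk:(i + 1) * chunk] for i in range(p)]
    # 2. Each processor splits its numbers into p small buckets, one per destination.
    small = [[[] for _ in range(p)] for _ in range(p)]  # small[src][dst]
    for src in range(p):
        for x in local[src]:
            small[src][min(int(x * p), p - 1)].append(x)
    # 3. Exchange: processor dst gathers small bucket dst from every processor src.
    large = [[x for src in range(p) for x in small[src][dst]] for dst in range(p)]
    # 4. Each processor sorts its large bucket; concatenating in processor order finishes.
    return [x for dst in range(p) for x in sorted(large[dst])]
```

The design choice of one large bucket per processor means only step 3 needs communication; if the input is far from uniform, some large buckets grow much bigger than others and the load becomes unbalanced.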
All-To-All Broadcast
• Bucket sort is a possible application for the all-to-all broadcast
• All-to-all is also useful for transposing matrices
[Figure: the buffer of processor i in an all-to-all exchange among processors P0, P1, P2, …, Pi, …, P(P-1)]
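A sequential simulation of the exchange, assuming the personalized form in which processor i sends a distinct item to each processor j (which is what both bucket sort and the matrix transpose need). `all_to_all`, `transpose` and `gather_buckets` are hypothetical names; no real messages are sent.

```python
def all_to_all(send):
    """send[i][j] is the item processor i addresses to processor j.
    Returns recv, where recv[j][i] is the item processor j received from processor i."""
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]


def transpose(rows):
    """Processor i holds row i of a square matrix; after the exchange,
    processor j holds column j, i.e. row j of the transpose."""
    return all_to_all(rows)


def gather_buckets(small_buckets):
    """Bucket sort: processor j concatenates the small buckets every processor sent it."""
    return [[x for bucket in received for x in bucket]
            for received in all_to_all(small_buckets)]
```

For example, `transpose([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `[[1, 4, 7], [2, 5, 8], [3, 6, 9]]`.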
Divide and Conquer Sum of N Numbers
• Two conditions are required for recursive solutions:
  • How does the recursion terminate?
  • How does a problem of size n relate to a problem of size < n?
• Pseudo code
  IF (fewer than two numbers) return their sum
  Divide the problem into two parts
  Recursive call to sum the first part
  Recursive call to sum the second part
  Merge the two partial sums and return the total
• Parallel implementation with eight processors
  • P0 keeps half and sends half to P4
  • P0 and P4 keep half and send half to P2 and P6, respectively
  • P0, P2, P4 and P6 keep half and send half to P1, P3, P5 and P7, respectively
  • Perform the computation in parallel
• The merge phase
  • Non-leaf processors receive and reduce results
  • Non-root processors send their results to the parent processor
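A Python sketch of the recursive pseudo code, plus a sequential simulation of the eight-processor keep-half/send-half scheme. The simulation assumes the processor count is a power of two; `dc_sum` and `simulate_tree_sum` are illustrative names, and no real message passing takes place.

```python
def dc_sum(numbers):
    """Recursive divide-and-conquer sum."""
    if len(numbers) < 2:           # termination: fewer than two numbers
        return sum(numbers)        # 0 for an empty list, the element itself otherwise
    mid = len(numbers) // 2        # divide the problem into two parts
    left = dc_sum(numbers[:mid])   # recursive call on the first part
    right = dc_sum(numbers[mid:])  # recursive call on the second part
    return left + right            # merge the two partial sums


def simulate_tree_sum(numbers, p=8):
    """Each holder keeps half of its data and sends the other half `stride` ranks away."""
    held = {0: list(numbers)}      # P0 starts with all the data
    stride = p // 2
    while stride >= 1:             # divide phase: 0->4, then 0->2 and 4->6, then ...
        for rank in sorted(held):
            data = held[rank]
            half = len(data) // 2
            held[rank], held[rank + stride] = data[:half], data[half:]
        stride //= 2
    partial = {rank: sum(data) for rank, data in held.items()}  # parallel computation
    stride = 1
    while stride < p:              # merge phase: each child sends its result to its parent
        for rank in range(0, p, 2 * stride):
            partial[rank] += partial.pop(rank + stride)
        stride *= 2
    return partial[0]
```

The merge loop reverses the divide loop: P1, P3, P5 and P7 report first, then P2 and P6, and finally P4 reports to the root P0.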
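Numerical Integration
[Figure: approximating the area under the curve from a to b with Rectangles and with Trapezoids; each strip runs from p to q and has width δ]
• Difficulties
  • How do we choose the value for δ?
  • Some parts of the integral require a smaller δ than others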
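A sketch of the two fixed-width rules in the figure. The slides do not say where in a strip the rectangle height is sampled; this sketch assumes the midpoint, and it adjusts δ slightly so a whole number of strips covers the interval. Function names are hypothetical.

```python
def rectangle_rule(f, a, b, delta):
    """Sum f(x_i) * delta over strips of width about delta, x_i at each strip's midpoint."""
    n = max(1, round((b - a) / delta))  # number of strips
    h = (b - a) / n                     # delta adjusted so the strips exactly cover [a, b]
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def trapezoid_rule(f, a, b, delta):
    """Approximate each strip by a trapezoid joining f at the strip's two ends."""
    n = max(1, round((b - a) / delta))
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * ((f(a) + f(b)) / 2 + interior)
```

Shrinking δ for f(x) = x² on [0, 1] from 0.1 to 0.01 cuts the rectangle-rule error by roughly a factor of 100, which illustrates why the choice of δ matters.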
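Adaptive Quadrature
[Figure: regions A, B and C over an interval from p to q with midpoint x, inside the integration range from a to b; a second panel shows a case where C = 0]
• Pseudo code
  p = a, δ = b - a
  WHILE p < b DO
    DO
      q = (p + δ > b) ? b : p + δ
      x = (p + q) / 2
      Compute A, B, and C
      IF C > tolerance THEN δ /= 2
    WHILE C > tolerance
    Add A + B to the total
    p += δ; δ *= 2
• Notes and Questions
  • When do we terminate?
  • Termination rates differ
  • Can we balance processor load?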
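A Python sketch of the iterative scheme on this slide. The slide does not define A, B and C; this sketch assumes A and B are trapezoid areas on [p, x] and [x, q], and C is the absolute difference between one trapezoid on [p, q] and A + B. The `min_delta` guard is an added assumption so the loop cannot shrink δ forever.

```python
def trapezoid(f, lo, hi):
    """Area of the trapezoid under f between lo and hi."""
    return (hi - lo) * (f(lo) + f(hi)) / 2


def adaptive_quadrature(f, a, b, tolerance=1e-6, min_delta=1e-12):
    """Iterative adaptive quadrature with an interval width delta that shrinks and grows."""
    p, delta, total = a, b - a, 0.0
    while p < b:
        while True:
            q = b if p + delta > b else p + delta  # do not step past b
            x = (p + q) / 2
            A, B = trapezoid(f, p, x), trapezoid(f, x, q)
            C = abs(trapezoid(f, p, q) - (A + B))
            if C <= tolerance or delta <= min_delta:  # accept this interval
                break
            delta /= 2                                # C > tolerance: halve delta, retry
        total += A + B
        p = q        # equals p + delta unless q was clamped to b
        delta *= 2   # try a wider interval next
    return total
```

For f(x) = x² on [0, 1] the result is within about 1e-5 of 1/3. A function that is zero at p, x and q but not in between, such as 64x²(1 - x)²(x - 0.5)² on [0, 1], gives C = 0 on the first try and the sketch returns 0.0 even though the true integral is positive, which is one reason the termination question needs care.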
Parallel Numerical Integration
• Sequential Algorithm
  Choose a δ
  For each region, x_i, in the integral
    Calculate sum += f(x_i) * δ
• Parallel Algorithm
  • Static Assignment (Question: How do we choose δ?)
    • Send a region to each processor
    • Processors perform the computation in parallel
    • A reduce-add operation computes the final result
  • Dynamic Assignment
    • With adaptive quadrature, convergence rates vary from region to region
    • Use a work pool approach for assigning regions
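Two hedged Python sketches of the parallel algorithms, with threads standing in for processors (in CPython the global interpreter lock means this shows the structure of the computation, not a speedup). Static assignment hands out equal-width regions and combines the partial sums with a reduce-add; the work pool version assumes regions are refined with the same trapezoid-based error estimate used for adaptive quadrature. All names are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def region_sum(f, lo, hi, delta):
    """Sequential algorithm on one region: sum += f(x_i) * delta, x_i at strip midpoints."""
    n = max(1, round((hi - lo) / delta))
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


def static_integrate(f, a, b, p, delta):
    """Static assignment: one equal-width region per processor, then a reduce-add."""
    width = (b - a) / p
    regions = [(a + i * width, a + (i + 1) * width) for i in range(p)]
    with ThreadPoolExecutor(max_workers=p) as executor:
        partials = executor.map(lambda r: region_sum(f, r[0], r[1], delta), regions)
        return reduce(add, partials, 0.0)


def work_pool_integrate(f, a, b, p, tolerance=1e-8):
    """Dynamic assignment: idle workers take regions from a shared work pool; a region
    whose error estimate is too large is split and both halves go back into the pool."""
    work = queue.Queue()
    work.put((a, b))
    results, lock = [], threading.Lock()

    def trapezoid(lo, hi):
        return (hi - lo) * (f(lo) + f(hi)) / 2

    def worker():
        while True:
            lo, hi = work.get()
            if lo is None:                      # sentinel: no more work
                work.task_done()
                return
            mid = (lo + hi) / 2
            halves = trapezoid(lo, mid) + trapezoid(mid, hi)
            if abs(trapezoid(lo, hi) - halves) > tolerance and hi - lo > 1e-9:
                work.put((lo, mid))             # queued before task_done, so join() waits
                work.put((mid, hi))
            else:
                with lock:
                    results.append(halves)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(p)]
    for t in threads:
        t.start()
    work.join()                                 # every region has been processed
    for _ in threads:
        work.put((None, None))                  # one sentinel per worker
    for t in threads:
        t.join()
    return sum(results)
```

In the work pool version no worker is tied to a fixed region, so a region that needs many refinements simply generates more work items that any idle worker can pick up.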
Gravitational N-Body Problem
• Predict positions and movements of bodies in space
• Used in astrophysics and molecular dynamics
• Based on the Newtonian laws of physics
• Formulae
  • $F = \frac{G m_x m_y}{r_{xy}^2}$
  • $F = m a$
• Notation
  • $G$ = gravitational constant
  • $m_x, m_y$ = masses of bodies $x$ and $y$
  • $r_{xy}$ = distance between $x$ and $y$
  • $a$ = acceleration
  • $F$ = force between bodies
• Force in 3 dimensions
  • $F_x = \frac{G m_x m_y r_x}{r_{xy}^3}$, where $r_x$ = distance in the x direction
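A small sketch of the formulas above for two bodies in 3 dimensions. Each component is $G m_x m_y r_c / r_{xy}^3$, whose vector length works out to $G m_x m_y / r_{xy}^2$. The rounded value of G and the function names are assumptions for illustration.

```python
G = 6.674e-11  # gravitational constant in N m^2 kg^-2 (rounded)


def gravitational_force(m_x, pos_x, m_y, pos_y):
    """Force on body x due to body y as an (Fx, Fy, Fz) tuple.
    Each component is G m_x m_y r_c / r_xy^3 with r_c = y_c - x_c."""
    r = [b - a for a, b in zip(pos_x, pos_y)]  # displacement from x toward y
    r_xy = sum(c * c for c in r) ** 0.5        # distance between the bodies
    scale = G * m_x * m_y / r_xy ** 3
    return tuple(scale * c for c in r)


def acceleration(force, mass):
    """Newton's second law, F = m a, solved for a."""
    return tuple(c / mass for c in force)
```

For bodies of 2 kg at the origin and 5 kg at (3, 4, 0) m, the distance is 5 m and the force on the first body has magnitude G · 10 / 25 and points toward the second body; swapping the arguments gives the equal and opposite force.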
The N-body problem: astronomical systems, electrical charges, etc.
• Sequential Solution Pseudo Code
  For each time step, t:
    Compute pair-wise forces (on body a due to body b: $F_x = G m_a m_b (x_b - x_a) / r^3$)
    Compute the acceleration of each body ($F = m a$)
    Compute the velocity of each body ($v_{t+1} = v_t + a \, \Delta t$)
    Compute the new position of each body ($x_{t+1} = x_t + v_{t+1} \, \Delta t$)
• Parallel Solution Notes
  • Partition the bodies among the processors
  • Communication costs are relatively high
  • This $O(n^2)$ algorithm doesn't scale well to large numbers of bodies
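A sketch of one time step of the direct method, following the pseudo code's order: velocity is updated first and the new velocity is then used for the position (semi-implicit Euler). The `Body` class and `step` function are hypothetical names; in a parallel version each processor would own some bodies but still need every other body's position, which is where the high communication cost comes from.

```python
from dataclasses import dataclass

G = 6.674e-11  # gravitational constant in N m^2 kg^-2 (rounded)


@dataclass
class Body:
    mass: float
    pos: list  # [x, y, z]
    vel: list  # [vx, vy, vz]


def step(bodies, dt):
    """Advance all bodies by one time step using O(n^2) pair-wise forces."""
    n = len(bodies)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for a in range(n):                          # pair-wise forces
        for b in range(a + 1, n):
            d = [bodies[b].pos[k] - bodies[a].pos[k] for k in range(3)]
            r = sum(c * c for c in d) ** 0.5
            s = G * bodies[a].mass * bodies[b].mass / r ** 3
            for k in range(3):
                forces[a][k] += s * d[k]        # pulls a toward b
                forces[b][k] -= s * d[k]        # equal and opposite force on b
    for body, f in zip(bodies, forces):
        for k in range(3):
            acc = f[k] / body.mass              # F = m a
            body.vel[k] += acc * dt             # v_{t+1} = v_t + a dt
            body.pos[k] += body.vel[k] * dt     # x_{t+1} = x_t + v_{t+1} dt
```

Computing each pair once and applying the force to both bodies halves the work but keeps the cost quadratic in the number of bodies.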
Barnes and Hut Solution
• $O(N \log N)$ complexity instead of $O(N^2)$
• Treat distant clusters as a single body located at their center of mass
[Figure: 2-Dimensional Recursive Division; a distant cluster at distance r is represented by its center of mass]
• Pseudo code
  FOR each time step, t
    Perform recursive division
    All-to-all the essential tree
    Perform parallel calculations
    Output visualization data
• Questions
  • What is the best way to partition the n bodies?
  • Should the partitioning be dynamic or static?
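A sequential 2-D sketch of the recursive division and the "distant cluster as one body" approximation. Several details are assumptions not on the slide: units in which G = 1, the common Barnes-Hut opening parameter `theta` (a cell is approximated when its size divided by its distance is below `theta`), and distinct body positions (two coincident bodies would be divided forever). The parallel steps, partitioning and the all-to-all of the essential tree, are not shown. All names are hypothetical.

```python
G = 1.0  # hypothetical units in which G = 1


class Node:
    """One square cell of the 2-D recursive division (a quadtree node)."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell centre and half side length
        self.mass = 0.0
        self.sx = self.sy = 0.0   # mass-weighted coordinate sums (center of mass = s / mass)
        self.body = None          # (x, y, m) while the cell holds exactly one body
        self.children = None      # four sub-cells once the cell has been divided

    def child_for(self, x, y):
        # children are ordered (-,-), (-,+), (+,-), (+,+) relative to the centre
        return self.children[2 * (x >= self.cx) + (y >= self.cy)]

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)                    # empty leaf: keep the body here
        else:
            if self.children is None:                # occupied leaf: divide into four
                h = self.half / 2
                self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                                 for dx in (-1, 1) for dy in (-1, 1)]
                old, self.body = self.body, None
                self.child_for(old[0], old[1]).insert(*old)
            self.child_for(x, y).insert(x, y, m)
        self.mass += m
        self.sx += m * x
        self.sy += m * y


def build_tree(bodies):
    """Recursive division of a square that encloses every (x, y, m) body."""
    xs, ys = [b[0] for b in bodies], [b[1] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Node((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root


def pair_force(x, y, m, ox, oy, om):
    """Force on mass m at (x, y) due to mass om at (ox, oy)."""
    dx, dy = ox - x, oy - y
    r = (dx * dx + dy * dy) ** 0.5
    s = G * m * om / r ** 3
    return s * dx, s * dy


def force_on(node, x, y, m, theta=0.5):
    """A cell that does not contain (x, y) and satisfies size / distance < theta
    is treated as a single body at its center of mass."""
    if node.mass == 0.0:
        return 0.0, 0.0                              # empty cell
    if node.body is not None:
        bx, by, bm = node.body
        if (bx, by) == (x, y):
            return 0.0, 0.0                          # no self-interaction
        return pair_force(x, y, m, bx, by, bm)
    comx, comy = node.sx / node.mass, node.sy / node.mass
    dist = ((comx - x) ** 2 + (comy - y) ** 2) ** 0.5
    inside = abs(x - node.cx) <= node.half and abs(y - node.cy) <= node.half
    if not inside and dist > 0 and 2 * node.half / dist < theta:
        return pair_force(x, y, m, comx, comy, node.mass)  # distant cluster as one body
    fx = fy = 0.0
    for child in node.children:
        cfx, cfy = force_on(child, x, y, m, theta)
        fx, fy = fx + cfx, fy + cfy
    return fx, fy
```

With `theta=0.0` no cell is ever approximated and the traversal reproduces the direct pair-wise sum; larger values trade accuracy for fewer force evaluations.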