Adding numbers
• n data items, p processors
• t_s = O(n)
• t_p = O(n/p) if the data is already on each proc. => S = t_s/t_p = O(p)
• t_p = O(n + n/p) = O(n) if the data needs broadcasting => S = t_s/t_p = O(1)
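A quick numeric check of the two cases above. This is only a sketch: it assumes unit cost per addition and per item communicated, and the function names are illustrative rather than taken from the slides.

```python
def speedup_local(n, p):
    """Data already on each processor: t_s = n, t_p = n/p."""
    return n / (n / p)


def speedup_broadcast(n, p):
    """Data must be broadcast first: t_s = n, t_p = n + n/p."""
    return n / (n + n / p)


for p in (10, 100, 1000):
    # The first column grows with p; the second never reaches 1.
    print(p, speedup_local(10**6, p), speedup_broadcast(10**6, p))
```

Under these assumptions the broadcast case is bounded by a constant no matter how many processors are added.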
Parallel Recursion
• t_comm = O(n/2 + n/4 + ... + n/p) = O(n)
• t_comp = O(n/2 + n/4 + ... + n/p) = O(n)
• => S = O(1)
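The communication sum above can be reproduced with a small simulation. It is a sketch that assumes p is a power of two, that shipping one item costs one time unit, and that the two halves at each level proceed in parallel; `recursive_sum` is a hypothetical name.

```python
def recursive_sum(values, p):
    """Divide-and-conquer sum over p simulated processors.

    At each level a processor ships half of its data to a partner.
    Returns (total, comm_time), where comm_time is the number of items
    sent along the critical path.
    """
    if p == 1:
        return sum(values), 0
    half = len(values) // 2
    left, t_left = recursive_sum(values[:half], p // 2)
    right, t_right = recursive_sum(values[half:], p // 2)
    shipped = len(values) - half          # items sent to the partner
    return left + right, shipped + max(t_left, t_right)


total, comm_time = recursive_sum(list(range(16)), 4)
print(total, comm_time)  # 120 12, i.e. n/2 + n/4 = 8 + 4
```

For n = 16 and p = 4 the critical path carries n/2 + n/4 = 12 items, which is already of the same order as the sequential work.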
With O(1) data per step:
• t_comm = O(1 + 1 + ... + 1) = O(log p)
• t_comp = O(1 + 1 + ... + 1) = O(log p)
• => S = O(n / log p)
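The O(log p) bound corresponds to a pairwise (binary-tree) combine. The sketch below simulates it with one value per processor and counts the rounds; it assumes a non-empty input, and `tree_sum` is an illustrative name.

```python
def tree_sum(values):
    """Simulate a binary-tree reduction, one value per simulated processor.

    Returns (total, rounds), where rounds is the number of communication steps.
    """
    partial = list(values)
    rounds = 0
    stride = 1
    while stride < len(partial):
        # In one round, processor i receives one value from i + stride and adds it.
        for i in range(0, len(partial) - stride, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        rounds += 1
    return partial[0], rounds


print(tree_sum(range(1, 9)))  # (36, 3): 8 processors, log2(8) = 3 rounds
```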
Sequential
• m buckets, n numbers
• t_s = O(n + m((n/m) log(n/m))) = O(n log(n/m))
Parallel
• m buckets, n numbers, p = m processors
• One proc. per bucket, bucketing cost still O(n): t_p = O(n + (n/p) log(n/p))
• Bucketing also parallelized: t_p = O(n/p + (n/p) log(n/p)) = O((n/p) log(n/p)) => S = O(p)
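A minimal sequential bucket sort, to make the two cost terms concrete: one O(n) pass to scatter the numbers, then a sort of each bucket. The equal-width bucketing is an assumption (it presumes roughly uniformly distributed input); in the parallel version each of the p = m processors would sort one bucket.

```python
def bucket_sort(numbers, m):
    """Scatter numbers into m equal-width buckets, then sort each bucket."""
    if not numbers:
        return []
    lo, hi = min(numbers), max(numbers)
    width = (hi - lo) / m or 1            # avoid zero width if all values are equal
    buckets = [[] for _ in range(m)]
    for x in numbers:                     # O(n) scatter pass
        idx = min(int((x - lo) / width), m - 1)
        buckets[idx].append(x)
    result = []
    for bucket in buckets:                # m sorts of about n/m items each
        result.extend(sorted(bucket))
    return result


print(bucket_sort([29, 3, 72, 44, 3, 15, 99, 0, 61], 4))
```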
Det. Sample Sort
• sort locally and create p-sample
• send all p-samples to processor 1
• proc. 1: sort all received samples and compute global p-sample
• broadcast global p-sample
• bucket locally according to global p-sample
• send bucket i to proc. i
• re-sort locally
Det. Sample Sort
Lemma: Each proc. receives at most 2n/p data items
[Figure: global sample, with spacing labels n/p^2]
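The sample sort steps and the 2n/p bound can be tried out in a single-process simulation. This is a sketch under stated assumptions, not the slides' implementation: processors are plain lists, n is a multiple of p^2, all keys are distinct, and each p-sample is taken at regular intervals of n/p^2 items. The name `det_sample_sort` is illustrative.

```python
import bisect
import random


def det_sample_sort(data, p):
    """Simulate deterministic sample sort on p processors (plain lists).

    Assumes len(data) is a multiple of p*p and all keys are distinct.
    Returns the per-processor sorted lists after redistribution.
    """
    n = len(data)
    chunk, step = n // p, n // (p * p)
    # Each processor sorts locally and takes every step-th item (p samples).
    local = [sorted(data[i * chunk:(i + 1) * chunk]) for i in range(p)]
    samples = [blk[(j + 1) * step - 1] for blk in local for j in range(p)]
    # "Processor 1" sorts the p*p samples and keeps every p-th one,
    # giving p - 1 global splitters.
    samples.sort()
    splitters = [samples[(j + 1) * p - 1] for j in range(p - 1)]
    # Bucket locally by splitter, route bucket i to processor i, re-sort.
    received = [[] for _ in range(p)]
    for blk in local:
        for x in blk:
            received[bisect.bisect_left(splitters, x)].append(x)
    return [sorted(r) for r in received]


random.seed(0)
n, p = 256, 4
parts = det_sample_sort(random.sample(range(10_000), n), p)
print([len(part) for part in parts], "bound:", 2 * n // p)
```

Concatenating the returned lists gives the fully sorted sequence, and the printed sizes can be compared against the 2n/p bound.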
Det. Sample Sort Post-Processing: “Array Balancing”
[Figure: procs. 1-8, each holding n/p items]
2 Rounds:
• Each proc. sends its received data size to all other procs.
• Move data to the right location via one h-relation
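The two balancing rounds can be sketched as follows, again with processors simulated as lists. The sketch assumes at least one processor and a total item count divisible by p; `balance` is a hypothetical helper name.

```python
from itertools import accumulate


def balance(parts):
    """Redistribute items so that every simulated processor holds total/p of them.

    Global order is preserved. Assumes the total is divisible by len(parts).
    """
    p = len(parts)
    # Round 1: every processor learns all sizes, hence its global start offset.
    sizes = [len(part) for part in parts]
    offsets = [0] + list(accumulate(sizes))[:-1]
    target = sum(sizes) // p
    # Round 2: each item moves to the processor that owns its global index.
    balanced = [[] for _ in range(p)]
    for part, start in zip(parts, offsets):
        for k, x in enumerate(part):
            balanced[(start + k) // target].append(x)
    return balanced


print(balance([[1, 2, 3, 4, 5], [6], [7, 8, 9], [10, 11, 12]]))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```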
Det. Sample Sort
• 5 MPI_Alltoallv for n/p > p^2
• O((n/p) log n) local comp.
• Goodrich (FOCS'98): O(1) rounds for n/p > p^ε
Static assignment of processors to segments of [a,b]
• area of one trapezoid = d (f(p) + f(q)) / 2 (d = width of the strip from p to q)
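A sketch of the static scheme: [a,b] is cut into equal segments, one per simulated processor, and each processor sums trapezoids over its own segment. The number of strips per processor is an assumed parameter, and the strip endpoints are called `lo` and `hi` here (the slide's p and q) to avoid clashing with the processor count.

```python
import math


def trapezoid(f, lo, hi):
    """Area of one trapezoid of width d = hi - lo: d * (f(lo) + f(hi)) / 2."""
    return (hi - lo) * (f(lo) + f(hi)) / 2


def static_quadrature(f, a, b, p, strips_per_proc=100):
    """Split [a, b] into p equal segments, one per simulated processor."""
    seg = (b - a) / p
    partial_sums = []
    for rank in range(p):
        lo = a + rank * seg
        d = seg / strips_per_proc
        partial_sums.append(sum(trapezoid(f, lo + i * d, lo + (i + 1) * d)
                                for i in range(strips_per_proc)))
    return sum(partial_sums)          # final reduction over the p partial sums


print(static_quadrature(math.sin, 0.0, math.pi, p=4))  # close to 2
```

Every processor does the same amount of work here regardless of how the curve behaves on its segment.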
Adaptive Quadrature
• Terminate when C is sufficiently small
• Problem: different parts of the curve need different resolution
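The resolution problem shows up in a small recursive sketch. The slide does not define C in the text, so the stopping test below is an assumed stand-in: the change in the area estimate when an interval is split once. The function also returns the number of leaf intervals so that the work per region can be compared.

```python
import math


def trapezoid(f, lo, hi):
    return (hi - lo) * (f(lo) + f(hi)) / 2


def adaptive_quadrature(f, lo, hi, tol=1e-6):
    """Return (area, leaf_count), refining only where the estimate is unstable."""
    mid = (lo + hi) / 2
    whole = trapezoid(f, lo, hi)
    halves = trapezoid(f, lo, mid) + trapezoid(f, mid, hi)
    # Assumed stand-in for the slide's "C": change in area after one split.
    if abs(whole - halves) < tol:
        return halves, 1
    left, n_left = adaptive_quadrature(f, lo, mid, tol / 2)
    right, n_right = adaptive_quadrature(f, mid, hi, tol / 2)
    return left + right, n_left + n_right


# sqrt is steep near 0, so the left half of [0, 1] needs more intervals.
print(adaptive_quadrature(math.sqrt, 0.0, 0.5))
print(adaptive_quadrature(math.sqrt, 0.5, 1.0))
```

If the two halves were statically assigned to two processors, the one holding the steep region would do noticeably more work.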
[Figure: curve divided into segments 1-5]
for each time step:
    for each object:
        traverse tree to determine its forces
Problem: traversals have different lengths
[Figure: objects 1-5]