Parallel Patterns - Reduce & Scan

Programming Patterns For Parallelism
• Some patterns repeat in many different contexts
  • e.g. searching for an element in an array
• Identifying such patterns is important
  • Solve a problem once and reuse the solution
  • Split a hard problem into individual problems
  • Helps define interfaces

We Have Already Seen Some Patterns
• Divide and Conquer
  • Split a problem into n subproblems
  • Recursively solve the subproblems
  • Merge the solutions
• Data Parallelism
  • Apply the same function to all elements in a collection or array
  • Parallel.For, Parallel.ForEach
  • Also called “map” in functional programming

Map
• Given a function f: (A) => B
• A collection a: A[]
• Generates a collection b: B[], where b[i] = f(a[i])
• Parallel.For, Parallel.ForEach
  • Where each loop iteration is independent
[Figure: f applied independently to each element of A, producing B]
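
A minimal sketch of Map, written in Python for illustration. The helper name `parallel_map` and the `workers` parameter are assumptions, and a thread pool stands in for Parallel.For; it shows the structure of the pattern rather than any particular runtime's behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(f, a, workers=4):
    """Apply f to every element of a; each call is independent of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so b[i] == f(a[i]).
        return list(pool.map(f, a))


squares = parallel_map(lambda x: x * x, [1, 2, 3, 4])
print(squares)
```

Because no iteration reads another iteration's result, the calls can run in any order without changing b.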

Reduce And Scan
• In practice, parallel loops have to work together to generate an answer
• Reduce and Scan patterns capture common cases of processing the results of Map
• Note: Map and Reduce are similar to, but not the same as, MapReduce
  • MapReduce is a framework for distributed computing

Reduce
• Given a function f: (A, B) => B
• A collection a: A[]
• An initial value b0: B
• Generate a final value b: B
  • Where b = f(a[n-1], … f(a[1], f(a[0], b0)) )
• Only consider the case where A and B are the same type
[Figure: f applied along A as a chain, starting from b0 and ending at b]

Reduce
    B acc = b0;
    for (i = 0; i < n; i++) {
        acc = f(a[i], acc);   // fold a[i] into the running value
    }
    b = acc;

Associativity of the Reduce function
• Reduce is parallelizable if f is associative: f(a, f(b, c)) = f(f(a, b), c)
• E.g. addition: (a + b) + c = a + (b + c)
  • Holds when + is integer addition (with modulo arithmetic)
  • But not when + is floating-point addition
• Max, min, multiply, …
• Set union, intersection
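
A small check of the associativity claim, in Python for illustration. The variable names are assumptions; the values 0.1, 0.2 and 0.3 are a commonly used example in which regrouping changes the rounded floating-point result.

```python
# Integer addition: regrouping does not change the result.
int_left = (1 + 2) + 3
int_right = 1 + (2 + 3)

# Floating-point addition: each + rounds, so regrouping can change the result.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)

print(int_left == int_right)
print(float_left == float_right)
```

A parallel Reduce regroups the additions, so a floating-point sum computed in parallel may differ in its last digits from the sequential sum.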

We can use Divide and Conquer
• Reduce(f, A[1..n], b0) = f( Reduce(f, A[1..n/2], b0), Reduce(f, A[n/2+1..n], I) )
  • where I is the identity element of f
[Figure: A split into two halves; b0 seeds the left half, I seeds the right half, and a final f combines the two results into b]
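
The recursion can be sketched as follows, in Python for illustration. The name `reduce_dc` and its `identity` parameter are assumptions. The sketch keeps the argument order of the sequential loop (newer element first, accumulated value second), so the right half's result is passed as the first argument of the final f; for a commutative f the order makes no difference.

```python
def reduce_dc(f, a, b0, identity):
    """Divide-and-conquer Reduce. Assumes f is associative and that
    `identity` satisfies f(x, identity) == x for every x."""
    if len(a) == 0:
        return b0
    if len(a) == 1:
        return f(a[0], b0)
    mid = len(a) // 2
    # The two recursive calls share no data, so they could run in parallel.
    left = reduce_dc(f, a[:mid], b0, identity)
    right = reduce_dc(f, a[mid:], identity, identity)
    # Later elements go first, matching acc = f(a[i], acc) in the loop.
    return f(right, left)


total = reduce_dc(lambda x, acc: x + acc, [1, 2, 3, 4, 5], 0, 0)
joined = reduce_dc(lambda x, acc: acc + x, ["a", "b", "c", "d"], "", "")
print(total, joined)
```

String concatenation is associative but not commutative, which makes it a convenient check that the halves are combined in the right order.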

Implementation Optimizations
• Switch to sequential Reduce once a subproblem has k or fewer elements
• Do k-way splits instead of two-way splits
• Maintain a thread-local accumulated value
  • A task updates the value of the thread it executes in
  • Requires that the reduce function is also commutative: f(a, b) = f(b, a)
  • Thread-local values are then merged in a separate pass
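
A rough sketch of the first two optimizations, in Python for illustration. The name `chunked_reduce` and its parameters are assumptions. Note the difference from the thread-local scheme above: this sketch keeps one partial result per chunk and merges them in chunk order, so it needs only associativity. Accumulating per thread instead would merge chunks in whatever order the scheduler ran them, which is why that variant also requires commutativity.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_reduce(f, a, b0, identity, k=4):
    """k-way split: reduce each chunk sequentially, then merge the partials.
    Assumes k >= 1, f associative, and `identity` an identity element of f."""
    if not a:
        return b0
    size = (len(a) + k - 1) // k                      # ceil(len(a) / k)
    chunks = [a[i:i + size] for i in range(0, len(a), size)]

    def reduce_chunk(chunk):
        acc = identity                                # chunk-local accumulator
        for x in chunk:
            acc = f(x, acc)
        return acc

    with ThreadPoolExecutor(max_workers=k) as pool:
        partials = list(pool.map(reduce_chunk, chunks))  # kept in chunk order

    acc = b0                                          # separate merge pass
    for p in partials:
        acc = f(p, acc)
    return acc


total = chunked_reduce(lambda x, acc: x + acc, list(range(1, 101)), 0, 0)
print(total)
```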

Scan
• Given a function f: (A, B) => B
• A collection a: A[]
• An initial value b0: B
• Generate a collection b: B[]
  • Where b[i] = f(a[i-1], … f(a[1], f(a[0], b0)) ), so b[0] = b0
[Figure: the same chain of f as in Reduce, with every intermediate value kept as an element of b]

Scan
    B acc = b0;
    for (i = 0; i < n; i++) {
        b[i] = acc;           // b[i] is the result over a[0..i-1]
        acc = f(a[i], acc);
    }

Scan is Efficiently Parallelizable
• When f is associative
• Scan(f, A[1..n], b0) = Scan(f, A[1..n/2], b0), Scan(f, A[n/2+1..n], Reduce(f, A[1..n/2], b0))
  • The second half starts from the Reduce of the first half
[Figure: the array split into two halves; the starting value of the second half is marked “?”]
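
The split can be sketched as follows, in Python for illustration. The names `scan_seq`, `reduce_seq` and `scan_split` are assumptions, and only one level of splitting is shown; a full implementation would recurse and compute the Reduce in parallel as well.

```python
def scan_seq(f, a, b0):
    """Sequential scan: out[i] combines a[0..i-1] with b0, so out[0] == b0."""
    out, acc = [], b0
    for x in a:
        out.append(acc)
        acc = f(x, acc)
    return out


def reduce_seq(f, a, b0):
    acc = b0
    for x in a:
        acc = f(x, acc)
    return acc


def scan_split(f, a, b0):
    """One level of the split: the right half is seeded with Reduce(left)."""
    mid = len(a) // 2
    left, right = a[:mid], a[mid:]
    seed = reduce_seq(f, left, b0)
    # Once seed is known, the two scans below are independent of each other.
    return scan_seq(f, left, b0) + scan_seq(f, right, seed)


def add(x, acc):
    return x + acc


prefix = scan_split(add, [1, 2, 3, 4, 5], 0)
print(prefix)
```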

Scan is useful in many places
• Radix Sort
• Ray Tracing
• …

Computing Line of Sight
• Given x0, … xn with altitudes alt[0], … alt[n]
• Which of the points are visible from x0?
• angle[i] = arctan( (alt[i] - alt[0]) / i )
• xi is visible from x0 if all points between them have a smaller angle than angle[i]

Solution
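
One way to apply Scan to the line-of-sight problem, offered as a sketch in Python rather than as the slide's own solution. The names `scan_seq` and `visible_points` are assumptions. The idea is a Map to compute the angles, then a Scan with max to get, for each point, the largest angle among the points before it.

```python
import math


def scan_seq(f, a, b0):
    """Sequential scan: out[i] combines a[0..i-1] with b0."""
    out, acc = [], b0
    for x in a:
        out.append(acc)
        acc = f(x, acc)
    return out


def visible_points(alt):
    """Return the indices i >= 1 of the points visible from point 0."""
    n = len(alt) - 1
    # Map: angle of each point as seen from x0 (list index j is point j + 1).
    angle = [math.atan((alt[i] - alt[0]) / i) for i in range(1, n + 1)]
    # Scan with max: largest angle among the points strictly before each one.
    max_before = scan_seq(lambda x, acc: max(x, acc), angle, -math.inf)
    # Map: a point is visible if its angle exceeds everything before it.
    return [i for i, (ang, mb) in enumerate(zip(angle, max_before), start=1)
            if ang > mb]


seen = visible_points([0, 1, 1, 5, 2])
print(seen)
```

In this example the point at index 2 is hidden behind index 1, and index 4 is hidden behind the peak at index 3.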

Radix Sort

Basic Primitive: Pack
• Given an array A and an array F of flags
  • A = [5 7 2 4 5 3 1]
  • F = [1 1 0 0 1 1 1]
• Pack all elements with flag = 0 before elements with flag = 1
  • A’ = [2 4 5 7 5 3 1]

Solution
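
A possible scan-based Pack, sketched in Python and not necessarily the slide's own solution. The names `exclusive_sum_scan`, `pack` and `radix_sort` are assumptions. Two sum scans give every element its destination index, after which all writes are independent. Since Pack keeps the relative order within each group, applying it once per bit, lowest bit first, gives a radix sort for non-negative integers that fit in `bits` bits.

```python
def exclusive_sum_scan(xs):
    """out[i] = xs[0] + ... + xs[i-1]; written sequentially for clarity."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out


def pack(a, flags):
    """Move flag-0 elements before flag-1 elements, keeping relative order."""
    zeros = [1 - f for f in flags]            # Map: 1 where the flag is 0
    zero_pos = exclusive_sum_scan(zeros)      # Scan: rank among flag-0 elements
    one_pos = exclusive_sum_scan(flags)       # Scan: rank among flag-1 elements
    n_zeros = sum(zeros)                      # Reduce: size of the flag-0 group
    out = [None] * len(a)
    for i, x in enumerate(a):                 # every write goes to a distinct slot
        dest = zero_pos[i] if flags[i] == 0 else n_zeros + one_pos[i]
        out[dest] = x
    return out


def radix_sort(a, bits=8):
    """Radix sort built from pack, one bit per pass, lowest bit first."""
    for bit in range(bits):
        a = pack(a, [(x >> bit) & 1 for x in a])
    return a


packed = pack([5, 7, 2, 4, 5, 3, 1], [1, 1, 0, 0, 1, 1, 1])
print(packed)
print(radix_sort([5, 7, 2, 4, 5, 3, 1], bits=3))
```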

Other Applications of Scan
• Radix Sort
• Computing Line of Sight
• Adding multi-precision numbers
• Quick Sort
• Searching for regular expressions
  • Parallel grep
• …

High Level Points
• Minimize dependence between parallel loops
• Unintended dependences = data races
  • Next lecture
• Carefully analyze remaining dependences
• Use Reduce and Scan patterns where applicable