Learn about Parallel Prefix computation, its applications in EDA, and how to optimize designs using associative operators in O(log(N)) time. Explore techniques such as Carry-Lookahead Adder and Prefix Sum. Discover how to perform seemingly sequential operations in parallel and find the first one in Arbitration. Dive into Prefix-OR, Prefix-AND, Prefix-MAX, and Prefix-MIN operations for enhanced efficiency.
CS137: Electronic Design Automation • Day 6: April 17, 2002 • Parallel Prefix
Today • Parallel Prefix • Sample Applications
Key Result • Can compute cascaded result sequence on any associative operator • In O(log(N)) time • With O(N) hardware
Familiar Instance • Carry-Lookahead Adder is a special case of this general result
CLA • Observation: • Each bit of adder will do one of three things: • S - Squash the carry: 0,0 • G - Generate a carry: 1,1 • P - Propagate a carry: 0,1 or 1,0
Further • Each contiguous sequence of bits will do one of these same three things: • Squash • Generate • Propagate
Combining • The behavior of a sequence can be computed from the base elements • (low-order part, high-order part → result; ? = don't care) • ? S → S • ? G → G • S P → S • G P → G • P P → P
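The combining table can be sketched as a two-input operator. This is a minimal illustration, not code from the lecture: the names `bit_behaviour` and `combine` and the string encoding of S/G/P are assumptions made for the example.

```python
# Carry behaviours of one bit position or of a contiguous run of positions.
# The string encoding is an assumption made for this sketch.
S, G, P = "S", "G", "P"  # squash, generate, propagate


def bit_behaviour(a: int, b: int) -> str:
    """Classify one adder bit position from its two input bits."""
    if a == 0 and b == 0:
        return S
    if a == 1 and b == 1:
        return G
    return P  # inputs are 0,1 or 1,0


def combine(low: str, high: str) -> str:
    """Behaviour of the run `low` followed by the more significant run `high`.

    If the high run squashes or generates, the low run does not matter.
    If the high run propagates, the low run's behaviour passes through.
    """
    return low if high == P else high
```

Because `combine` is associative, runs can be merged in any grouping, which is what lets the tree evaluate them in parallel.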
Apply Recursively • PG(i,i) = f(A,B) • PG(i,j) = PG(i,k) PG(k+1,j) • PG(0,1) = PG(0) PG(1) • PG(0,3) = PG(0,1) PG(2,3) • PG(0,7) = PG(0,3) PG(4,7) • … • PG(0,N-1) = PG(0,N/2-1) PG(N/2,N-1) • Cout(N) = Cin(0) PG(0,N-1)
All Carries • Further, once we have the full tree, can compute all prefixes in another log(N) steps • E.g. • PG(0,13) = PG(0,7) PG(8,11) PG(12,13)
Complete Sum • After 2log(N) time: • Up tree to compute PG’s • Down tree to compute PG(0,m)’s • Compute results in O(1) time • C(m) = Cin PG(0,m) • S(m)=F(A,B,C(m-1))
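The up-tree and down-tree passes can be sketched in software as a recursive scan, then used to build a complete adder. This is an illustrative model under stated assumptions, not the lecture's hardware: `scan`, `combine`, and `add` are hypothetical names, and the recursion only mimics the order of combination that a tree would use.

```python
def combine(low, high):
    """S/G/P combining rule: a propagating high run passes the low run through."""
    return low if high == "P" else high


def scan(xs, op):
    """Inclusive prefix of `xs` under an associative `op`, in tree order.

    Neighbours are paired (up the tree), the half-length list is scanned
    recursively, and the remaining positions are filled in (down the tree).
    """
    if len(xs) <= 1:
        return list(xs)
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
    sub = scan(pairs, op)  # sub[k] covers xs[0 .. 2k+1]
    out = []
    for i, x in enumerate(xs):
        if i == 0:
            out.append(x)
        elif i % 2 == 1:
            out.append(sub[i // 2])
        else:
            out.append(op(sub[i // 2 - 1], x))
    return out


def add(a, b, width, cin=0):
    """Add two `width`-bit unsigned integers; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    pg = ["S" if x + y == 0 else "G" if x + y == 2 else "P"
          for x, y in zip(a_bits, b_bits)]
    prefix = scan(pg, combine)  # prefix[m] plays the role of PG(0,m)
    # C(m) = Cin applied to PG(0,m): propagate passes Cin, generate gives 1.
    carry_out = [cin if p == "P" else int(p == "G") for p in prefix]
    carry_in = [cin] + carry_out[:-1]
    # S(m) = A[m] xor B[m] xor C(m-1)
    s_bits = [x ^ y ^ c for x, y, c in zip(a_bits, b_bits, carry_in)]
    total = sum(bit << i for i, bit in enumerate(s_bits))
    return total, carry_out[-1]
```

For example, `add(11, 6, 4)` yields `(1, 1)`: the 4-bit sum is 1 with a carry out, i.e. 17.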
Associative • Works because associative • Can go ahead and compute PG(N/2,N-1) • Before know PG(0,N/2-1) • Then combine in unit time.
Consequence • Allows us to perform many seemingly sequential operations in parallel
Prefix Sum • Common Operation: • Want B[x] such that B[x]=A[0]+A[1]+…+A[x] • Sequentially: B[0]=A[0] • For i=1 to x • B[i]=B[i-1]+A[i]
Prefix Sum • Compute in tree fashion • A[I]+A[I+1] • A[I]+A[I+1]+A[I+2]+A[I+3] • … • Combine partial sums back down tree • S(0:7)+S(8:9)+S(10)=S(0:10)
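The tree-fashion computation above might look like the following sketch, with an explicit up sweep that builds partial sums and a down sweep that hands each node the total of everything to its left. The function name `tree_prefix` and the level-list representation are assumptions for illustration; a hardware tree would evaluate each level in parallel, while this code simply loops over it.

```python
import operator


def tree_prefix(values, op=operator.add):
    """Inclusive prefix of `values` under an associative `op`.

    Up sweep: levels[k][j] combines the (up to) 2**k inputs starting at j * 2**k.
    Down sweep: each node receives the combination of all inputs to its left.
    Each sweep has about log2(N) levels.
    """
    if not values:
        return []
    # Up sweep: pair neighbours until a single root remains.
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            op(prev[i], prev[i + 1]) if i + 1 < len(prev) else prev[i]
            for i in range(0, len(prev), 2)
        ])
    # Down sweep: left[j] is "everything left of node j", or None if nothing is.
    left = [None]
    for level in reversed(levels[:-1]):
        new_left = []
        for j in range(len(level)):
            parent = left[j // 2]
            if j % 2 == 0:
                new_left.append(parent)  # left child inherits the parent's value
            else:
                sibling = level[j - 1]   # right child also adds its left sibling
                new_left.append(sibling if parent is None else op(parent, sibling))
        left = new_left
    return [v if l is None else op(l, v) for l, v in zip(left, values)]
```

The operator is always applied with the left operand first, so the sketch also works for associative operators that are not commutative, such as string concatenation.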
Other simple operators • Prefix-OR • Prefix-AND • Prefix-MAX • Prefix-MIN
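As a quick illustration of what these operators produce, the sketch below uses Python's `itertools.accumulate` as a sequential stand-in for the parallel tree. The results are the same because each operator is associative; the sample inputs are made up for the example.

```python
from itertools import accumulate
import operator

bits = [0, 0, 1, 0, 1, 0]  # made-up sample data
nums = [3, 1, 4, 1, 5, 2]

prefix_or = list(accumulate(bits, operator.or_))             # [0, 0, 1, 1, 1, 1]
prefix_and = list(accumulate([1, 1, 0, 1], operator.and_))   # [1, 1, 0, 0]
prefix_max = list(accumulate(nums, max))                     # [3, 3, 4, 4, 5, 5]
prefix_min = list(accumulate(nums, min))                     # [3, 1, 1, 1, 1, 1]
```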
Find-First One • Useful for arbitration • Finds first (highest-priority) requester • Also magnitude finding in numbers • How: • Prefix-OR • Locally compute X[i-1]^X[i] on the prefix-OR output X • Flags the first one
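A minimal sketch of the two steps, assuming index 0 is the highest priority and using `itertools.accumulate` as a sequential stand-in for the parallel prefix-OR. The function name `find_first_one` is hypothetical.

```python
from itertools import accumulate
import operator


def find_first_one(x):
    """One-hot list marking the first 1 in `x` (all zeros if there is none)."""
    y = list(accumulate(x, operator.or_))  # prefix-OR: 0...0 followed by 1...1
    shifted = [0] + y[:-1]                 # y[i-1], taking y[-1] as 0
    # The XOR is 1 only where the prefix-OR steps from 0 to 1.
    return [prev ^ cur for prev, cur in zip(shifted, y)]
```

For instance, `find_first_one([0, 0, 1, 0, 1])` gives `[0, 0, 1, 0, 0]`.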
Arbitration • Often want to find first M requesters • E.g. Assign unique memory ports to first M processors requesting • Prefix-sum across all potential requesters • Counts requesters, giving unique number to each • Each requester knows whether it is one of the first M • Perhaps also which resource it is assigned
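The prefix-sum arbitration scheme can be sketched as follows. The function name `grant_first_m` and the choice to number ports from 0 are assumptions for the example, and `itertools.accumulate` again stands in for the parallel prefix sum.

```python
from itertools import accumulate


def grant_first_m(requests, m):
    """Per-requester port number (0..m-1), or None if no port is granted.

    `requests` is a list of 0/1 request bits, index 0 having highest priority.
    """
    counts = list(accumulate(requests))  # inclusive prefix sum of request bits
    # A requester whose running count is at most m is among the first m.
    return [c - 1 if r and c <= m else None
            for r, c in zip(requests, counts)]
```

With `requests = [0, 1, 1, 0, 1, 1]` and `m = 2`, the result is `[None, 0, 1, None, None, None]`: the two earliest requesters receive ports 0 and 1.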
Others • Parsing • FSM-state-trace • Recurrence relationships • Rank finding (sorting) • Partitioning • Sorting • Sequential Instruction evaluation (Ultrascalar) • Saturating Accumulation (kp)
FSM-trace • Build composite FSM • I.e. view FSM as F(state) → state • Compute new transition functions • FSM[i,j](state) • Given input state at step i, compute output state of FSM after step j • FSMs accept regular languages, so this works for regular expression parsers
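One way to picture the composite FSM is to represent each input symbol as a table mapping state to next state, and to use function composition as the associative operator. The parity machine `DELTA` and the names `compose` and `state_trace` below are hypothetical, chosen only to keep the sketch small.

```python
from itertools import accumulate

# Hypothetical 2-state FSM tracking the parity of 1s seen so far.
# DELTA[symbol][state] is the next state.
DELTA = {"0": (0, 1), "1": (1, 0)}


def compose(f, g):
    """Transition table for 'run f, then g'. Composition is associative."""
    return tuple(g[s] for s in f)


def state_trace(inputs, start=0):
    """State after each input symbol, obtained from prefix-composed tables."""
    per_symbol = [DELTA[ch] for ch in inputs]
    # accumulate is a sequential stand-in for the parallel prefix here.
    prefixes = accumulate(per_symbol, compose)
    return [f[start] for f in prefixes]
```

For example, `state_trace("1101")` returns `[1, 0, 0, 1]`. Each composed table has one entry per state, so the cost of `compose` grows with the number of states.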
Rank Finding • Looking for I’th ordered element • Do a prefix-sum on high-bit only • Know m=number of things > 01111111… • High-low search on result • I.e. if number > I, recurse on half with leading zero • If number < I, search for (I-m)’th element in half with high-bit true • Find median in log²(N) time
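The bit-by-bit search might be sketched as below. Several conventions are assumptions made for this example rather than taken from the slide: ranks are 1-based and counted from the smallest element, values are unsigned `width`-bit integers, and `select` is a hypothetical name. The per-bit count comes from a prefix sum, computed here sequentially with `itertools.accumulate`.

```python
from itertools import accumulate


def select(values, k, width):
    """k-th smallest (1-based) of a non-empty list of unsigned `width`-bit ints."""
    candidates = list(values)
    for bit in reversed(range(width)):       # examine the high bit first
        flags = [(v >> bit) & 1 for v in candidates]
        ones = list(accumulate(flags))       # prefix sum on the current bit only
        m = ones[-1]                         # candidates with this bit set
        zeros = len(candidates) - m          # candidates with this bit clear
        if k <= zeros:
            # The answer lies in the half whose current bit is 0.
            candidates = [v for v, f in zip(candidates, flags) if not f]
        else:
            # Skip the smaller half and adjust the rank accordingly.
            k -= zeros
            candidates = [v for v, f in zip(candidates, flags) if f]
    return candidates[0]
```

Each round costs one prefix sum, so with one round per bit the total is the bit width times the prefix-sum depth.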
Partitioning • Use something to order • (Like we’re thinking about) • Parallel prefix on area of units • If not all same area • Know where the midpoint is
Channel Width • Prefix sum on delta wires at each node • To compute net channel widths at all points along wire
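As a small sketch, assume each node along the channel reports a delta: the number of wires that start there minus the number that end there. A prefix sum over those deltas then gives the width at every point. The name `channel_widths` and the sample deltas are made up for the example.

```python
from itertools import accumulate


def channel_widths(delta_wires):
    """Running wire count at each node from per-node deltas (starts minus ends)."""
    # accumulate is a sequential stand-in for the parallel prefix sum.
    return list(accumulate(delta_wires))


widths = channel_widths([2, 1, -1, 0, -2])  # [2, 3, 2, 2, 0]
required_width = max(widths)                # 3
```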
Variations • Segmented • Cyclic Segmented
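A segmented prefix can be expressed with the same machinery by scanning (flag, value) pairs. The sketch below assumes the usual meaning of "segmented" (the running result restarts wherever a segment-start flag is set), which the slide does not spell out; `seg_op` and `segmented_sum` are hypothetical names. The cyclic variant is not covered here.

```python
from itertools import accumulate


def seg_op(left, right):
    """Associative operator on (start_flag, value) pairs for a segmented sum."""
    lf, lv = left
    rf, rv = right
    # If the right element starts a segment, it discards everything to its left.
    return (lf or rf, rv if rf else lv + rv)


def segmented_sum(values, starts):
    """Prefix sums that restart at every position where `starts` is 1."""
    pairs = list(zip(starts, values))
    # accumulate is a sequential stand-in; seg_op is associative, so a
    # parallel prefix tree could apply it in any grouping.
    return [v for _, v in accumulate(pairs, seg_op)]
```

For example, `segmented_sum([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1])` gives `[1, 3, 6, 4, 9, 6]`.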
Big Ideas • Any associative operation can be made parallel