This research paper by Mary Sheeran at Chalmers University of Technology shows how to generate fast multipliers using clever circuits. It uses a functional language for describing hardware circuits that emphasises connection patterns and lets users write circuit generators. Clever circuits control the presence or absence of components and the shape of the circuit wiring. The work examines the structure and layout of the multiplier circuit, in particular the reduction tree, and uses clever circuits to adapt to delays and wiring constraints. It closes with future work and possible applications.
Generating Fast Multipliers using Clever Circuits. Mary Sheeran, Chalmers University of Technology. Research funded by SRC in an Intel-custom project, and by Vetenskapsrådet.
Using a functional language to describe hardware: gives a style of circuit description and analysis, emphasises connection patterns, and lets the user write circuit generators.
Interleave (figure: two copies of f):

ilv f = unriffle ->- two f ->- riffle
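A minimal sketch of interleave over plain Haskell lists, assuming a circuit can be modelled as a function from a list of wire values to a list of wire values. The definitions of `->-`, `two`, `riffle` and `unriffle` below are illustrative list-level stand-ins, not Lava's own definitions; `halve` is a hypothetical helper.

```haskell
-- Serial composition, read left to right as in the slides: f ->- g feeds f's outputs to g
infixr 5 ->-
(->-) :: (a -> b) -> (b -> c) -> a -> c
f ->- g = g . f

-- Split a bus into its top and bottom halves (hypothetical helper)
halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

-- two f: one copy of f on each half of the bus
two :: ([a] -> [b]) -> [a] -> [b]
two f xs = f top ++ f bot
  where (top, bot) = halve xs

-- riffle: perfect shuffle, interleaving the two halves
riffle :: [a] -> [a]
riffle xs = concat [ [t, b] | (t, b) <- zip top bot ]
  where (top, bot) = halve xs

-- unriffle: inverse of riffle, even-indexed wires first, then odd-indexed ones
unriffle :: [a] -> [a]
unriffle xs = [ x | (i, x) <- ixs, even i ] ++ [ x | (i, x) <- ixs, odd i ]
  where ixs = zip [0 :: Int ..] xs

-- ilv f: one copy of f on the even-indexed wires, one on the odd-indexed wires
ilv :: ([a] -> [a]) -> [a] -> [a]
ilv f = unriffle ->- two f ->- riffle
```

In GHCi, `ilv reverse [0..7]` evaluates to `[6,7,4,5,2,3,0,1]`: the even-indexed wires are reversed among themselves, and so are the odd-indexed ones.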
Butterfly (figure: two copies of bfly (n-1) circ).
Defining butterfly:

bfly 0 circ = id
bfly n circ = ilvN (n-1) circ ->- two (bfly (n-1) circ)

The second stage is two copies of the smaller butterfly circuit.
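The butterfly definition can be run in the same kind of list-based model. Here `ilvN` is assumed to mean "interleave k times", so that on 2^(k+1) wires `ilvN k circ` applies a two-input cell to wires i and i + 2^k; the comparator `cmp` is a hypothetical example cell, and the helpers are list-level stand-ins rather than Lava's definitions.

```haskell
-- Serial composition, read left to right
infixr 5 ->-
(->-) :: (a -> b) -> (b -> c) -> a -> c
f ->- g = g . f

-- One copy of f on each half of the bus
two :: ([a] -> [b]) -> [a] -> [b]
two f xs = f top ++ f bot
  where (top, bot) = splitAt (length xs `div` 2) xs

-- Perfect shuffle and its inverse
riffle, unriffle :: [a] -> [a]
riffle xs = concat [ [t, b] | (t, b) <- zip top bot ]
  where (top, bot) = splitAt (length xs `div` 2) xs
unriffle xs = [ x | (i, x) <- ixs, even i ] ++ [ x | (i, x) <- ixs, odd i ]
  where ixs = zip [0 :: Int ..] xs

ilv :: ([a] -> [a]) -> [a] -> [a]
ilv f = unriffle ->- two f ->- riffle

-- Assumed meaning of ilvN: interleave k times, so a two-input cell
-- ends up on wires i and i + 2^k of a 2^(k+1)-wire bus
ilvN :: Int -> ([a] -> [a]) -> [a] -> [a]
ilvN 0 circ = circ
ilvN k circ = ilv (ilvN (k - 1) circ)

-- The butterfly, as on the slide
bfly :: Int -> ([a] -> [a]) -> [a] -> [a]
bfly 0 _    = id
bfly n circ = ilvN (n - 1) circ ->- two (bfly (n - 1) circ)

-- A two-input comparator, used here only as an example cell
cmp :: Ord a => [a] -> [a]
cmp [a, b] = [min a b, max a b]
cmp xs     = xs
```

With comparator cells this butterfly behaves as Batcher's bitonic merger, so `bfly 3 cmp [1,3,5,7,8,6,4,2]` gives `[1,2,3,4,5,6,7,8]`.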
BUT high-performance data paths are in reality NOT regular! They start out regular and become less so as the design proceeds, ending with analogue design of each instance of each cell! “It’s all in the wires.”
Shadow values (figure: gen. bfly; 98 comparators). Information about what is bigger or smaller is carried in shadow values and updated by the components (dynamically), so only the necessary sub-sorters are included.
Clever circuits decide what component to be based on the shadow values produced when a particular component is used: try it and see, during generation.
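The idea of a clever circuit can be sketched at the level of a single two-input sorting cell. In this hypothetical version the shadow value on each wire is an interval of values the wire may carry; at generation time the cell inspects the shadows and becomes plain wires, a crossing, or a real comparator. `Interval`, `Choice`, `choose` and `shadowOut` are illustrative names, not taken from the paper.

```haskell
-- Hypothetical shadow value: the range of values a wire may carry
type Interval = (Int, Int)

-- The components a two-input sorting cell can turn into
data Choice = Wires | Cross | Comparator
  deriving (Eq, Show)

-- Decide at generation time, from the shadows alone, which component is needed
choose :: (Interval, Interval) -> Choice
choose ((loA, hiA), (loB, hiB))
  | hiA <= loB = Wires       -- inputs already in order: no hardware needed
  | loA >= hiB = Cross       -- inputs always out of order: just cross the wires
  | otherwise  = Comparator  -- outcome unknown: a real comparator is needed

-- Shadow outputs of the cell (smaller value first), whichever component was chosen
shadowOut :: (Interval, Interval) -> (Interval, Interval)
shadowOut ((loA, hiA), (loB, hiB)) =
  ((min loA loB, min hiA hiB), (max loA loB, max hiA hiB))
```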
Clever circuits give control over: the presence or absence of components (Charme03); the shape of circuit wiring (this paper); circuit topology (next paper).
Multiplication:

     11010
   × 01001
     11010
    00000
   00000
  11010
 00000
0011101010
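The worked example is 26 × 9 = 234. A short sketch of its shift-and-add reading, with bits written most significant first as on the slide; `toBits`, `fromBits`, `partialProducts` and `multiply` are hypothetical helpers.

```haskell
-- Bits of an n-bit number, most significant first
toBits :: Int -> Int -> [Int]
toBits n x = [ (x `div` 2 ^ i) `mod` 2 | i <- [n - 1, n - 2 .. 0] ]

-- Value of a most-significant-first bit list
fromBits :: [Int] -> Int
fromBits = foldl (\acc b -> 2 * acc + b) 0

-- One row per multiplier bit, starting from its least significant bit
partialProducts :: [Int] -> [Int] -> [[Int]]
partialProducts xs ys = [ map (* y) xs | y <- reverse ys ]

-- Shift row i left by i places (multiply by 2^i) and add the rows up
multiply :: [Int] -> [Int] -> Int
multiply xs ys =
  sum [ fromBits row * 2 ^ i | (i, row) <- zip [0 :: Int ..] (partialProducts xs ys) ]
```

`toBits 10 (multiply [1,1,0,1,0] [0,1,0,0,1])` reproduces the slide's result row, `[0,0,1,1,1,0,1,0,1,0]`.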
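Multiplication, msb first (figure): the partial-product rows 11010, 00000, 00000, 11010, 00000, with bits written most significant first.
Multiplication, lsb first (figure): the same rows with bits written least significant first: 01011, 00000, 00000, 01011, 00000.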
multBin comps (as,bs) = p1:ss
  where
    ([p1]:[p2,p3]:ps) = prods_by_weight (as,bs)
    is = redArray comps ps
    ss = binaryAdder ([p2,p3]:is)

redArray comps ps = is
  where (is,[]) = row (compress comps) ([],ps)
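`prods_by_weight` groups the partial-product bits into columns by weight. A possible stand-in, assuming bits are lists of Booleans written least significant first (which matches `p1` being the lowest product bit): column w holds every a_i AND b_j with i + j = w. `prodsByWeight` and `columnValue` are hypothetical names.

```haskell
-- Column w collects every partial-product bit a_i AND b_j with i + j == w.
-- Inputs are least significant bit first.
prodsByWeight :: ([Bool], [Bool]) -> [[Bool]]
prodsByWeight (xs, ys) =
  [ [ x && y | (i, x) <- ixs, (j, y) <- iys, i + j == w ] | w <- [0 .. n + m - 2] ]
  where
    ixs = zip [0 :: Int ..] xs
    iys = zip [0 :: Int ..] ys
    n   = length xs
    m   = length ys

-- Sanity check: adding up the columns with their weights gives the product
columnValue :: [[Bool]] -> Int
columnValue cols =
  sum [ length (filter id col) * 2 ^ w | (w, col) <- zip [0 :: Int ..] cols ]
```

For two 5-bit inputs the column heights are 1, 2, 3, 4, 5, 4, 3, 2, 1, which is why the pattern `([p1]:[p2,p3]:ps)` can peel off the first two columns before the reduction array.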
Reduction tree for multiplier (figure: columns of partial products, with carries passed between columns, reduced to 2 bits per column and fed to a fast adder).
We concentrate on the reduction tree (a row of compress cells), assuming the partial products have already been generated (e.g. using AND gates). Generation may also include recoding to reduce the size of the tree (cf. Booth).
Compress with diff = 2: a column of f-cells (figure).
Compress with diff > 2 (using hcell) and with diff < 2 (using wcell) (figure).
compress comps (as,bs)
  | diff > 2  = (compress comps |- hcell comps) (as,bs)
  | diff == 2 = column (fcell comps) (as,bs)
  | diff < 2  = (compress comps -| wcell comps) (as,bs)
  where diff = length bs - length as
A possible fcell (figure: fullAdd with carry output c and sum output s); halfAdd cells are similar. This gives the standard array multiplier. Not great!
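A Boolean sketch of the two adder cells, assuming each returns a (sum, carry) pair; the argument shapes are assumptions rather than the paper's. The check `isCounter` confirms that a full adder is a 3:2 counter: it replaces three bits of one weight by a sum bit of the same weight and a carry bit of double weight.

```haskell
-- Half adder: (sum, carry) of two bits
halfAdd :: (Bool, Bool) -> (Bool, Bool)
halfAdd (a, b) = (a /= b, a && b)

-- Full adder: (sum, carry) of three bits; the carry is the majority function
fullAdd :: (Bool, Bool, Bool) -> (Bool, Bool)
fullAdd (a, b, c) = ((a /= b) /= c, (a && b) || (c && (a /= b)))

-- Value of a bit as an Int
val :: Bool -> Int
val = fromEnum

-- For every input combination: sum + 2 * carry equals the number of 1 inputs
isCounter :: Bool
isCounter =
  and [ val s + 2 * val cy == val a + val b + val ci
      | a <- bits, b <- bits, ci <- bits, let (s, cy) = fullAdd (a, b, ci) ]
  where bits = [False, True]
```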
(Figure: fullAdd with outputs c and s.) We only need to vary the wiring! Make it explicit (figure: fullAdd with wiring cells labelled iC, s3, cc and iS).
Dadda-like (figure: fullAdd with outputs c and s), with wiring given by

toEnd (a,as) = as++[a]

An excellent log-depth reduction tree, but known for its irregularity and difficult layout.
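Why `toEnd` gives logarithmic depth can be seen in a simplified, hypothetical model of a single column: each bit is represented only by its logic depth, a full adder consumes the first three bits, its carry leaves for the next column, and `place` decides where its sum goes. `toFront`, which reuses the sum immediately, roughly mimics array-style wiring. The model ignores carries arriving from the previous column and stops once fewer than three bits remain.

```haskell
-- In this simplified model a bit is represented only by its logic depth
type Depth = Int

-- toEnd, as on the slide: the new sum joins the back of the column
toEnd :: (a, [a]) -> [a]
toEnd (a, xs) = xs ++ [a]

-- Hypothetical contrast: the new sum is consumed next, as in an array multiplier
toFront :: (a, [a]) -> [a]
toFront (a, xs) = a : xs

-- Reduce one column with full adders until fewer than three bits remain.
-- Returns (remaining bits, carries sent to the next column), all as depths.
reduceColumn :: ((Depth, [Depth]) -> [Depth]) -> [Depth] -> ([Depth], [Depth])
reduceColumn place (a : b : c : rest) = (sums, d : carries)
  where
    d               = 1 + maximum [a, b, c]  -- sum and carry both leave at depth d
    (sums, carries) = reduceColumn place (place (d, rest))
reduceColumn _ bits = (bits, [])
```

For a column of nine bits, `reduceColumn toFront (replicate 9 0)` gives `([4],[1,2,3,4])`, a chain of depth 4, while `reduceColumn toEnd (replicate 9 0)` gives `([2],[1,1,1,2])`: the same four full adders, arranged with depth 2.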
Delay model for half adder:

halfAddI (as, bs, ac, bc) [a,b] = [s,cout]
  where
    s    = max (as+a) (bs+b)
    cout = max (ac+a) (bc+b)

Here as is the delay between the a input and the sum output, and so on.

hI as = halfAddI (10,10,5,5) as
fI as = fullAddI (20,20,10,10,10,10) as
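The half-adder delay model runs essentially as written. The sketch below renames the tuple fields `as`, `bs`, `ac`, `bc` to `dAS`, `dBS`, `dAC`, `dBC`. `fullAddI` is not defined on the slides; its parameter order here (three input-to-sum delays, then three input-to-carry delays) is an assumption.

```haskell
-- A delay is a time in arbitrary units (the slides use 5, 10 and 20)
type Delay = Int

-- halfAddI as on the slide: input arrival times [a, b] -> output times [s, cout]
halfAddI :: (Delay, Delay, Delay, Delay) -> [Delay] -> [Delay]
halfAddI (dAS, dBS, dAC, dBC) [a, b] = [s, cout]
  where
    s    = max (dAS + a) (dBS + b)
    cout = max (dAC + a) (dBC + b)
halfAddI _ ins = error ("halfAddI: expected 2 inputs, got " ++ show (length ins))

-- Assumed full-adder model: (a, b, c to sum delays, then a, b, c to carry delays)
fullAddI :: (Delay, Delay, Delay, Delay, Delay, Delay) -> [Delay] -> [Delay]
fullAddI (dAS, dBS, dCS, dAC, dBC, dCC) [a, b, c] = [s, cout]
  where
    s    = maximum [dAS + a, dBS + b, dCS + c]
    cout = maximum [dAC + a, dBC + b, dCC + c]
fullAddI _ ins = error ("fullAddI: expected 3 inputs, got " ++ show (length ins))

-- The instantiated models from the slide
hI, fI :: [Delay] -> [Delay]
hI = halfAddI (10, 10, 5, 5)
fI = fullAddI (20, 20, 10, 10, 10, 10)
```

`hI [0,10]` gives `[20,15]`: the sum is ready at max(0+10, 10+10) = 20 and the carry at max(0+5, 10+5) = 15.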
Checking gate delay (comps is a tuple of building blocks):

dDadG n = simulate (redArray (hI,fI,toEnd,toEnd,id,sep2,sep3)) (ppzs n)

hI and fI are the gate delay models; the remaining components are wiring cells (allowing inclusion of wiring delay).

Main> dDadG 16
[[0,10],[5,20],[20,30],[30,40],[40,50],[50,50],[50,60],[60,70],[70,70],
[70,70],[70,80],[70,80],[80,90],[90,90],[90,90],[90,90],[90,90],[90,90],
[80,90],[80,80],[70,80],[70,80],[70,70],[60,70],[60,60],[50,60],[50,50],
[40,20],[0,20]]
Promising, but we can do better! Choose which wiring cells to use dynamically, during circuit generation, rather than in advance. Base the choice on the delay behaviour of both wires and components.
cleverInsert (figure: fullAdd with outputs c and s, wiring cells s3 and cc, and cleverInsert cells). Idea: harden the wiring during circuit generation using clever circuits. Shadow values estimate the delay through wires and cells.
cswap ((a,x),(b,y)) = if x > y then ((b,y),(a,x)) else ((a,x),(b,y))
cleverInsert = row cswap ->- apr

cleverInsert forms the necessary wiring based on context (the delays on the shadow wires).
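`cleverInsert` can be sketched on pairs of a signal and its shadow delay. The `row` below is an assumption about the combinator's shape (a cell of type `(c, a) -> (b, c)` threaded left to right), and `apr` is assumed to append the final carry to the output list. Passing a new signal through a row of `cswap` cells inserts it before the first signal whose estimated delay is at least as large, so a list already sorted by delay stays sorted.

```haskell
-- Serial composition, read left to right
infixr 5 ->-
(->-) :: (a -> b) -> (b -> c) -> a -> c
f ->- g = g . f

-- Compare-and-swap on (signal, estimated delay): the earlier-arriving pair goes first
cswap :: ((a, Int), (a, Int)) -> ((a, Int), (a, Int))
cswap ((a, x), (b, y)) = if x > y then ((b, y), (a, x)) else ((a, x), (b, y))

-- Assumed shape of row: thread a carry through a list of cells from left to right
row :: ((c, a) -> (b, c)) -> (c, [a]) -> ([b], c)
row _ (c, [])     = ([], c)
row f (c, x : xs) = (y : ys, cOut)
  where
    (y, c')    = f (c, x)
    (ys, cOut) = row f (c', xs)

-- Append the final carry to the end of the output list
apr :: ([a], a) -> [a]
apr (ys, y) = ys ++ [y]

-- Insert a new shadowed signal into a list of signals sorted by delay
cleverInsert :: ((a, Int), [(a, Int)]) -> [(a, Int)]
cleverInsert = row cswap ->- apr
```

For example, `cleverInsert (("s",25), [("p",10),("q",20),("r",30)])` returns `[("p",10),("q",20),("s",25),("r",30)]`.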
The structure of the circuit generator remains unchanged:

adapt (hAdd, fAdd, cc) (d,pds) =
    mmark pds ->-
    redArray (hAdd // hIB, fAdd // fIB, cInsert, cInsert, cc // cross d, sep2, sep3) ->-
    unmark

(Figure labels: Haskell level, circuit level.)
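The `//` pairing in `adapt` attaches a shadow computation to a real component. One hypothetical reading, assuming each wire carries a (value, estimated delay) pair: run the circuit on the values and the delay model on the shadows, then zip the results. `Shadowed`, `halfAddB` and `hDelay` are illustrative names; `hDelay` reuses the half-adder delays (10, 10, 5, 5) from the delay-model slide.

```haskell
-- A wire value paired with its shadow: an estimated arrival time
type Shadowed a = (a, Int)

-- Hypothetical reading of //: the real circuit acts on the values,
-- the delay model acts on the shadows, and the results are zipped back together
(//) :: ([a] -> [b]) -> ([Int] -> [Int]) -> [Shadowed a] -> [Shadowed b]
(circ // model) ws = zip (circ (map fst ws)) (model (map snd ws))

-- Half adder on two Boolean wires: [sum, carry]
halfAddB :: [Bool] -> [Bool]
halfAddB [a, b] = [a /= b, a && b]
halfAddB ws     = error ("halfAddB: expected 2 inputs, got " ++ show (length ws))

-- Half-adder delay model with the slide's parameters (10, 10, 5, 5)
hDelay :: [Int] -> [Int]
hDelay [a, b] = [max (10 + a) (10 + b), max (5 + a) (5 + b)]
hDelay ds     = error ("hDelay: expected 2 inputs, got " ++ show (length ds))
```

`(halfAddB // hDelay) [(True,0),(True,10)]` gives `[(False,20),(True,15)]`: the logical result of 1 + 1 paired with the estimated arrival times of sum and carry.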
Result (multiplication): A simple parameterised description of a fast adaptive multiplier. Like the Three Dimensional Method (TDM), except that wire length, and not only gate delay, is taken into account in choosing which connections to make. Promises to perform well (better than modified Dadda and TDM).
Result (multiplication): Adaptation to the incoming delay profile can be arranged (clever circuits again). The description can also easily be adapted to take account of limitations on cross-cell tracks (see paper). Much remains to be done (e.g. insertion of buffers, fine delay modelling, transistor sizing, other layouts, the rest of the multiplier...).
Result (general): Non-standard interpretation is used after generation (as we have long done) and now also to guide synthesis. Circuit generators are short and sweet and LOOK LIKE circuit descriptions. High degree of parameterisation. Application areas? Module generation for full custom, SoC and FPGA. The ideas are completely compatible with Intel’s IDV system (see the talk by Greg Spirakis at this conference).
Result (general): Clever circuits are a good idiom. They can control the choice of components, wiring and topology, and greatly increase the expressive power of the connection patterns approach. They give a way to allow non-functional properties to influence design (even early on). This is vital as we move to deep sub-micron, where separation of concerns is becoming less and less possible.
Formal verification? We have verified small-sized versions of the multipliers (Bjesse, Synopsys). We should verify the generators (see Hunt’s seminal work). We are investigating generation of FOL for verification of Haskell programs (Cover project at Chalmers).
What next? We want to go the whole hog and generate layouts for high-performance arithmetic circuits from Wired. We need help with the formal verification of generators. And it is time to return to refinement.