A Method for Fast Delay/Area Estimation. EE219b Semester Project Mike Sheets May 16, 2000. Overview. Problem statement Proposed solution Constant delay paradigm Zero-slack algorithm Implementation Incorporation into SIS Library characterization Results Conclusions Future Work.
A Method for Fast Delay/Area Estimation EE219b Semester Project Mike Sheets May 16, 2000
Overview • Problem statement • Proposed solution • Constant delay paradigm • Zero-slack algorithm • Implementation • Incorporation into SIS • Library characterization • Results • Conclusions • Future Work
Problem Statement • Given a boolean network, estimate the area if implemented with particular required time constraints • Estimation should be fast and reasonably accurate • Examine how technology independent logic optimization affects the estimation
Area/Delay Models
• Constant area (traditional) model
  • Composed of discretely sized gates with constant area
  • Mapping involves calculating delay as a function of load
• Constant delay model
  • Composed of mathematical functions relating area to size
  • Mapping involves calculating size (area) as a function of load

Constant Area Model (example gate: ND2X1 driving load CL)
  Area = constant from library
  Size = constant from library
  Delay = dint + k*CL

Constant Delay Model (example gate: ND2 driving load CL)
  Area = Aint + Aslope*size
  Size = k*CL/(Delay - dint)
  Delay = constant
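The constant delay equations above can be sketched in a few lines of Python. This is only an illustration: the `GateClass` container, the function names, and the numeric parameters are assumptions made for the example, not values from the project's library. The `delay_for_size` helper simply inverts the Size equation.

```python
from dataclasses import dataclass


@dataclass
class GateClass:
    """Estimation parameters for one class of gates (names are illustrative)."""
    d_int: float    # intrinsic gate delay (dint)
    k: float        # drive factor (k)
    a_int: float    # area line y-intercept (Aint)
    a_slope: float  # area line slope (Aslope)


def size_for_delay(gate: GateClass, load: float, delay: float) -> float:
    """Size = k*CL / (Delay - dint), the constant delay sizing rule."""
    if delay <= gate.d_int:
        raise ValueError("delay budget must exceed the intrinsic delay")
    return gate.k * load / (delay - gate.d_int)


def area_for_size(gate: GateClass, size: float) -> float:
    """Area = Aint + Aslope*size."""
    return gate.a_int + gate.a_slope * size


def delay_for_size(gate: GateClass, load: float, size: float) -> float:
    """Inverse of the sizing rule: Delay = dint + k*CL/size."""
    return gate.d_int + gate.k * load / size


# Made-up parameters for a hypothetical two-input NAND class.
nd2 = GateClass(d_int=0.25, k=0.5, a_int=2.0, a_slope=4.0)
size = size_for_delay(nd2, load=2.0, delay=0.75)
print(size, area_for_size(nd2, size))
```

A tighter delay budget shrinks the denominator, so the required size (and therefore area) grows quickly as the budget approaches the intrinsic delay.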
Zero Slack Algorithm
Given input arrival times {ai} and output required times {rk}, assign gate delays as follows:
1. Initialize all internal required/arrival times to “unknown”
2. Select the path(s) with the minimum value of (rk-ai)/lp, where lp is the length of the path in number of gates
   • For each node from primary inputs to primary outputs
     • Calculate all the (ai, li) pairs from all fanin edges
     • Discard dominated pairs, save the union of the undominated pairs
   • When all primary outputs are reached, calculate the minimum (rk-ai)/lp
3. Assign the delay of each gate in the selected path(s) to this minimum
4. Update arrival and required times for all fanin and fanout edges of the newly assigned delays
5. Repeat steps 2-4 until all gates are assigned delays

Pair domination defined: pair (ai, li) dominates (aj, lj) if ai ≥ aj and li ≥ lj. If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is “faster” than necessary.

[Figure: node n with fanin nodes n1 and n2 (arrival times a1, a2; path lengths l1, l2) and fanout nodes n3 and n4 (required times r3, r4; path lengths l3, l4)]
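The pair pruning in step 2 can be illustrated with a small Python sketch. It assumes that domination means "arrives no earlier and is no shorter", that every remaining time budget (required minus arrival) is positive, and that path length grows by one gate per node. The function names are hypothetical, and the sketch covers only the pruning and the minimum computation, not the full algorithm.

```python
def prune_dominated(pairs):
    """Keep only the (arrival, length) pairs that no other pair dominates.

    Assumption: (a2, l2) dominates (a, l) when a2 >= a and l2 >= l.
    """
    unique = set(pairs)
    kept = []
    for a, l in unique:
        dominated = any(
            a2 >= a and l2 >= l and (a2, l2) != (a, l)
            for a2, l2 in unique
        )
        if not dominated:
            kept.append((a, l))
    return sorted(kept)


def propagate(fanin_pairs):
    """Pairs at a node: union of all fanin pairs, each extended by one gate."""
    merged = [(a, l + 1) for pairs in fanin_pairs for (a, l) in pairs]
    return prune_dominated(merged)


def min_delay_per_gate(pairs, required):
    """Minimum of (r - a) / l over the surviving pairs."""
    return min((required - a) / l for a, l in pairs)


# Three fanin edges with made-up (arrival, length) pairs.
pairs_at_n = propagate([[(0.0, 2)], [(1.0, 1)], [(0.5, 1)]])
print(pairs_at_n)                           # (0.5, 2) is dropped: (1.0, 2) dominates it
print(min_delay_per_gate(pairs_at_n, 4.0))  # budget per gate on the tightest path
```

Pruning keeps the number of pairs per node small, which is what makes the forward pass cheaper than enumerating every path.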
Faster Approximation
Select an allowable slack threshold sthresh (if zero, the algorithm yields the same result as the zero-slack algorithm)
1. Compute the forward level lj and arrival time aj of all nodes in the network using a forward trace
2. Compute the reverse level kj and required time rj of all nodes in the network using a backward trace
3. Update the delay of every node as dj = dj + (rj-aj)/(lj+kj)
4. While the slack of any node exceeds sthresh, repeat steps 1-3
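A minimal Python sketch of this iteration follows. Several details are assumptions rather than facts from the slides: delays start at zero, the forward level lj counts gate j itself while the reverse level kj counts only the gates after it, every gate without fanout is a primary output, and a pass limit guards against non-termination. The network encoding (`fanins`, `order`) is hypothetical.

```python
def approximate_delays(fanins, order, arrival_in, required_out,
                       s_thresh=1e-6, max_iters=1000):
    """Distribute slack over gate delays by repeated forward/backward traces.

    fanins:       gate -> list of fanin names (gates or primary inputs)
    order:        gates in topological order
    arrival_in:   primary input -> arrival time
    required_out: primary output gate -> required time
    """
    fanouts = {g: [] for g in order}
    for g in order:
        for f in fanins[g]:
            if f in fanouts:
                fanouts[f].append(g)

    delay = {g: 0.0 for g in order}  # assumption: start from zero delay
    for _ in range(max_iters):
        # Step 1: forward trace for levels and arrival times.
        level = {pi: 0 for pi in arrival_in}
        arrival = dict(arrival_in)
        for g in order:
            level[g] = 1 + max(level[f] for f in fanins[g])
            arrival[g] = delay[g] + max(arrival[f] for f in fanins[g])

        # Step 2: backward trace for reverse levels and required times.
        rlevel, required = {}, {}
        for g in reversed(order):
            candidates = [required[o] - delay[o] for o in fanouts[g]]
            if g in required_out:
                candidates.append(required_out[g])
            required[g] = min(candidates)
            rlevel[g] = max((1 + rlevel[o] for o in fanouts[g]), default=0)

        slack = {g: required[g] - arrival[g] for g in order}
        if min(slack.values()) < -1e-9:
            raise ValueError("negative slack: required times cannot be met")
        if max(slack.values()) <= s_thresh:
            break

        # Step 3: give each node an equal share of its slack.
        for g in order:
            delay[g] += slack[g] / (level[g] + rlevel[g])
    return delay


# A three-gate chain: input "a" arrives at 0, output "g3" is required at 3.
chain = {"g1": ["a"], "g2": ["g1"], "g3": ["g2"]}
print(approximate_delays(chain, ["g1", "g2", "g3"], {"a": 0.0}, {"g3": 3.0}))
```

On a simple chain the slack is split evenly in one pass. On networks with side branches, shorter branches need several passes before their slack falls below the threshold.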
Incorporation into SIS
[Figure: SIS flow diagram. Elements: BLIF net. (read_blif); tech. independent optimization (script.algebraic, script.boolean, etc.); tech. lib. (read_library); tech. dependent optimization (map); area; manual analysis; est. lib. (read_estim); fast delay/area estimation (estimate); area/delay tradeoff curve]
Library Characterization • Commercial standard cell libraries may contain multiple gates that implement the same equation • Each gate in the library has characteristics: • Size • Delays from all input pins to the output pin for all transitions and several loads • Capacitance for all input pins • Maximum load • Area • We need estimation parameters for each class of gates (i.e., gates with the same equation): • Intrinsic gate delay (dint) • Drive factor (k) • Area line y-intercept (Aint) • Area line slope (Aslope) • Input capacitance line y-intercept (cint) • Input capacitance line slope (cslope)
Inverter Characterization (1) • Inverter delay scales linearly with load/size • Slope is k • Y-intercept is dint
Inverter Characterization (2) • Inverter area scales linearly with size • Slope is Aslope • Y-intercept is Aint
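Extracting these slopes and intercepts amounts to fitting a straight line per gate class. The sketch below uses an ordinary least-squares fit in plain Python and also reports the coefficient of determination. The data points are invented for illustration, and least squares is an assumption about how the trend lines were obtained.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope*x; returns (slope, intercept, r2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points (two gates per class)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Made-up inverter data: delay against load/size, and area against size.
k, d_int, r2_delay = fit_line([0.5, 1.0, 2.0], [0.35, 0.60, 1.10])
a_slope, a_int, r2_area = fit_line([1.0, 2.0, 4.0], [6.0, 10.0, 18.0])
print(k, d_int, r2_delay)
print(a_slope, a_int, r2_area)
```

A low `r2` value for a class flags the kind of poor trend line that the linear model cannot capture well.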
Characterization Issues • Requires at least two gates per class in the library • Additionally, some gates have poor accuracy (trend lines have poor coefficients of determination) • Further research shows the reason is the CMOS implementation (see figures) • Future work might replace the linear model with a piece-wise linear model for more accuracy [Figures: NAND-gate CMOS schematic for smaller sizes; NAND-gate CMOS schematic for larger sizes]
Estimation Library • These issues are evident in the table • OAI31 and OAI32 have Aslope of 0.0, meaning that the two cells in the library had the same area • NOR3, NOR4 had poor coefficients of determination • Many gates in the library had only one size
Estimation Modes • Sweep mode • User specifies a range of required times to sweep (possibly only one) and a step size • Estimation starts with the largest required time and steps down until the network fails the zero slack algorithm (i.e., negative slack is encountered) • Binary search mode • Used to find the minimum possible required time (period) given infinite area • Starts at a user-specified maximum and performs a binary search until a pass limit is reached
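The two modes can be sketched as small driver loops around an estimator. Here `estimate` is a hypothetical callable that returns an area, or `None` when the zero slack algorithm fails. The binary search assumes that feasibility is monotonic in the required time, and `toy_estimate` is an invented stand-in, not the SIS command.

```python
def sweep(estimate, t_max, t_min, step):
    """Step the required time down from t_max until estimation fails."""
    results = []
    n = 0
    while True:
        t = t_max - n * step
        if t < t_min - 1e-12:
            break
        area = estimate(t)
        if area is None:  # negative slack encountered
            break
        results.append((t, area))
        n += 1
    return results


def binary_search_min_period(estimate, t_max, passes):
    """Smallest feasible required time found within a fixed number of passes."""
    if estimate(t_max) is None:
        return None  # even the user-specified maximum fails
    lo, hi = 0.0, t_max  # invariant: hi is always feasible
    for _ in range(passes):
        mid = (lo + hi) / 2.0
        if estimate(mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


def toy_estimate(t):
    """Invented stand-in: area grows sharply near a minimum period of 2.0."""
    if t <= 2.0:
        return None
    return 10.0 + 5.0 / (t - 2.0)


for t, area in sweep(toy_estimate, t_max=5.0, t_min=1.0, step=1.0):
    print(t, area)
print(binary_search_min_period(toy_estimate, t_max=8.0, passes=30))
```

Sweep mode produces the points of a tradeoff curve, while the binary search only reports the tightest period it can reach within the pass limit.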
Experimentation • Various sized combinational logic benchmarks • MCNC c17, c880, c1908, c3540 • Various sized sequential logic benchmarks • Interpretation of required time is clock period (assuming all flip-flops are clocked synchronously) • MCNC s713, s838, s953, s1196, s1238, s1423 • Tested four scripts • script.none (no optimization), script.algebraic, script.boolean, script.rugged
Tradeoff Curves • Sweep mode allows multiple required times (clock periods) to be easily tabulated
Sensitivity to Optimization Script • When delay is non-critical (i.e., as required time approaches infinity) • Area is within 20% of the unoptimized network • Variation between optimization scripts is mostly under 10%
Conclusions • Sometimes more optimization yields worse results • As required times become smaller, more paths become critical requiring larger sizes (area) • Area increases quickly before failure • From the benchmarks shown, estimation is relatively insensitive to technology independent optimization with infinite required times
Possible Future Work • Accuracy • Relate estimated areas to actual areas from a good mapping using the full technology library • Use more complex delay equations to handle different rise/fall times • Modify the algorithm to handle the case where a primary input cannot drive the required load • Characterization • Revise characterization to support piece-wise linear functional forms • Automate process so only the actual technology library is required as an input • Mapping • Examine how various mapping options affect estimation • Use buffered fanout trees (Touati) after sizing gates • Speed • Compare speed of total estimation procedure to traditional flow • Power estimation