This paper presents a simulated evolution algorithm for multi-objective VLSI netlist bi-partitioning that optimizes power consumption, delay, and cutset subject to a balance constraint. Experimental results show the effectiveness of the proposed approach.
Simulated Evolution Algorithm for Multi-Objective VLSI Netlist Bi-Partitioning. Sadiq M. Sait, Aiman El-Maleh, Raslan Al Abaji, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia, http://www.kfupm.edu.sa. 27th May, ISCAS-2003, Bangkok, Thailand
Outline • Introduction • Problem Formulation • Cost Functions • Proposed Approaches • Experimental results • Conclusion
VLSI Technology Trends • Design characteristics (transistor count, clock frequency, feature size) and the key CAD capabilities associated with them: • 0.06M, 2 MHz, 6 µm: SPICE simulation • 0.13M, 12 MHz, 1.5 µm: CAE systems, silicon compilation • 1.2M, 50 MHz, 0.8 µm: HDLs, synthesis • 3.3M, 200 MHz, 0.6 µm: top-down design, emulation • 7.5M, 333 MHz, 0.25 µm: cycle-based simulation, formal verification. The challenges of sustaining such fast growth toward giga-scale integration have shifted, to a large degree, from manufacturing process technology to design technology.
The VLSI Chip in 2006 • Technology: 0.1 µm • Transistors: 200 M • Logic gates: 40 M • Size: 520 mm² • Clock: 2-3.5 GHz • Chip I/Os: 4,000 • Wiring levels: 7-8 • Voltage: 0.9-1.2 V • Power: 160 W • Supply current: ~160 A. Tradeoffs: performance, power consumption, noise immunity, area, cost, time-to-market.
VLSI Design Cycle The VLSI design process is carried out at a number of levels: • System Specification • Functional Design • Logic Design • Circuit Design • Physical Design • Design Verification • Fabrication • Packaging, Testing, and Debugging
Physical Design • The physical design cycle consists of: • Partitioning • Floorplanning and Placement • Routing • Compaction. Physical design converts a circuit description (behavioral/structural) into a geometric description, which is used to manufacture the chip.
Why do we need Partitioning? • Decomposition of a complex system into smaller subsystems. • Each subsystem can be designed independently, speeding up the design process (divide-and-conquer approach). • A complex IC can be divided into a number of functional blocks, each designed by one engineer or a team of engineers. • The partitioning scheme has to minimize the interconnections between subsystems.
Levels of Partitioning: System → (system-level partitioning) → PCBs → (board-level partitioning) → Chips → (chip-level partitioning) → Subcircuits/Blocks
Classification of Partitioning Algorithms • Group Migration: Kernighan-Lin, Fiduccia-Mattheyses (FM), Multilevel K-way Partitioning • Iterative Heuristics: Simulated Annealing, Simulated Evolution, Tabu Search, Genetic Algorithms • Performance Driven: Lawler et al., Vaishnav, Choi et al., Jun'ichiro et al. • Others: Spectral, Multilevel Spectral
Motivation • Need for power optimization: • Portable devices • Power consumption is a hindrance to further integration • Increasing clock frequency • Need for delay optimization: • In current submicron designs, wire delay tends to dominate gate delay. • Larger die sizes imply long on-chip global routes, which affect performance. • Delay due to off-chip capacitance must be optimized.
Objective • Design a class of iterative algorithms for multi-objective VLSI partitioning. • Explore partitioning from a wider angle, considering circuit delay, power dissipation, and interconnect at the same time under a given balance constraint. Objectives • Power cost is optimized • Delay cost is optimized • Cutset cost is optimized Constraint • Partitions balanced to within a given tolerance (10%)
Problem formulation • The circuit is modeled as a hypergraph $H(V, E)$, where $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is the set of modules (cells) and $E = \{e_1, e_2, e_3, \ldots, e_k\}$ is the set of hyperedges, i.e., the signal nets; each net is a subset of $V$ containing the modules that the net connects. • A two-way partitioning of the node set $V$ determines two subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.
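As a minimal illustration of this formulation, the Python sketch below represents V as a set of cell names and checks that (V_A, V_B) is a valid two-way partition. The balance test, which reads the 10% tolerance as the maximum deviation of |V_A| from |V|/2 measured relative to |V|, is our assumption, and the function names are hypothetical.

```python
def is_bipartition(cells, block_a, block_b):
    """True if V_A ∪ V_B = V and V_A ∩ V_B = ∅."""
    return (block_a | block_b) == cells and not (block_a & block_b)


def is_balanced(block_a, block_b, tolerance=0.10):
    """Assumed balance rule (hypothetical reading of the 10% tolerance):
    |V_A| may deviate from |V|/2 by at most tolerance * |V|."""
    n = len(block_a) + len(block_b)
    return abs(len(block_a) - n / 2) <= tolerance * n


# Usage: four cells split two/two form a valid, balanced bipartition.
cells = {"a", "b", "c", "d"}
print(is_bipartition(cells, {"a", "b"}, {"c", "d"}))  # True
print(is_balanced({"a", "b"}, {"c", "d"}))            # True
```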
Cutset (figure: example partition with cutset = 3) • Based on the hypergraph model $H = (V, E)$ • Cost: $c(e) = 1$ if $e$ spans more than one block, otherwise 0 • Cutset = sum of the hyperedge costs • Allows efficient gain computation and update
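The cutset cost, and the move gain it makes cheap to compute, can be sketched as follows, assuming each net is a set of cell names and a partition is given by the set of cells in block A. The helper names are hypothetical, and the gain is simply re-evaluated over the nets incident to the moved cell rather than kept in an FM-style bucket structure.

```python
def net_cost(net, block_a):
    """c(e) = 1 if net e has cells in both blocks, otherwise 0."""
    inside = sum(1 for cell in net if cell in block_a)
    return 1 if 0 < inside < len(net) else 0


def cutset(nets, block_a):
    """Cutset = sum of c(e) over all hyperedges."""
    return sum(net_cost(net, block_a) for net in nets)


def move_gain(nets_of_cell, cell, block_a):
    """Cutset reduction if `cell` switches blocks. Only the nets incident
    to `cell` can change, so only they are re-evaluated."""
    moved = block_a ^ {cell}  # toggle the cell's membership in block A
    before = sum(net_cost(net, block_a) for net in nets_of_cell)
    after = sum(net_cost(net, moved) for net in nets_of_cell)
    return before - after


nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}, {"a", "b", "c"}]
block_a = {"a", "b"}
print(cutset(nets, block_a))                                   # 3
print(move_gain([n for n in nets if "c" in n], "c", block_a))  # 1
```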
Delay Model • Example path: SE1 → C1 → C4 → C5 → SE2 • $\text{Delay} = CD_{SE1} + CD_{C1} + CD_{C4} + CD_{C5} + CD_{SE2}$ • $CD_{C1} = BD_{C1} + LF_{C1} \times (C_{offchip} + C_{INP_{C2}} + C_{INP_{C3}} + C_{INP_{C4}})$
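A small Python sketch of this delay model follows. It reads CD as cell delay, BD as bulk (intrinsic) delay, LF as load factor, and C_INP as the input capacitance of a driven cell. These expansions, the `Cell` record, and the rule that C_offchip is added only when the driving cell's output net is cut are assumptions made for illustration. Sequential elements such as SE1 and SE2 can be modelled as ordinary cells.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    bulk_delay: float   # BD_i (assumed: intrinsic delay of the cell)
    load_factor: float  # LF_i (assumed: delay per unit load capacitance)
    input_cap: float    # C_INP_i: input capacitance seen by the driver
    fanout: list = field(default_factory=list)  # names of driven cells


def cell_delay(name, cells, cut_drivers, c_offchip):
    """CD_i = BD_i + LF_i * (sum of fanout input caps [+ C_offchip])."""
    cell = cells[name]
    load = sum(cells[f].input_cap for f in cell.fanout)
    if name in cut_drivers:  # assumption: off-chip load only if the net is cut
        load += c_offchip
    return cell.bulk_delay + cell.load_factor * load


def path_delay(path, cells, cut_drivers, c_offchip):
    """Path delay = sum of the delays of the cells on the path."""
    return sum(cell_delay(n, cells, cut_drivers, c_offchip) for n in path)
```

For example, if C1 drives C2, C3, and C4 and its net is cut, `cell_delay("C1", cells, {"C1"}, c_offchip)` adds the off-chip term exactly as in the CD_C1 expression above.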
Power • The average dynamic power consumed by a CMOS logic gate $i$ in a synchronous circuit is given by $P_i = \frac{1}{2} \frac{V_{DD}^2}{T_{cycle}} C_i N_i$, where $N_i$ is the number of output transitions of gate $i$ per cycle (switching probability) and the load capacitance $C_i$ is the load capacitance before partitioning plus the load due to off-chip capacitance. • Total power dissipation of a circuit: $P = \sum_i P_i$
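The power model can be sketched in Python using the standard expression P_i = ½ (V_DD² / T_cycle) C_i N_i for each gate and summing over all gates. Applying C_offchip only to gates whose output net is cut is our assumption, as are the function names and the `gates` mapping from a gate name to (load capacitance before partitioning, N_i).

```python
def gate_power(vdd, t_cycle, c_load, n_transitions):
    """P_i = 1/2 * (Vdd^2 / T_cycle) * C_load * N_i."""
    return 0.5 * (vdd ** 2 / t_cycle) * c_load * n_transitions


def total_power(gates, vdd, t_cycle, c_offchip, cut_drivers):
    """Sum of gate powers. A gate whose output net is cut also drives the
    off-chip capacitance (assumption about when C_offchip applies)."""
    total = 0.0
    for name, (c_before, n_i) in gates.items():
        c_load = c_before + (c_offchip if name in cut_drivers else 0.0)
        total += gate_power(vdd, t_cycle, c_load, n_i)
    return total


gates = {"g1": (1.0, 0.5), "g2": (1.0, 0.25)}
print(total_power(gates, vdd=2.0, t_cycle=1.0, c_offchip=3.0,
                  cut_drivers={"g1"}))  # 4.5
```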
Unifying Objectives by Fuzzy Logic • Weighted-sum approach: • Weights are difficult to choose • Weights need to be tuned for every circuit • Objective values are imprecise and are best represented by linguistic terms, which are the basis of fuzzy algebra • The objectives conflict with one another • Fuzzy logic provides operators for the aggregating function
Fuzzy logic for a Multi-objective function • Mapping each cost to a membership value • A linguistic fuzzy rule combining the membership values in an aggregating function • Translation of the linguistic rule into appropriate fuzzy operators: • And-like operators: Min operator $\mu = \min(\mu_1, \mu_2)$ • And-like OWA: $\mu = \beta \min(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$ • Or-like operators: Max operator $\mu = \max(\mu_1, \mu_2)$ • Or-like OWA: $\mu = \beta \max(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$ • where $\beta$ is a constant in the range [0, 1]
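The two-operand OWA operators above generalize to n memberships by replacing ½(μ1 + μ2) with the mean of the memberships. The sketch below uses that generalization, which is an assumption for n > 2 and identical to the formulas above for n = 2.

```python
def and_owa(memberships, beta):
    """AND-like OWA: beta * min + (1 - beta) * mean of the memberships.
    For two memberships the mean term equals 1/2 (1 - beta)(mu1 + mu2)."""
    m = list(memberships)
    return beta * min(m) + (1 - beta) * sum(m) / len(m)


def or_owa(memberships, beta):
    """OR-like OWA: beta * max + (1 - beta) * mean of the memberships."""
    m = list(memberships)
    return beta * max(m) + (1 - beta) * sum(m) / len(m)
```

With β = 1 the operators reduce to the pure min and max operators; with β = 0 both become the arithmetic mean, so β controls how strictly "AND" and "OR" are enforced.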
Membership functions: $O_i$ and $C_i$ are the lower bound and the actual cost of objective $i$, $\mu_i(x)$ is the membership of solution $x$ in the set "good $i$", and $g_i$ is the relative acceptance limit for each objective.
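The slide's membership formula is not reproduced in this text. As an assumption, the sketch below uses a common piecewise-linear shape: membership 1 when the cost is at or below the lower bound O_i, 0 once it reaches g_i · O_i, and linear in between. The function and parameter names are hypothetical.

```python
def membership(cost, lower_bound, accept_limit):
    """Assumed piecewise-linear membership of a solution in "good i".

    cost         -- C_i, actual cost of objective i
    lower_bound  -- O_i, lower bound on that cost (assumed > 0)
    accept_limit -- g_i, relative acceptance limit (must exceed 1)
    """
    if accept_limit <= 1:
        raise ValueError("accept_limit (g_i) must be greater than 1")
    if cost <= lower_bound:
        return 1.0
    upper = accept_limit * lower_bound  # costs at or above g_i * O_i get 0
    if cost >= upper:
        return 0.0
    return (upper - cost) / (upper - lower_bound)


print(membership(15, lower_bound=10, accept_limit=2))  # 0.5
```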
Fuzzy linguistic rule A good partitioning can be described by the following fuzzy rule: IF a solution has small cutset AND low power AND short delay AND good balance THEN it is a good solution.
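Fuzzy cost function The above rule is translated into an AND-like OWA: $\mu(x) = \beta \min(\mu_c(x), \mu_p(x), \mu_d(x), \mu_b(x)) + \frac{1}{4}(1 - \beta) \sum_{j \in \{c, p, d, b\}} \mu_j(x)$, where $\mu(x)$ represents the total fuzzy fitness of the solution and $\mu_c$, $\mu_p$, $\mu_d$, $\mu_b$ are the memberships for cutset, power, delay, and balance, respectively. Our aim is to maximize this fitness.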
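Given the four memberships (cutset, power, delay, balance), the fuzzy fitness of a solution and the choice of the fittest candidate can be sketched as below. The default β = 0.7, the helper `fittest`, and the candidate values in the usage are illustrative assumptions, not values from the paper.

```python
def fuzzy_fitness(mu_cut, mu_power, mu_delay, mu_balance, beta=0.7):
    """AND-like OWA over the four memberships; higher is better."""
    mus = (mu_cut, mu_power, mu_delay, mu_balance)
    return beta * min(mus) + (1 - beta) * sum(mus) / len(mus)


def fittest(candidates, beta=0.7):
    """Name of the candidate whose memberships give the highest fitness."""
    return max(candidates, key=lambda k: fuzzy_fitness(*candidates[k], beta=beta))


# Made-up memberships: a uniformly good solution beats one that is
# excellent on three objectives but poor on delay.
candidates = {"s1": (0.9, 0.9, 0.9, 0.9), "s2": (1.0, 1.0, 0.2, 1.0)}
print(fittest(candidates))  # s1
```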
Simulated Evolution Algorithm
Simulated_evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation: evaluate the goodness G_i of all modules
    Selection: For each cell v_i DO begin
      if Random R_m > G_i then select the cell
    End For
    Allocation: For each selected cell v_i DO begin
      Move the cell to the destination block
    End For
  Until stopping criteria are satisfied
  Return best solution
End
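A compact, runnable Python sketch of the loop above for bipartitioning follows. To keep it short it makes several simplifying assumptions: goodness uses only the cutset (the fraction of a cell's nets that are not cut), allocation simply moves a selected cell to the other block unless that would violate the balance tolerance, the stopping criterion is a fixed iteration count, and the best solution is tracked by cutset alone. It is not the multi-objective fuzzy allocation described in the paper, and all names are hypothetical.

```python
import random


def is_cut(net, block_a):
    """A net is cut if it has cells in both blocks."""
    k = sum(1 for c in net if c in block_a)
    return 0 < k < len(net)


def cutset(nets, block_a):
    return sum(1 for net in nets if is_cut(net, block_a))


def cut_goodness(cell, nets_of, block_a):
    """Assumed goodness G_i: fraction of the cell's nets that are not cut."""
    nets = nets_of[cell]
    if not nets:
        return 1.0
    return sum(1 for net in nets if not is_cut(net, block_a)) / len(nets)


def balanced(block_a, n, tolerance):
    """Assumed balance rule: |A| within tolerance * n of n / 2."""
    return abs(len(block_a) - n / 2) <= tolerance * n


def simulated_evolution(cells, nets, iterations=50, tolerance=0.10, seed=0):
    rng = random.Random(seed)
    order = sorted(cells)
    n = len(order)
    nets_of = {c: [net for net in nets if c in net] for c in order}
    block_a = set(order[: n // 2])       # initial feasible partition S
    best, best_cost = set(block_a), cutset(nets, block_a)
    for _ in range(iterations):          # stopping criterion: fixed count
        # Evaluation: goodness G_i of every cell in the current partition.
        goodness = {c: cut_goodness(c, nets_of, block_a) for c in order}
        # Selection: cell i is selected if a random number exceeds G_i.
        selected = [c for c in order if rng.random() > goodness[c]]
        # Allocation (simplified): move each selected cell to the other
        # block, skipping moves that would break the balance constraint.
        for c in selected:
            trial = block_a ^ {c}
            if balanced(trial, n, tolerance):
                block_a = trial
        cost = cutset(nets, block_a)
        if cost < best_cost:
            best, best_cost = set(block_a), cost
    return best, best_cost
```

On a tiny 8-cell circuit a 10% tolerance forbids every single-cell move, so a looser value is needed there, e.g. `simulated_evolution(list("abcdefgh"), nets, tolerance=0.25)`.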
Cut goodness • d_i: the set of all nets connected to cell i that are not cut • w_i: the set of all nets connected to cell i that are cut
Power goodness • V_i: the set of all nets connected to cell i • U_i: the set of all nets connected to cell i that are cut
Delay goodness • K_i: the set of cells in all paths passing through cell i • L_i: the set of cells in all paths passing through cell i that are not in the same block as i
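The slides define these sets but not the goodness formulas built from them. The sketch below assumes simple ratio forms: |d_i| / (|d_i| + |w_i|) for cut goodness, the share of switching activity on the nets in V_i that does not lie on the cut nets U_i for power goodness, and 1 - |L_i| / |K_i| for delay goodness. All three forms, and the use of per-net switching probabilities, are hypothetical.

```python
def cut_goodness(n_uncut, n_cut):
    """Assumed form: |d_i| / (|d_i| + |w_i|)."""
    total = n_uncut + n_cut
    return n_uncut / total if total else 1.0


def power_goodness(switching, connected, cut):
    """Assumed form: share of the switching activity on V_i (connected
    nets) that is not on U_i (connected and cut nets)."""
    total = sum(switching[n] for n in connected)
    if total == 0:
        return 1.0
    return 1.0 - sum(switching[n] for n in cut) / total


def delay_goodness(path_cells, other_block_cells):
    """Assumed form: 1 - |L_i| / |K_i|."""
    if not path_cells:
        return 1.0
    return 1.0 - len(other_block_cells) / len(path_cells)


print(cut_goodness(3, 1))  # 0.75
```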
Final selection fuzzy rule: IF cell i is near its optimal cutset goodness as compared to other cells AND near its optimal power goodness as compared to other cells AND near its optimal net delay goodness as compared to other cells OR T_max(i) is much smaller than T_max, THEN it has a high goodness.
Fuzzy Goodness • T_max: delay of the most critical path in the current iteration • T_max(i): delay of the longest path traversing cell i • $X_{path} = T_{max} / T_{max}(i)$ • The fuzzy goodness of a cell aggregates its cutset, power, and delay goodness.
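One way to combine these quantities following the final-selection rule is sketched below. It assumes the rule groups as AND(cutset, power, (delay OR path)), uses the OWA operators with a shared β, and maps X_path to a hypothetical linear membership that is 0 for a cell on the critical path (X_path = 1) and 1 once X_path reaches a threshold `x_full`. None of these choices is taken from the paper.

```python
def owa(values, beta, pick):
    """OWA with pick=min (AND-like) or pick=max (OR-like)."""
    vals = list(values)
    return beta * pick(vals) + (1 - beta) * sum(vals) / len(vals)


def path_membership(t_max, t_max_i, x_full=2.0):
    """Hypothetical membership for "T_max(i) is much smaller than T_max":
    0 at X_path = 1, rising linearly to 1 at X_path = x_full (> 1)."""
    x_path = t_max / t_max_i
    return max(0.0, min(1.0, (x_path - 1.0) / (x_full - 1.0)))


def cell_fuzzy_goodness(mu_c, mu_p, mu_d, t_max, t_max_i, beta=0.7):
    """AND of cutset and power goodness with (delay goodness OR path)."""
    mu_delay_or_path = owa([mu_d, path_membership(t_max, t_max_i)], beta, max)
    return owa([mu_c, mu_p, mu_delay_or_path], beta, min)


print(path_membership(10.0, 8.0))  # 0.25
```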
Experimental Results: ISCAS-85 and ISCAS-89 Benchmark Circuits
SimE versus Tabu Search and GA over time (figure; circuit: s13207)
Experimental Results: SimE versus TS and GA • SimE produced better results than TS and GA, with faster execution time.
Conclusion • The present work successfully addressed the important issue of reducing power consumption and delay in VLSI circuits. • The present work successfully formulated and provided solutions to the problem of multi-objective VLSI partitioning. • The TS partitioning algorithm outperformed GA in terms of solution quality and execution time.