Power-Aware Placement Yongseok Cheon, Pei-Hsin Ho Advanced Technology Group, Synopsys, Inc. {cheon,pho}@synopsys.com Andrew B. Kahng, Sherief Reda and Qinke Wang UCSD CSE Department {abk,sreda,qiwang}@cs.ucsd.edu
Outline • Introduction • Activity-based register clustering • Activity-based net weighting • Experiments • Conclusions
IC Power Consumption • Switching power • largest source of power dissipation • usually accounts for 40% to 80% of total power • switching power of a net is proportional to the product of net capacitance and signal switching rate • Short-circuit power • power dissipation due to the short-circuit current that flows briefly during the switching of a CMOS gate • Leakage power • power dissipation due to spurious currents in the non-conducting state of a transistor
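The proportionality stated for switching power can be illustrated with a small sketch. The net names and the capacitance and switching-rate values below are made up for illustration, and the constant factor (supply voltage, clock frequency) is deliberately left out, so the numbers are only meaningful relative to each other.

```python
def relative_switching_power(capacitance, switching_rate):
    """Relative switching power of one net: capacitance times switching rate.

    The constant of proportionality is omitted, so the result is only
    useful for comparing nets against each other.
    """
    return capacitance * switching_rate


# Hypothetical nets: name -> (capacitance, switching rate). Values are invented.
nets = {
    "clk": (50.0, 1.0),
    "data_bus": (8.0, 0.3),
    "reset": (20.0, 0.001),
}

# Rank nets from most to least switching power.
ranked = sorted(
    nets,
    key=lambda name: relative_switching_power(*nets[name]),
    reverse=True,
)
```

A high-capacitance net that rarely toggles (the hypothetical `reset`) ranks below a small net that toggles often, which is why both factors matter when choosing which nets to shorten.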
Clock Power Consumption • Clock net • a major contributor to dynamic power • much larger capacitance than most signal nets • highest switching activity • typically consumes up to 40% of total dynamic power across a variety of design types • Traditional placement methodologies treat registers no differently than combinational cells • this leads to sub-optimal placements in terms of power
Power Aware Placement Method • Activity-based register clustering • reduce capacitance of clock nets hence clock power • Activity-based net weighting • reduce capacitance of high-activity signal nets hence total net switching power
Outline • Introduction • Activity-based register clustering • Activity-based net weighting • Experiments • Conclusions
Large Weight for Clock Net? • Not a good idea • May only affect registers close to boundaries • Introduce hot spots and highly congested areas
Distribution of Clock Tree Capacitance • Observation: most of the clock tree capacitance (e.g., 80%) is at the leaf level
Register Clustering • Goal: reduce capacitance of a clock net • Method: clumping the registers within the same leaf cluster of the clock tree into a smaller area • Result: reduced leaf-level clock tree capacitance and potentially clock skew
Flow of Register Clustering • Quick CTS algorithm: group registers into clusters such that each cluster can become a leaf cluster of the actual clock tree • Group Bounds: constrain the placement of a cluster of registers within smaller bounding box
Quick Clock-Tree Synthesis Algorithm • Decide a range of target cluster sizes heuristically based on • size of the clock net • design rule constraints: max fanout and max load • user configuration • Perform clustering for each direction (from left, right, top and bottom) and each target cluster size • Select the clustering with the best CTS objective • e.g., minimum clock skew, minimum clock delay, minimum # clock buffers, etc.
Quick CTS Algorithm (contd) • Start with the leftmost (rightmost, highest or lowest) un-clustered clock pin • Add clock pin with shortest Manhattan distance to the capacitance weighted centroid of the current cluster • Grow until target cluster size • Repeat growing clusters until all done
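The greedy growth step described on this slide can be sketched as follows. This is a minimal illustration for the left-to-right direction only, assuming each clock pin is given as an `(x, y, capacitance)` tuple with positive capacitance; the function names are hypothetical, and the selection among directions and target sizes by CTS objective is not shown.

```python
def weighted_centroid(pins):
    """Capacitance-weighted centroid of a list of (x, y, capacitance) pins."""
    total = sum(c for _, _, c in pins)
    cx = sum(x * c for x, _, c in pins) / total
    cy = sum(y * c for _, y, c in pins) / total
    return cx, cy


def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def cluster_from_left(pins, target_size):
    """Grow clusters greedily, seeding each with the leftmost unclustered pin."""
    remaining = sorted(pins, key=lambda p: p[0])  # leftmost pin first
    clusters = []
    while remaining:
        cluster = [remaining.pop(0)]  # seed: leftmost unclustered pin
        while remaining and len(cluster) < target_size:
            centroid = weighted_centroid(cluster)
            # Add the pin closest (Manhattan) to the current weighted centroid.
            nearest = min(remaining, key=lambda p: manhattan((p[0], p[1]), centroid))
            remaining.remove(nearest)
            cluster.append(nearest)
        clusters.append(cluster)
    return clusters


# Hypothetical pin locations with unit capacitance.
pins = [(0, 0, 1), (1, 0, 1), (10, 0, 1), (11, 1, 1), (0, 1, 1)]
clusters = cluster_from_left(pins, target_size=3)
```

The other three directions would differ only in the sort key used to pick the seed pin.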
Group Bounds • Control the bounding box of a cluster and reduce it while still fitting the registers • Compute the current bounding box of the registers • Shrink the bounding box proportionally • Shrink ratio p depends on • the specified shrinking factor p0 • the switching rate of the clock net SR and the max switching rate MSR
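The first two steps, computing the bounding box and shrinking it proportionally, can be sketched as below. This assumes the box is shrunk about its own center and that `p` scales each side length; both are assumptions, and `p` is taken as an input because the expression relating it to p0, SR and MSR is not reproduced in this text. The function names are hypothetical, and the check that the registers still fit in the shrunken box is omitted.

```python
def bounding_box(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def shrink_box(box, p):
    """Shrink a box about its center, scaling width and height by ratio p.

    Assumption: p in (0, 1] scales each side; p = 1 leaves the box unchanged.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) / 2 * p
    half_h = (y_max - y_min) / 2 * p
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


# Hypothetical register locations for one cluster.
registers = [(0, 0), (10, 20), (4, 7)]
bound = shrink_box(bounding_box(registers), p=0.5)
```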
Aspect Ratio of Bounding Box • Close to the original bounding box aspect ratio ARold when shrink ratio p is close to 1 • avoids a serious increase in signal net length • Close to square when shrink ratio p is close to 0 • reduced clock skew • Linear function of original aspect ratio ARold and shrink ratio p
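One interpolation that matches the two endpoints on this slide is sketched below. The exact function used by the authors is not given in this text, so the form `p * ar_old + (1 - p)` is an assumption chosen only because it returns ARold at p = 1 and a square (aspect ratio 1) at p = 0. The helper `box_dimensions`, which assumes aspect ratio means width divided by height, is likewise hypothetical.

```python
import math


def new_aspect_ratio(ar_old, p):
    """Interpolate between the original aspect ratio (p = 1) and a square (p = 0).

    Assumed form; the slide only states the endpoints and that the
    function is linear.
    """
    return p * ar_old + (1.0 - p) * 1.0


def box_dimensions(area, aspect_ratio):
    """Width and height of a box with the given area and width/height ratio."""
    width = math.sqrt(area * aspect_ratio)
    height = math.sqrt(area / aspect_ratio)
    return width, height


ar = new_aspect_ratio(ar_old=3.0, p=0.5)  # halfway between 3.0 and 1.0
width, height = box_dimensions(area=8.0, aspect_ratio=ar)
```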
Outline • Introduction • Activity-based register clustering • Activity-based net weighting • Experiments • Conclusions
Pros and Cons of Register Clustering • Pro: effectively reduces the capacitance of the leaf-level clock tree • Con: increases the length of some signal nets • the added signal net switching power can cancel out the clock power reduction
Activity-Based Net Weighting • Goal: reduce capacitance of signal nets • Assigning larger weight to signal nets with higher switching rates • Combining register clustering and activity-based net weighting further reduces the total net switching power
Activity-Based Net Weighting • Assign larger weights to nets with higher switching rates • T: threshold for selecting high activity nets • MSSR: maximum signal net switching rate • W: controls the scope of power weights
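The weighting formula itself is not reproduced in this text, so the sketch below shows only one plausible form built from the three parameters named on the slide. It is an assumption, not the authors' formula: nets at or below the threshold `t` keep a base weight of 1, and the weight grows linearly up to `1 + w` for the net with the maximum switching rate `mssr`.

```python
def power_weight(sr, t, mssr, w):
    """Hypothetical activity-based net weight.

    sr:   switching rate of the net
    t:    threshold for selecting high-activity nets
    mssr: maximum signal net switching rate
    w:    controls the range of power weights (max weight is 1 + w)
    """
    if sr <= t or mssr <= t:
        return 1.0  # low-activity nets keep the base weight
    return 1.0 + w * (sr - t) / (mssr - t)


# Invented switching rates for illustration.
low = power_weight(sr=0.1, t=0.2, mssr=1.0, w=4.0)
mid = power_weight(sr=0.6, t=0.2, mssr=1.0, w=4.0)
high = power_weight(sr=1.0, t=0.2, mssr=1.0, w=4.0)
```

Any monotonically increasing function of the switching rate would serve the same purpose; the linear ramp is used here only because it is the simplest.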
Compatibility with Timing Weights • Linear combination of power and timing net weighting • Power ratio α : 0 ~ 1 • control the ratio of power weight • knob for trade-off between timing and power
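The linear combination can be sketched as a convex blend of the two weights. The slide does not spell out the expression, so placing α on the power weight and (1 - α) on the timing weight is an assumption based on α being described as the ratio of the power weight; the function name is hypothetical.

```python
def combined_weight(power_w, timing_w, alpha):
    """Blend power and timing net weights with power ratio alpha in [0, 1].

    alpha = 0 gives a purely timing-driven weight,
    alpha = 1 gives a purely power-driven weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * power_w + (1.0 - alpha) * timing_w


# Invented weights for one net.
timing_only = combined_weight(power_w=3.0, timing_w=1.0, alpha=0.0)
balanced = combined_weight(power_w=3.0, timing_w=1.0, alpha=0.5)
power_only = combined_weight(power_w=3.0, timing_w=1.0, alpha=1.0)
```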
Outline • Introduction • Activity-based register clustering • Activity-based net weighting • Experiments • Conclusions
Experimental Setup • Implemented in Synopsys IC Compiler • Eight industry circuits: • #cells: 20k ~ 186k • #registers: 2.3k ~ 44.2k • clock power: 32% of total power • net switching power: 39% of total power • Power-aware placement • shrink ratio and power ratio around 0.8
Experimental Flow • Flow: Place → CTS → Route → Extract RC → STA → Power Analysis • Commercial IC implementation flow • Power analysis: IC Compiler • specified switching rates of primary inputs • net switching rates estimated by probabilistic simulation
Clock Net Switching Power • [Figure: results chart, annotated 11.2%]
Total Net Switching Power • [Figure: results chart, annotated 25.4%]
Summary • Reduction • clock net switching power: 11.3% (1.6% ~ 34.5%) • total net switching power: 25.3% (10.5% ~ 47.1%) • total power: 11.4% (6.5% ~ 18.8%) • clock WL: 10.1% • clock skew: random • Impact • WNS (worst negative slack): 2.0% • total cell area: 1.2% • runtime: 11.5%
Conclusions • We have presented a power-aware placement method that performs activity-based net weighting and register clustering to reduce the capacitance of high-activity signal and clock nets • We have evaluated the method on eight real designs through a complete industrial physical design flow • Our approach achieved average reductions of 25.3% in net switching power and 11.4% in total power, at the cost of 2.0% timing, 1.2% total cell area and 11.5% runtime degradation