Hierarchical Physical Design Methodology for Multi-Million Gate Chips
Session 11
Wei-Jin Dai
Overview
• Introduction
• Challenges of hierarchical design
• Hierarchical methodology: full-chip physical prototyping
• Performance data
• Summary
Introduction
• As chip size and complexity grow, a hierarchical design approach becomes necessary
• Over the last 12 months, there has been a large increase in the number of chips designed with a hierarchical approach
• The advantage of the hierarchical approach is divide-and-conquer
The Challenges
• How to get a physically realistic view of the full chip (10 million gates and up) early enough to identify potential problems?
• How to set up a convergent process that reaches design closure from beginning to end?
• How to achieve die utilization similar to a "flat" approach?
• How to achieve clock speed and skew similar to a "flat" approach?
• How to automatically generate optimal pin assignments for each module?
• How to automatically derive realistic timing budgets for each module?
• How to achieve top-level timing and signal integrity closure?
Creating the Physical Prototype
Flat Full-Chip Delivers an Accurate Physical Prototype
• The full-chip flat prototype delivers complete physical, timing, clock and power data
• Eliminates the guesswork of traditional block-based approaches
• Drives the partitioning into manageable blocks
Prototyping Starts Early in the Flow
• Most accurate view possible at all design stages
• Physical timing budgeting drives synthesis
[Figure: prototyping spans the flow up to design completion. Input stages: RTL/black box, 75% netlist/black box, complete netlist. Phase labels: Optimization, Estimation, Refinement. Budget labels: initial timing budgets, refined timing budgets.]
Hierarchical Design Flow
Inputs: LEF/GDSII, RTL/Black Box, Process Data, Chip Level Timing Constraints
Flat Full Chip Physical Prototype
• Quick synthesis
• Floor planning
• Placement
• CTS
• Trial route
Physically Feasible? NO
• Die size
• Timing
• Clock skew
• Power
• SI
Physical Partitioning
• Pin assignment
• Timing budget
• Clock spec
• Power grid
Partition Data (one set per partition)
Block Implementation: Place, CTS, Optimize
Top Level Implementation: CTS, Optimization, Power
Output labels: DEF Placement, Optimized Top Level Netlist, DEF Placement
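The "Physically Feasible?" decision in the flow above can be read as a comparison of the prototype's measured figures against chip-level targets for die size, timing, clock skew, power and SI. The following is a minimal sketch of such a check. The `ChipMetrics` class, the `failed_checks` function and all numeric values are hypothetical and exist only for illustration; they are not part of any tool named in the slides.

```python
from dataclasses import dataclass


@dataclass
class ChipMetrics:
    """Hypothetical container for the five quantities checked in the flow."""
    die_area_mm2: float
    worst_slack_ns: float   # negative slack means timing is not met
    clock_skew_ps: float
    power_w: float
    si_violations: int


def failed_checks(measured, limits):
    """Return the names of the checks the prototype fails (empty if feasible)."""
    failures = []
    if measured.die_area_mm2 > limits.die_area_mm2:
        failures.append("die size")
    if measured.worst_slack_ns < limits.worst_slack_ns:
        failures.append("timing")
    if measured.clock_skew_ps > limits.clock_skew_ps:
        failures.append("clock skew")
    if measured.power_w > limits.power_w:
        failures.append("power")
    if measured.si_violations > limits.si_violations:
        failures.append("SI")
    return failures


# Illustrative numbers only, not taken from the presentation.
limits = ChipMetrics(100.0, 0.0, 150.0, 5.0, 0)
prototype = ChipMetrics(96.0, -0.3, 140.0, 4.2, 0)
print(failed_checks(prototype, limits))  # ['timing']
```

A non-empty result corresponds to the NO branch of the flow; an empty list would let the design move on to physical partitioning.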
Hierarchical Partitioning
• Pin assignment
• Timing budgeting
• Clock tree generation
• Power grid planning
[Figure: partitioning, independent block-level implementation, SoC assembly]
Accurate Pin Assignment
• Full-chip prototype results in optimal pin placement
• Results in narrower channels and reduced die size
• Reduces the routing congestion
• Improves the chip timing
[Figure: accurate physical prototype, shown as a flat full-chip view and a top-level partition view]
Timing Budgeting
Each block requires:
• Clock definition
• set_input_delay
• set_output_delay
• set_drive
• set_load
• Path exceptions (false paths, multicycle paths)
Accurate timing budgets result in predictable timing convergence
[Figure: Block 1, Block 2, Block 3]
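To illustrate how per-block input and output delays could follow from full-chip data, here is a minimal sketch that splits one clock period across the segments of a path in proportion to their estimated delays. Proportional scaling is an assumption chosen for simplicity, not necessarily the budgeting method used in the presentation. The segment names, the 10 ns period, the delay estimates and the clock and port names `clk`, `data_out` and `data_in` are all hypothetical.

```python
def budget_path(segment_delays_ns, clock_period_ns):
    """Scale estimated segment delays so that they sum to the clock period."""
    total = sum(segment_delays_ns.values())
    if total <= 0:
        raise ValueError("estimated path delay must be positive")
    scale = clock_period_ns / total
    return {name: delay * scale for name, delay in segment_delays_ns.items()}


def external_delays(budgets_ns, order, block):
    """Return (delay budgeted before the block, delay budgeted after it)."""
    idx = order.index(block)
    before = sum(budgets_ns[name] for name in order[:idx])
    after = sum(budgets_ns[name] for name in order[idx + 1:])
    return before, after


# Hypothetical path: block1 -> top-level wiring -> block2, 10 ns clock period.
order = ["block1", "top", "block2"]
estimates_ns = {"block1": 2.0, "top": 1.0, "block2": 5.0}
budgets = budget_path(estimates_ns, 10.0)

_, after_block1 = external_delays(budgets, order, "block1")
before_block2, _ = external_delays(budgets, order, "block2")

# Time consumed outside a block becomes its output or input delay constraint.
print(f"set_output_delay -clock clk {after_block1:.2f} [get_ports data_out]")
print(f"set_input_delay -clock clk {before_block2:.2f} [get_ports data_in]")
```

With these example numbers, block1 is budgeted 2.5 ns and sees 7.5 ns of external delay on its output, while block2 is budgeted 6.25 ns and sees 3.75 ns of external delay on its input.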
Hierarchical Clock Tree Synthesis
• Accurate physical timing data enables the creation of an optimal clock tree
• Block-level clock tree synthesis is followed by the top-level clock tree
• Final clock tree routing generates near-zero skew
• Balanced tree at the top level
Worst block skew + zero top-level skew = 150ps total clock skew
[Figure: balanced clock tree, with block skews of 100ps, 130ps, 150ps, 50ps, 50ps and 120ps]
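The skew arithmetic on this slide can be written out as a short calculation: the total clock skew is the worst block-level skew plus whatever skew the top-level tree adds. The function name `total_clock_skew` is hypothetical; the block skew values are the ones shown in the slide's figure.

```python
def total_clock_skew(block_skews_ps, top_level_skew_ps=0):
    """Worst block skew plus top-level skew, in picoseconds."""
    if not block_skews_ps:
        raise ValueError("at least one block skew is required")
    return max(block_skews_ps) + top_level_skew_ps


# Block skews as labeled in the slide's figure.
slide_block_skews_ps = [100, 130, 150, 50, 50, 120]

# A balanced top-level tree contributes zero skew, as stated on the slide.
print(total_clock_skew(slide_block_skews_ps))  # 150
```

Any non-zero top-level skew adds directly to the total, which is why the slide stresses a balanced tree at the top level.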
Hierarchical Power Grid Design
• Power/ground (P/G) networks are planned at the full-chip level
• The P/G network is automatically pushed down into the blocks during partitioning
[Figure: full chip and block views]
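One way to picture the push-down step is as clipping the full-chip power stripes to a block's outline and re-expressing them in the block's own coordinates. The sketch below assumes vertical stripes described only by their x positions and a rectangular block; the function name `push_down_stripes` and all coordinates are hypothetical, and real power grids carry far more detail (layers, widths, vias).

```python
def push_down_stripes(stripe_xs, block_bbox):
    """Keep the vertical stripes that cross the block, in block-local x coordinates.

    block_bbox is (x0, y0, x1, y1) in chip coordinates. Vertical stripes are
    assumed to span the full chip height, so only the x range matters here.
    """
    x0, _y0, x1, _y1 = block_bbox
    return [x - x0 for x in stripe_xs if x0 <= x <= x1]


# Illustrative full-chip stripe positions and block outline (arbitrary units).
chip_stripes = [0, 200, 400, 600, 800, 1000]
block = (350, 100, 850, 500)
print(push_down_stripes(chip_stripes, block))  # [50, 250, 450]
```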
High Performance Environment
Design: 580K cells, 0.25um process, 5LM, 100MHz
Data collected on a 500MHz processor workstation
(*) SPC Trial Route
[Table: run-time comparison, Traditional vs. First Encounter, with speedup factors]
Rows: Design Import, Detail Place, Detail Route*, RC Extract, Delay Calculation, Timing Analysis, Design Iteration, IPO
Table values (in extracted order): 5 hr 25 min, 7 hr 30 min, 9 hr, 35 hr 40 min, 5 hr 45 min, 3 hr 50 min, 2 hr 50 min, 1 hr 50 min, 3 hr 20 min, 6x, 2 hr 15 min, 4 hr 20 min, 1x, 5x, 4 min, 8 min, 6 min, 7 min, 7x, 60x, 56x, 57x, 33x
High Accuracy of the Prototype
Design:
• 5LM
• 0.25um
• 580K cells
• 620K nets
• 572 I/Os
• 4 blocks
• The prototype closely correlates with the post-route layout
• Comparison to the "tape-out" back-end flow
• More than 90% of the interconnect and I/O path delays are within 2%
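A correlation figure such as "more than 90% of delays within 2%" can be computed by comparing each prototype delay with its post-route counterpart. The sketch below shows one way to do that, assuming the tolerance is measured relative to the post-route value. The function name `fraction_within` is hypothetical and the delay values are made up for illustration; they are not data from the presentation.

```python
def fraction_within(prototype_ns, post_route_ns, tolerance=0.02):
    """Fraction of paths whose prototype delay is within tolerance of post-route."""
    if not prototype_ns or len(prototype_ns) != len(post_route_ns):
        raise ValueError("need two non-empty delay lists of equal length")
    hits = sum(
        1
        for proto, final in zip(prototype_ns, post_route_ns)
        if abs(proto - final) <= tolerance * abs(final)
    )
    return hits / len(prototype_ns)


# Made-up delays in ns; the third path is off by 10% and falls outside 2%.
prototype_delays = [1.00, 2.03, 3.30, 4.00]
post_route_delays = [1.01, 2.00, 3.00, 4.05]
print(fraction_within(prototype_delays, post_route_delays))  # 0.75
```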
Summary: SoC Hierarchical Methodology
• Build a full-chip physical prototype early on
• Start at RTL
• Identify problems early
• Achieve design closure before partitioning
• Close full-chip timing
• Optimize die size
• Meet power requirements
• Resolve signal integrity issues
• Maintain the design closure throughout the design process