Off-chip Decoupling Capacitor Allocation for Chip Package Co-Design. Hao Yu Berkeley Design Automation hao.yu@berkeley-da.com. Chunta Chu and Lei He EE Department UCLA. The work was performed at UCLA and was partially supported by NSF and UC-MICRO.
Decap Allocation for Clean Power Delivery
• Chip-package co-design requires a noise-free off-chip power delivery system (PDS)
• Modeling inductance is a must
• Decoupling capacitors (decaps) are allocated at the chip-package interface to satisfy power integrity
• Finding a fast yet accurate decap allocation for a large-scale design is a challenging task
How can the large and complex physical-level layout be considered during system-level design?
Physical-Level Challenge
• Finite parasitic impedance affects circuit functionality at the chip-package interface
• Supply voltage drop and electromagnetic (EM) coupling
• The distributed post-layout model burdens system-level power integrity analysis and design
• Millions of nodes and terminals with dense inductances
[Figure labels: Module 1, Module 2]
The Need for Macromodeling
• Representing a large and complex power delivery system blindly leads to expensive design cycles
• A compact representation by macromodeling is needed
• Existing decap allocation methods with macromodeling [Zheng:CICC’04, Chen:ISPD’06]
• Generate a PDS macromodel
• Apply simulated annealing to add or remove one decap at a legal position
• Cannot efficiently handle a large-scale design
Limitations of Existing Macromodeling
• Macromodeling algorithms [PVL, PACT, PRIMA] are limited in handling a large-scale PDS
• Become ineffective when the number of terminals is large
• Do not provide sensitivity information
• Destroy the structure of the state matrix
[Figure: projection produces a small but dense reduced model. How to use it?]
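The projection step behind these methods can be illustrated with a minimal PRIMA-style sketch. It assumes a hypothetical 10-node RC ladder with two ports (the matrices `G`, `C`, `B` and the helper names are illustrative, not taken from the slides) and shows how a congruence projection onto a block Krylov subspace yields reduced matrices that are small but dense.

```python
import numpy as np

def prima_reduce(G, C, B, n_blocks):
    """Congruence projection onto a block Krylov subspace of (G^-1 C, G^-1 B)."""
    blocks = []
    W = np.linalg.solve(G, B)
    for _ in range(n_blocks):
        for Q in blocks:                  # orthogonalize against earlier blocks
            W = W - Q @ (Q.T @ W)
        Q, _ = np.linalg.qr(W)            # orthonormal basis of the new block
        blocks.append(Q)
        W = np.linalg.solve(G, C @ Q)     # next Krylov block
    V = np.hstack(blocks)
    return V.T @ G @ V, V.T @ C @ V, V.T @ B

def transfer(G, C, B, s):
    """Port transfer function H(s) = B^T (G + sC)^-1 B."""
    return B.T @ np.linalg.solve(G + s * C, B)

# Hypothetical RC ladder: unit conductances and capacitances, grounded at node 0.
n = 10
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
G[-1, -1] = 1.0
C = np.eye(n)
B = np.zeros((n, 2))
B[0, 0] = 1.0        # port 1 at the first node
B[-1, 1] = 1.0       # port 2 at the last node

Gr, Cr, Br = prima_reduce(G, C, B, n_blocks=3)
s = 1e-4j
H_full = transfer(G, C, B, s)
H_red = transfer(Gr, Cr, Br, s)
rel_err = np.max(np.abs(H_full - H_red)) / np.max(np.abs(H_full))
print(Gr.shape, rel_err)
```

The reduced `Gr` is a full matrix even though `G` is tridiagonal, and the basis grows by one column per port per block, which is consistent with the limitations listed above for designs with many terminals.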
Our Decap Problem Formulation
• A multiple-ring-based problem formulation
• Represent the decap solution by a combination of multi-level templates
• Constrain by the noise integral at I/Os instead of the noise amplitude used in [Chen:ISPD’06]
• Optimization method
• Each step inserts a template with a given decap type based on sensitivity instead of simulated annealing
The key is to efficiently calculate sensitivity from the macromodel
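The difference between the two noise metrics can be sketched numerically. The example below assumes the noise integral is the area of the voltage deviation that exceeds a tolerance band; the slides do not give the exact definition, and the waveforms, tolerance, and function names are hypothetical.

```python
import numpy as np

def noise_amplitude(v, v_nominal):
    """Peak deviation of the supply waveform from its nominal level."""
    return float(np.max(np.abs(v - v_nominal)))

def noise_integral(t, v, v_nominal, tolerance):
    """Area of the deviation that exceeds the tolerance band (trapezoidal rule)."""
    excess = np.maximum(np.abs(v - v_nominal) - tolerance, 0.0)
    return float(np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(t)))

# Two hypothetical droops with the same peak but different durations.
t = np.linspace(0.0, 10.0, 1001)
v_short = 1.0 - 0.2 * np.exp(-((t - 5.0) / 0.2) ** 2)
v_long = 1.0 - 0.2 * np.exp(-((t - 5.0) / 1.0) ** 2)

amp_short = noise_amplitude(v_short, 1.0)
amp_long = noise_amplitude(v_long, 1.0)
int_short = noise_integral(t, v_short, 1.0, tolerance=0.1)
int_long = noise_integral(t, v_long, 1.0, tolerance=0.1)
print(amp_short, amp_long, int_short, int_long)
```

Under this assumed definition, the amplitude metric treats both droops alike, while the integral also accounts for how long the supply stays outside the tolerance band.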
TBS2: Macromodeling for PDS
• Principal terminal selection: capture the essential input/output behavior
• Parameterization: compute performance sensitivities with respect to layout modifications
• Structured simulation: sparsely arrange couplings (sparsity), leverage diverse physical domains (latency), and analyze at block level (hierarchy)
A structured and parameterized macromodel connects the layout with the system
TBS2 (1) Principal Terminal Selection
• The input signals (J = B × I) are temporally correlated
• The correlation is described by a correlation matrix C (N × N)
• Correlated terminals [b0 b1 b2] can be simplified by principal component analysis (PCA)
• Select K principal terminals by the K-means method
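A minimal sketch of this selection step follows, under several assumptions: the terminal currents are available as sampled waveforms, terminals are clustered in the space of their leading PCA loadings, and the terminal nearest each cluster center is kept as the representative. The function name, the farthest-point initialization, and the synthetic waveforms are illustrative choices, not the TBS2 implementation.

```python
import numpy as np

def select_principal_terminals(waveforms, k, n_components=2, n_iter=50):
    """Cluster terminals by PCA loadings and keep one representative per cluster."""
    corr = np.corrcoef(waveforms)                        # N x N correlation matrix
    eigval, eigvec = np.linalg.eigh(corr)                # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_components]
    loadings = eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))

    # Deterministic farthest-point initialization of the K-means centers.
    centers = [loadings[0]]
    for _ in range(1, k):
        d = np.min([np.sum((loadings - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(loadings[int(np.argmax(d))])
    centers = np.array(centers)

    for _ in range(n_iter):                              # Lloyd iterations
        dist = ((loadings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            loadings[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated

    dist = ((loadings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:                                 # skip empty clusters
            reps.append(int(members[np.argmin(dist[members, j])]))
    return labels, sorted(reps)

# Hypothetical currents: terminals 0-2 share one waveform shape, terminals 3-5 another.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
a = np.sin(2 * np.pi * 3 * t)
b = np.sin(2 * np.pi * 7 * t)
waveforms = np.array([a, 0.8 * a, 1.2 * a, b, 0.5 * b, 1.5 * b])
waveforms = waveforms + 0.05 * rng.standard_normal(waveforms.shape)

labels, principal = select_principal_terminals(waveforms, k=2)
print(labels, principal)
```

Only the selected terminals would then be carried into the reduction, which keeps the port count K small even when the number of physical terminals N is large.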
TBS2 (2) Parameterization
• Decaps can be parametrically described by the sizing vector (D) for M2 types of decaps and the topological matrix (X) for M1 levels of rings
• In total, M1 × M2 types of parameterized templates are described by a parameterized state matrix in the s-domain
[Figure: ring template with nodes numbered 1 to 8 and an example entry X(2,6) of the topological matrix, whose entries are 0, 1, or -1]
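One way to picture such a parameterized state matrix is as a nominal capacitance matrix plus one stamp per (ring level, decap type) template. The sketch below assumes each template contributes `d_j * E_i @ E_i.T`, where `E_i` is a hypothetical incidence matrix for ring level i with entries 0, 1, or -1 and `d_j` is the size of decap type j; the slides do not spell out this form, and all names and values (in nF) are illustrative.

```python
import numpy as np

def template_stamp(incidence, cap_value):
    """Capacitance stamp of one template: cap_value * E @ E.T."""
    return cap_value * incidence @ incidence.T

def parameterized_capacitance(c_nominal, incidences, sizes, usage):
    """C = C0 + sum over (level i, type j) of usage[i, j] * d_j * E_i @ E_i.T."""
    c = c_nominal.astype(float).copy()
    for i, e in enumerate(incidences):
        for j, d in enumerate(sizes):
            c += usage[i, j] * template_stamp(e, d)
    return c

def capacitance_sensitivity(incidences, sizes, i, j):
    """Derivative of C with respect to usage[i, j] (constant in this linear model)."""
    return template_stamp(incidences[i], sizes[j])

# Hypothetical 4-node example with M1 = 2 ring levels and M2 = 2 decap types.
c0 = 0.001 * np.eye(4)                                     # nominal capacitance
e0 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # grounded decaps at nodes 0 and 3
e1 = np.array([[0.0], [1.0], [-1.0], [0.0]])               # decap between nodes 1 and 2
incidences = [e0, e1]
sizes = np.array([1.0, 10.0])                              # sizing vector D
usage = np.array([[1.0, 0.0], [0.0, 1.0]])                 # which templates are inserted

C = parameterized_capacitance(c0, incidences, sizes, usage)
dC = capacitance_sensitivity(incidences, sizes, 0, 1)
print(C)
```

Because the model is linear in the template usage, the sensitivity matrices are known in closed form, which is the property a sensitivity-based optimizer can exploit.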
TBS2 (3) Structured Macromodeling
• Structured projection
• Sparse and block-triangular
• Block-wise nominal and sensitivity
Details can be found in TBS1 [Yu:DAC’06] and [Yu:ISLPED’06]
Improved Accuracy by TBS2 Reduction
• A non-uniform RLC mesh is reduced by an 80th-order reduction using TBS2 and PRIMA
• TBS2 matches more poles than PRIMA w.r.t. the principal terminals
• TBS2 improves the waveform accuracy in both the frequency and time domains
Our Decap Algorithm Overview
• Apply TBS2 just once to generate a structured and parameterized macromodel
• Calculate the block-level nominal noise at each terminal and its sensitivity w.r.t. the partitioned template
• Check whether the noise integral satisfies the constraints
• Allocate decaps for each block according to the sensitivity in a greedy fashion
[Flowchart labels: TBS2, calculate nominal + sensitivity, check constraints, update template]
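The loop above can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: it assumes a fixed linear noise model `noise = noise0 + sens @ counts`, whereas the real flow would re-evaluate nominal values and sensitivities from the TBS2 macromodel, and it picks the template with the largest reduction in total constraint violation per unit cost. All names and numbers are hypothetical.

```python
import numpy as np

def greedy_allocate(noise0, sens, cost, bound, max_steps=100):
    """Insert one template per step until every terminal meets the noise bound."""
    counts = np.zeros(sens.shape[1], dtype=int)
    noise = np.asarray(noise0, dtype=float).copy()
    for _ in range(max_steps):
        violation = np.maximum(noise - bound, 0.0)
        if not violation.any():                       # all constraints satisfied
            break
        # Total violation that would remain after adding each candidate template.
        remaining = np.maximum(noise[:, None] + sens - bound, 0.0).sum(axis=0)
        gain = (violation.sum() - remaining) / cost   # improvement per unit cost
        best = int(np.argmax(gain))
        if gain[best] <= 0.0:                         # no template helps; stop
            break
        counts[best] += 1
        noise = noise + sens[:, best]                 # update with the linear model
    return counts, noise

# Hypothetical data: 3 terminals, 2 templates, noise integrals in arbitrary units.
noise0 = np.array([1.0, 0.65, 0.2])
sens = np.array([[-0.3, -0.1],
                 [-0.1, -0.2],
                 [0.0, -0.05]])                       # d(noise) / d(template count)
cost = np.array([1.0, 1.0])
bound = 0.5

counts, final_noise = greedy_allocate(noise0, sens, cost, bound)
print(counts, final_noise)
```

Each step needs only the nominal noise and the sensitivity columns, so no circuit re-simulation appears inside the loop.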
Reduced Runtime and Cost of Decap Allocation
• Three methods are compared: 1) simulated annealing with noise amplitude (SA-NA) [Chen:ISPD’06]; 2) multiple-ring with noise amplitude (MRA-NA) [this paper]; 3) multiple-ring with noise integral (MRA-NI) [this paper]
• MRA-NI is up to 97X faster than SA-NA due to the structured and parameterized macromodel from TBS2
• MRA-NI reduces decap cost by up to 16% due to a more accurate integrity metric using the noise integral
Conclusions
• The macromodel connects the system-level design with the physical-level layout
• TBS2: a structured and parameterized macromodel
• Provides fast yet accurate computational prototyping for large and complex systems
• Solves an integrity-driven decap allocation for chip-package co-design
• Such a block-wise macromodel and optimization can be applied to other layout optimization problems