HiCap: A Fast Hierarchical Algorithm for 3D Capacitance Extraction Weiping Shi Department of Computer Science University of North Texas
Outline • Introduction • Previous Research • Integral Equation & N-Body Problem • New Algorithm • Experimental Results • Conclusion • Future Work
Introduction • Capacitance Extraction: Given a set of conductors in 3-D space, compute the capacitance between all pairs of conductors. [Figure: two conductors carrying opposite surface charges, with a 1 V source; C = Q.]
Signal delay = gate delay + interconnect delay • Interconnect delay is caused by RC (resistance and capacitance) parasitics. [Figure: RC model of an interconnect, with elements labeled R, C, C.]
Interconnect delay dominates gate delay in deep sub-micron VLSI. [Figure: delay (ps) versus technology generation (micron).]
Importance in VLSI • Fast and accurate capacitance extraction is crucial in the design and verification of VLSI circuits and packaging. • Current 3D tools are too slow. • FastCap, Raphael, QuickCap, etc. • 2D/2.5D/Quasi-3D tools use 3D engines to generate their libraries, so their accuracy depends on the 3D engines. • Dracula, HyperExtract, Arcadia, Fire&Ice, Star-RC, Columbus, etc. • For critical nets and clock trees, 3D accuracy is necessary.
Importance in MEMS • Accurate capacitance extraction of complex 3-D structures is also important in the design of MEMS (MicroElectroMechanical Systems). • The design of most motion sensors needs an accurate estimate of capacitance. • The design of most drivers needs to solve a similar potential problem. • A recent ARPA report estimates the market for the above applications at 1 to 3 billion dollars by 2004.
Previous Research • Differential Maxwell Equation (Finite Difference Method or Finite Element Method) • Raphael Field Solver • Integral Laplace Equation (Boundary Element Method) • Multipole algorithm FastCap by Nabors & White. O(N) time. Kernel dependent. • Pre-corrected FFT algorithm by Phillips & White. O(N log N) time. Kernel independent. • SVD algorithm IES3 by Kapur & Long. O(N log N) time. Kernel independent.
Integral Equation Approach • $\phi(x) = \int_{\text{surfaces}} G(x, x')\,\sigma(x')\,da'$ where $\phi(x)$ is the known surface potential, $\sigma(x')$ is the charge density, $da'$ is an incremental conductor surface area, $x'$ is on $da'$, and $G(x, x')$ is the kernel.
Partition conductor surfaces into N panels and assume uniform charge density on each panel. Then we have a linear system: Pq = v, where P is an NxN matrix of potential coefficients, q is an N-vector of panel charges, and v is an N-vector of known panel potentials.
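Each entry $p_{ij}$ of the potential coefficient matrix P represents the potential at panel $A_i$ due to a unit charge on panel $A_j$: $p_{ij} = \frac{1}{\mathrm{area}(A_i)\,\mathrm{area}(A_j)} \int_{x_i \in A_i} \int_{x_j \in A_j} G(x_i, x_j)\,da_j\,da_i$, where $G$ is the kernel of the integral equation. The solution q of the linear system Pq = v gives the capacitance.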
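The dense formulation above can be tried on a toy problem. The sketch below is only an illustration under stated assumptions, not the discretization used by HiCap: it takes a single unit square plate held at 1 V, cuts it into n x n equal panels, uses the free-space kernel in units where 4πε0 = 1, approximates each off-diagonal coefficient by the point-charge value 1/r between panel centers, and uses the closed-form center potential of a uniformly charged square for the diagonal. The names `solve` and `plate_capacitance` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def plate_capacitance(n):
    """Capacitance of a unit square plate at 1 V, in units of 4*pi*eps0 (assumed normalization)."""
    side = 1.0 / n
    centers = [((i + 0.5) * side, (j + 0.5) * side)
               for i in range(n) for j in range(n)]
    # Potential at the center of a square panel carrying unit charge uniformly.
    self_term = 4.0 * math.log(1.0 + math.sqrt(2.0)) / side
    # Dense N x N potential coefficient matrix P (N = n * n panels).
    P = [[self_term if i == j else 1.0 / math.dist(ci, cj)
          for j, cj in enumerate(centers)]
         for i, ci in enumerate(centers)]
    v = [1.0] * len(centers)   # every panel sits at 1 V
    q = solve(P, v)            # panel charges from P q = v
    return sum(q)              # C = Q / V with V = 1


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n * n, "panels:", plate_capacitance(n))
```

Both the storage and the direct solve grow quickly with the number of panels here (N² entries, N³ operations), which is the cost the hierarchical representation is meant to avoid.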
Challenge • Partition the conductor surfaces into N panels, • Calculate and store the dense NxN matrix P, and • Solve the linear system Pq = v, all in O(N) time?
N-body Problem • N-body Problem: Given N particles in 3D space, compute all forces between the particles. • Hierarchical Algorithm (Appel 85) • O(N) time (Esselink) • Radiosity (Hanrahan, Salzman & Aupperle) • Multipole Algorithm (Greengard & Rokhlin 87) • O(N) time • FastCap
Appel’s Key Ideas • For practical purposes, forces acting on a particle need only be calculated to within the given precision. • The force due to a cluster of particles at some distance can be approximated with a single term.
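A small numerical check makes the second idea concrete. The sketch below is an illustration only, using a 1/r potential instead of a force and made-up positions: it replaces a cluster of four unit charges by a single term (the total charge placed at the charge-weighted centroid) and compares the result with the exact sum at a far and a near evaluation point. The function names are hypothetical.

```python
import math


def exact_potential(target, charges):
    """Sum the 1/r contribution of every charge individually."""
    return sum(q / math.dist(target, pos) for pos, q in charges)


def cluster_potential(target, charges):
    """Single-term approximation: total charge at the charge-weighted centroid."""
    total = sum(q for _, q in charges)
    centroid = tuple(sum(q * pos[k] for pos, q in charges) / total
                     for k in range(3))
    return total / math.dist(target, centroid)


def relative_error(target, charges):
    exact = exact_potential(target, charges)
    return abs(exact - cluster_potential(target, charges)) / exact


# Four unit charges on the corners of a small square around the origin.
CLUSTER = [((sx * 0.05, sy * 0.05, 0.0), 1.0)
           for sx in (-1, 1) for sy in (-1, 1)]

if __name__ == "__main__":
    print("far  (distance 10): ", relative_error((10.0, 0.0, 0.0), CLUSTER))
    print("near (distance 0.2):", relative_error((0.2, 0.0, 0.0), CLUSTER))
```

The error of the single term shrinks as the evaluation point moves away from the cluster, so a precision target decides how far away a cluster must be before it may be lumped.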
Outline of New Algorithm • Adaptively partition conductor surfaces into small panels according to a user-supplied error bound Pe. • Approximate the potential coefficient matrix P and store it in a hierarchical data structure of size O(N). • The data structure permits an O(N) time matrix-vector product Px for any N-vector x. • Solve the linear system Pq = v using iterative methods.
Adaptive Panel Partition • If the potential coefficient estimate between two panels is greater than Pe, partition the panels. Otherwise, record the coefficient. [Figure: adaptively partitioned panels, labeled A through N.]
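The refinement rule can be sketched as a short recursion. Everything specific below is an assumption made for illustration and is not taken from the slides: panels are axis-aligned rectangles on planes of constant z, the estimate is the potential at one panel's center due to unit charge density on the other (area divided by center distance, in units where 4πε0 = 1), the larger panel is halved across its longer side, and only two distinct conductors are handled (no self-interaction). `Panel`, `estimate` and `refine` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)
class Panel:
    x0: float
    y0: float
    x1: float
    y1: float
    z: float
    children: list = field(default_factory=list)

    def area(self):
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2, self.z)

    def split(self):
        """Halve the panel across its longer side; children are created once and reused."""
        if not self.children:
            if self.x1 - self.x0 >= self.y1 - self.y0:
                xm = (self.x0 + self.x1) / 2
                self.children = [Panel(self.x0, self.y0, xm, self.y1, self.z),
                                 Panel(xm, self.y0, self.x1, self.y1, self.z)]
            else:
                ym = (self.y0 + self.y1) / 2
                self.children = [Panel(self.x0, self.y0, self.x1, ym, self.z),
                                 Panel(self.x0, ym, self.x1, self.y1, self.z)]
        return self.children


def estimate(target, source):
    """Assumed estimator: potential at target's center from unit charge density on source."""
    return source.area() / math.dist(target.center(), source.center())


def refine(a, b, pe, links, min_area=1e-4):
    """Record a link when both estimates are within pe; otherwise split the larger panel."""
    e_ab, e_ba = estimate(a, b), estimate(b, a)
    if max(e_ab, e_ba) <= pe or max(a.area(), b.area()) <= min_area:
        links.append((a, b, e_ab, e_ba))
    elif b.area() >= a.area():
        for child in b.split():
            refine(a, child, pe, links, min_area)
    else:
        for child in a.split():
            refine(child, b, pe, links, min_area)


if __name__ == "__main__":
    lower, upper = Panel(0, 0, 1, 1, 0.0), Panel(0, 0, 1, 1, 1.0)
    links = []
    refine(lower, upper, pe=0.3, links=links)
    print(len(links), "links recorded")
```

A smaller Pe forces more subdivision, and panels that are close together end up finer than panels that are far apart, which is the behavior the adaptive partition is after.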
Coefficient Matrix Representation • Entries of P are stored in a hierarchical data structure as links. [Figure: panel hierarchy over panels A through N, with links between panels.]
Matrix with block entries. [Figure: the coefficient matrix drawn as blocks, with rows and columns labeled by panels.]
It can be shown that the matrix contains O(N) block entries, where N is the number of panels. If expanded explicitly, the matrix would contain NxN entries. If panel sizes were uniform, the matrix would be much larger than NxN.
Matrix-Vector Product Px • Compute charge for all panels in O(N) time. [Figure: panel hierarchy over panels A through N.]
Compute potential for all panels in O(N) time. [Figure: panel hierarchy over panels A through N.]
Distribute potential to leaf panels in O(N) time. [Figure: panel hierarchy over panels A through N.]
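The three passes can be sketched on a tiny hand-built hierarchy. This is a minimal illustration, assuming each link stores one coefficient in the "unit charge" sense (so a parent's charge is the sum of its children's charges, and potentials add on the way down); the `Node` class, the four-leaf tree and its coefficients are all made up for the example.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.links = []        # (source_node, coefficient) pairs
        self.charge = 0.0
        self.potential = 0.0


def gather_charge(node):
    """Pass 1 (bottom-up): the charge of a panel is the sum of its children's charges."""
    if node.children:
        node.charge = sum(gather_charge(c) for c in node.children)
    return node.charge


def apply_links(node):
    """Pass 2: each panel collects potential through its own links only."""
    node.potential = sum(coeff * src.charge for src, coeff in node.links)
    for c in node.children:
        apply_links(c)


def push_down(node, inherited=0.0):
    """Pass 3 (top-down): a child adds the potential accumulated by its ancestors."""
    node.potential += inherited
    for c in node.children:
        push_down(c, node.potential)


def matvec(root, leaves, x):
    """Return P x, where P is represented implicitly by the tree and its links."""
    for leaf, xi in zip(leaves, x):
        leaf.charge = xi
    gather_charge(root)
    apply_links(root)
    push_down(root)
    return [leaf.potential for leaf in leaves]


def build_example():
    """Two conductors A and B, two leaf panels each; one block link joins A and B."""
    a1, a2, b1, b2 = Node("a1"), Node("a2"), Node("b1"), Node("b2")
    A, B = Node("A", [a1, a2]), Node("B", [b1, b2])
    root = Node("root", [A, B])
    A.links, B.links = [(B, 0.5)], [(A, 0.5)]      # far interaction, one entry per block
    a1.links, a2.links = [(a1, 2.0), (a2, 1.0)], [(a1, 1.0), (a2, 2.0)]
    b1.links, b2.links = [(b1, 2.0), (b2, 1.0)], [(b1, 1.0), (b2, 2.0)]
    return root, [a1, a2, b1, b2]


if __name__ == "__main__":
    root, leaves = build_example()
    print(matvec(root, leaves, [1.0, 2.0, 3.0, 4.0]))
```

In this example the single A-B link stands in for a 2x2 block of equal entries (0.5) in the expanded matrix. Each pass touches every node and every link once, so the cost is proportional to the size of the data structure rather than to NxN.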
Solving Linear Systems • Use iterative methods such as GMRES or MINRES. • Each iteration requires a matrix-vector product Px and can be completed in O(N) time. • Number of iterations needed is very small, normally 10-20 regardless of N.
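To show why only a matrix-vector product is needed, here is a compact, unrestarted GMRES sketch in plain Python. It is a textbook formulation (Arnoldi with modified Gram-Schmidt plus Givens rotations) written for illustration, not the solver used in HiCap; it assumes a nonsingular matrix and does no restarting or preconditioning. The small dense matrix and the `dense_matvec` helper are stand-ins: any function computing Px, including a hierarchical one, could be passed instead.

```python
import math


def dense_matvec(A):
    """Stand-in operator: wraps a dense matrix as a function x -> A x."""
    return lambda x: [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Solve A x = b from x0 = 0 using only products with A. Returns (x, iterations)."""
    n = len(b)
    beta = math.sqrt(sum(bi * bi for bi in b))
    if beta == 0.0:
        return [0.0] * n, 0
    V = [[bi / beta for bi in b]]   # orthonormal Krylov basis
    H = []                          # H[j]: column j of the triangularized Hessenberg matrix
    cs, sn = [], []                 # Givens rotation coefficients
    g = [beta]                      # rotated right-hand side; |g[-1]| is the residual norm
    for _ in range(max_iter):
        w = matvec(V[-1])
        h = []
        for v in V:                 # modified Gram-Schmidt orthogonalization
            hij = sum(wi * vi for wi, vi in zip(w, v))
            h.append(hij)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
        h_next = math.sqrt(sum(wi * wi for wi in w))
        for i in range(len(cs)):    # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[-1], h_next)   # new rotation that zeroes the subdiagonal entry
        c, s = h[-1] / r, h_next / r
        cs.append(c)
        sn.append(s)
        h[-1] = r
        H.append(h)
        g.append(-s * g[-1])
        g[-2] = c * g[-2]
        if abs(g[-1]) <= tol * beta or h_next == 0.0:
            break
        V.append([wi / h_next for wi in w])
    k = len(H)
    y = [0.0] * k
    for i in reversed(range(k)):    # back substitution on the triangular system
        y[i] = (g[i] - sum(H[j][i] * y[j] for j in range(i + 1, k))) / H[i][i]
    x = [sum(y[j] * V[j][i] for j in range(k)) for i in range(n)]
    return x, k


if __name__ == "__main__":
    P = [[2.0, 1.0, 0.5, 0.5],
         [1.0, 2.0, 0.5, 0.5],
         [0.5, 0.5, 2.0, 1.0],
         [0.5, 0.5, 1.0, 2.0]]
    q, iterations = gmres(dense_matvec(P), [1.0, 1.0, 1.0, 1.0])
    print(q, iterations)
```

Each iteration costs one product Px plus some vector arithmetic, so the total solve time is governed by the cost of the product and the number of iterations.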
Error and Complexity • Error of approximation can be controlled by the user-supplied error bound Pe. • Time complexity is O(N) because each of the above steps is O(N).
Experimental Results • Test examples: Bus crossing 2x2, 3x3, …, 6x6. In commercial tools, thousands of these crossings will be computed to build the library. [Figure: 2x2 bus crossing.]
Previous 3D Algorithms • FastCap expansion order 2 (assume accurate). • FastCap expansion order 0. • Pre-corrected FFT. 40% faster than FastCap(2) and uses 1/4 of memory of FastCap(2). • IES3. 60% faster than FastCap(2) and uses 1/5 of memory of FastCap(2).
CPU time (in seconds): 40 - 100 times faster than FastCap(2), 14 - 40 times faster than FastCap(0).
Memory (in MB): 1/60 - 1/100 of memory of FastCap(2), 1/80 - 1/280 of memory of FastCap(0).
Error with respect to FastCap(2): Less than 2.7%, 3 times more accurate than FastCap(0).
Conclusion • A new algorithm significantly faster than previous best algorithms. It provides the possibility for 3D extraction of clock trees and critical nets. It can also be used to generate libraries for commercial 2D/2.5D tools. • Kernel independent. Can be applied to multi-layered dielectrics. • Adaptive refinement scheme produces good partition of conductor surfaces. • Hierarchical data structure is much more efficient than previous data structures.
Future Research • Capacitance Extraction • High order basis function • Bottom-up construction of hierarchy • Full chip and critical net extraction • Inductance Extraction • FastHenry is too slow • No commercial tool for mutual inductance. • Variational Parasitic Extraction • MEMS application