Parallel Performance of Hierarchical Multipole Algorithms for Inductance Extraction Ananth Grama, Purdue University Vivek Sarin, Texas A&M University Hemant Mahawar, Texas A&M University Acknowledgements: National Science Foundation.
Outline • Inductance Extraction • Underlying Linear System • The Solenoidal Basis Method • Hierarchical Algorithms • Parallel Formulations • Experimental Results HiPC 2004
Inductance Extraction • Inductance • Property of an electric circuit to oppose change in its current • Electromotive force (emf) is induced • Self inductance, mutual inductance – between conductors • Inductance extraction • Signal delays in circuits depend on parasitic R, L, C • At high frequencies – signal delays are dominated by parasitic inductance • Accurate estimation of inductive coupling among circuit components HiPC 2004
Inductance Extraction … • Inductance Extraction • For a set of s conductors – compute the s × s impedance matrix Z • Z – self and mutual impedance among conductors • Conductors are discretized into filaments using a uniform two-dimensional mesh for accurate impedance calculation HiPC 2004
Constraints • Current density at a point • Voltage drop across filaments – in terms of filament currents & voltages • Kirchhoff's current law at nodes • Potential difference in terms of node voltages • Inductance matrix – function of 1/r HiPC 2004
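The constraint relations on this slide were rendered as equation images in the original deck. The LaTeX sketch below states them in the usual partial-inductance notation; the symbols (filament currents I_f, node source currents I_s, filament cross-sections a_i) are assumptions for illustration, not taken from the slides.

```latex
% Filament voltage drop (R diagonal, L dense), Kirchhoff's current law at the nodes,
% and the 1/r kernel defining the mutual partial inductance of filaments i and j:
V_f = (R + j\omega L)\, I_f, \qquad B\, I_f = I_s, \qquad
L_{ij} = \frac{\mu_0}{4\pi\, a_i a_j}
         \int_{V_i}\!\int_{V_j} \frac{\hat{\mathbf{l}}_i \cdot \hat{\mathbf{l}}_j}
                                     {\|\mathbf{r}_i - \mathbf{r}_j\|}\; dv_j\, dv_i .
```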
Linear System • System Matrix • Characteristics: • R – diagonal; B – sparse; L – dense • Solution Method • Iterative methods – GMRES • Dense matrix-vector product with L • hierarchical methods, matrix-free approach • Challenge • Effective preconditioning in the absence of an explicit system matrix HiPC 2004
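The system matrix itself appears only as an image on this slide; the block structure below is a sketch of the standard formulation implied by the bullets (R diagonal, B sparse, L dense), with an illustrative right-hand side.

```latex
\begin{bmatrix} R + j\omega L & B^{T} \\ B & 0 \end{bmatrix}
\begin{bmatrix} I_f \\ V_n \end{bmatrix}
=
\begin{bmatrix} b \\ 0 \end{bmatrix},
\qquad R \ \text{diagonal}, \quad B \ \text{sparse}, \quad L \ \text{dense (}1/r\text{ kernel)}.
```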
Solenoidal Basis Method • Linear system with modified RHS • Solenoidal basis • Automatically satisfies conservation laws – Kirchhoff's current law • Mesh currents – basis for filament currents • Solenoidal basis matrix P: • Current obeys Kirchhoff's law: • Reduced system HiPC 2004
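The trailing colons above ("Solenoidal basis matrix P:", "Current obeys Kirchhoff's law:") pointed to equation images in the original slides. A sketch of what they state, under the usual solenoidal-basis construction (notation illustrative):

```latex
% Filament currents expressed through mesh currents y; the constraint holds by construction,
% and the system reduces to the mesh-current unknowns solved by the iterative method.
I_f = P\, y, \qquad B P = 0 \;\Rightarrow\; B I_f = 0, \qquad
P^{T}\,(R + j\omega L)\,P\; y \;=\; P^{T} b .
```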
Problem Size Number of unknowns for the ground plane problem HiPC 2004
Hierarchical Methods • Matrix-vector product with an n × n matrix – O(n²) • Faster matrix-vector product • Matrix-free approach • Appel's algorithm, Barnes-Hut method • Particle-cluster interactions – O(n lg n) • Fast Multipole Method • Cluster-cluster interactions – O(n) • Hierarchical refinement of the underlying domain • 2-D – quad-tree, 3-D – oct-tree • Rely on decaying 1/r kernel functions • Compute an approximate matrix-vector product at the cost of some accuracy HiPC 2004
Hierarchical Methods … • Fast Multipole Method (FMM) • Divides the domain recursively into 8 sub-domains • Up-traversal • computes multipole coefficients that give the effect of all the points inside a node at a far-away point • Down-traversal • computes local coefficients that give the effect of all far-away points inside a node • Direct interactions – for nearby points • Computational complexity – O((d+1)⁴ N) • d – multipole degree HiPC 2004
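A rough accounting of the complexity figures quoted here and in the concluding slide, assuming expansions with O((d+1)²) coefficients; the exact constants depend on the expansion and translation operators used.

```latex
% FMM: each multipole-to-local translation acts on (d+1)^2 coefficients and costs O((d+1)^4);
% with O(N) boxes and bounded interaction lists this gives O((d+1)^4 N) overall.
% HMM: each particle evaluates accepted multipole expansions directly along an O(\lg N)-deep
% tree at O((d+1)^2) per expansion, giving O((d+1)^2 N \lg N).
T_{\mathrm{FMM}} = O\!\left((d+1)^{4} N\right), \qquad
T_{\mathrm{HMM}} = O\!\left((d+1)^{2} N \lg N\right).
```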
Hierarchical Methods … • Hierarchical Multipole Method (HMM) • Augmented Barnes-Hut method, or a variant of FMM • Up-traversal • Same as FMM • For each particle • Multipole acceptance criterion (MAC) – ratio of the distance of the particle from the center of the box to the dimension of the box • use the MAC to decide whether multipole coefficients should be used to obtain the effect of all far-away points • Direct interactions – for nearby points • Computational complexity – O((d+1)² N lg N) HiPC 2004
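A minimal Python sketch of the MAC-driven traversal described above, keeping only the aggregate (monopole) term of each box for the 1/r kernel. The real HMM uses degree-d multipole expansions on per-conductor oct-trees, so the class names, leaf size s, and threshold alpha here are illustrative only.

```python
import numpy as np

class Box:
    """Oct-tree node; stores an aggregate (monopole) summary of the charges in the box."""
    def __init__(self, pts, q, center, size, s=8):
        self.center, self.size = center, size
        self.pts, self.q = pts, q
        self.total = q.sum()
        self.com = (pts * q[:, None]).sum(axis=0) / self.total if self.total != 0 else center
        self.children = []
        if len(pts) > s:                                           # s: particles per leaf (illustrative)
            octant = (pts > center).astype(int) @ np.array([1, 2, 4])   # sub-box index per point
            for k in range(8):
                m = octant == k
                if m.any():
                    off = 0.25 * size * (2 * np.array([k & 1, (k >> 1) & 1, (k >> 2) & 1]) - 1)
                    self.children.append(Box(pts[m], q[m], center + off, 0.5 * size, s))

def potential(box, x, alpha=2.0):
    """1/r potential at x; MAC: distance-to-box-center / box-size must exceed alpha."""
    d = np.linalg.norm(x - box.center)
    if box.children and d / box.size <= alpha:          # too close: descend into children
        return sum(potential(c, x, alpha) for c in box.children)
    if not box.children:                                # leaf: direct interactions
        r = np.linalg.norm(box.pts - x, axis=1)
        r[r == 0] = np.inf                              # skip self-interaction
        return float((box.q / r).sum())
    return box.total / np.linalg.norm(x - box.com)      # far enough: aggregate approximation

pts = np.random.default_rng(1).random((500, 3))         # points in the unit cube
q = np.ones(500)
root = Box(pts, q, center=np.full(3, 0.5), size=1.0)
print(potential(root, np.array([2.0, 2.0, 2.0])))       # evaluate well outside the cube
```

Increasing alpha forces more direct interactions, which matches the observation on the concluding slide that a larger MAC raises both the cost and the accuracy of the matrix-vector product.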
ParIS: Parallel Solver • Application – inductance extraction • Solve the reduced system with a preconditioned iterative method • Iterative method – GMRES • Dense matrix-vector products with the preconditioner and the coefficient matrix • Dense matrix-vector product dominates the computational cost of the algorithm • Hierarchical methods used to compute the potential – inductive effect on filaments • Vector inner products • Negligible computation and communication cost HiPC 2004
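A small Python sketch of the matrix-free GMRES solve outlined above, using SciPy's LinearOperator. Here apply_L is a placeholder for the hierarchical (FMM/HMM) matvec with the dense inductance matrix; the problem size, frequency, and the omitted basis matrix P are illustrative, not values from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 200                                       # number of mesh currents (illustrative)
rng = np.random.default_rng(0)
pts = rng.random((n, 3))                      # filament locations (illustrative)
R = np.diag(rng.random(n) + 1.0)              # diagonal resistance matrix
omega = 2 * np.pi * 1e9                       # angular frequency (assumed)

def apply_L(x):
    """Stand-in for the hierarchical (FMM/HMM) matvec with the dense 1/r inductance matrix."""
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    K = 1.0 / np.where(r == 0, np.inf, r) + np.eye(n)   # crude self-term on the diagonal
    return K @ x

def matvec(y):
    # Reduced-system operator P^T (R + j*omega*L) P, with P taken as the identity for brevity.
    return R @ y + 1j * omega * apply_L(y)

A = LinearOperator((n, n), matvec=matvec, dtype=complex)
b = rng.random(n).astype(complex)
y, info = gmres(A, b)
print("converged" if info == 0 else f"gmres info = {info}")
```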
Parallelization Scheme • Two-tier parallelization • Each conductor – filaments and associated oct-tree • Conductors – distributed across MPI processes • Within a conductor – OpenMP threads • Pruning of the tree to obtain sub-trees • Computation at the top few levels of the tree is sequential HiPC 2004
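An illustrative Python analogue of the two-tier scheme (MPI across conductors, threads within a conductor standing in for OpenMP). The conductor and filament counts and the round-robin assignment are assumptions for the sketch, not the paper's actual distribution.

```python
from mpi4py import MPI                                  # outer tier: conductors across MPI ranks
from concurrent.futures import ThreadPoolExecutor      # inner tier: threads standing in for OpenMP

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

conductors = list(range(8))                             # illustrative conductor ids
my_conductors = conductors[rank::nprocs]                # round-robin assignment to ranks

def filament_potential(filament_id):
    # Placeholder for a traversal of this conductor's oct-tree at one filament.
    return 0.0

local = 0.0
for c in my_conductors:
    filaments = range(1000)                             # filaments of conductor c (illustrative count)
    with ThreadPoolExecutor() as pool:                  # shared-memory parallelism within a conductor
        local += sum(pool.map(filament_potential, filaments))

# Contributions between conductors owned by different ranks are combined explicitly.
total = comm.allreduce(local, op=MPI.SUM)
if rank == 0:
    print("combined contribution:", total)
```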
Experiments • Experiments on the interconnect cross-over problem • 2 cm long, 2 mm wide • Distance between conductors • within a layer – 0.3 mm; across layers – 3 mm • Non-uniform distribution of conductors • Comparison between FMM and HMM • Parallel platform: Beowulf cluster – Texas A&M University • 64-bit AMD Opteron • LAM/MPI on SuSE Linux – GNU compilers • 1.4 GHz, 128 dual-processor nodes, Gigabit Ethernet HiPC 2004
Cross Over Interconnects HiPC 2004
Parameters • d – multipole degree • α – multipole acceptance criterion • s – number of particles per leaf node in the tree • Since d and α influence the accuracy of the matrix-vector product, impedance errors are kept similar – within 1% of a reference value computed by FMM with d = 8 • Scaled efficiency E = BOPS/p • BOPS = average number of base operations per second • p = number of processors used HiPC 2004
Experimental Results Effect of multipole degree (d) for different choices of s – plots for the FMM code and the HMM code HiPC 2004
Experimental Results Effect of multipole degree (d) for different choices of s – times in seconds HiPC 2004
Experimental Results Effect of the MAC on HMM for different choices of s and d – plots varying s and varying d HiPC 2004
Experimental Results … Effect of the MAC on HMM for different choices of s and d – times in seconds HiPC 2004
Experimental Results Effect of multipole degree (d) on the HMM code on p processors for two different choices of s – plots for s = 8 and s = 32 HiPC 2004
Experimental Results Effect of multipole degree (d) on the HMM code on p processors for two different choices of s – times in seconds HiPC 2004
Experimental Results Effect of multipole degree (d) on the FMM code on p processors for two different choices of s – plots for s = 8 and s = 32 HiPC 2004
Experimental Results Effect of multipole degree (d) on the FMM code on p processors for two different choices of s – times in seconds HiPC 2004
Experimental Results … Parallel efficiency of the extraction codes for different choices of d – plots for the FMM code and the HMM code HiPC 2004
Experimental Results Parallel efficiency of the extraction codes for different choices of d HiPC 2004
Experimental Results … Ratio of execution time of the FMM code to the HMM code on p processors for different choices of d – plots for s = 8 and s = 32 HiPC 2004
Experimental Results Ratio of execution time of the FMM code to the HMM code on p processors for different choices of d HiPC 2004
Concluding Remarks • FMM execution time – O((d+1)⁴ N) • HMM execution time – O((d+1)² N lg N) • For HMM, increasing the MAC (α) increases both the time and the accuracy of the matrix-vector product • FMM achieves higher parallel efficiency for large d • When the number of particles per leaf node (s) is small, HMM outperforms FMM in execution time • The parallel implementation, ParIS, is scalable and achieves high parallel efficiency HiPC 2004
Thank You !! HiPC 2004