
On the Performance of PC Clusters in Solving Partial Differential Equations

Xing Cai, Åsmund Ødegård. Department of Informatics, University of Oslo, Norway.


Presentation Transcript


  1. On the Performance of PC Clusters in Solving Partial Differential Equations • Xing Cai, Åsmund Ødegård • Department of Informatics, University of Oslo, Norway

  2. Outline of the talk • Introduction • Beowulf clusters – a cost-effective approach to solving PDEs • Performance analysis of a Linux cluster • Numerical experiments & measurements

  3. A generic finite element PDE solver • Time stepping t0, t1, t2… • Spatial discretization on computational grid • Solution of nonlinear problems • Solution of linearized problems • Iterative solution of Ax=b

  4. An observation • The computation-intensive part is the iterative solution of Ax=b • A parallel finite element PDE solver needs to run the linear algebra kernels in parallel: vector addition, inner product of two vectors, matrix-vector product • Two types of inter-processor communication • The computation-to-communication ratio is high • Relatively tolerant of slow communication
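The kernels named on slide 4 can be illustrated with a small sketch. It assumes, for illustration only, that each "process" is represented by one chunk of a Python list; `split`, `vector_add` and `inner_product` are hypothetical helpers, not Diffpack or MPI routines. The point is that vector addition needs no communication at all, while the inner product needs a single global reduction (in an MPI program this would typically be an all-reduce).

```python
# Sketch: kernels on vectors split across "processes".
# Each list entry stands in for the data owned by one process.

def split(vec, nprocs):
    """Split a list into nprocs contiguous chunks (hypothetical helper)."""
    n = len(vec)
    bounds = [n * p // nprocs for p in range(nprocs + 1)]
    return [vec[bounds[p]:bounds[p + 1]] for p in range(nprocs)]

def vector_add(x_parts, y_parts):
    """Purely local operation: no communication is needed."""
    return [[a + b for a, b in zip(xp, yp)]
            for xp, yp in zip(x_parts, y_parts)]

def inner_product(x_parts, y_parts):
    """Local partial sums followed by one global reduction."""
    partial = [sum(a * b for a, b in zip(xp, yp))
               for xp, yp in zip(x_parts, y_parts)]
    # The sum over 'partial' stands in for the global reduction step.
    return sum(partial)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
xp, yp = split(x, 3), split(y, 3)
total = inner_product(xp, yp)
summed = vector_add(xp, yp)
```

Only one number per process crosses the network in the inner product, which is consistent with the slide's remark that the computation-to-communication ratio is high.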

  5. A natural parallelization of PDE solvers • The global solution domain is partitioned into many smaller sub-domains • One sub-domain works as a "unit", with its sub-matrices and sub-vectors • No need to create global matrices and vectors physically • The global linear algebra operations can be realized by local operations + inter-processor communication
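As a minimal sketch of "local operations + inter-processor communication", consider a matrix-vector product with the 1D model matrix tridiag(-1, 2, -1). The matrix choice, the three-way split and all function names are assumptions made for this example; they are not taken from the talk. Each sub-domain only needs one value from each neighbour before it can multiply with its own sub-matrix, so the global matrix is never formed.

```python
# Sketch: distributed matrix-vector product without a global matrix.

def local_matvec(x_local, left_ghost, right_ghost):
    """Apply tridiag(-1, 2, -1) to the locally owned entries.
    Ghost values come from neighbouring sub-domains (0.0 at the ends)."""
    padded = [left_ghost] + x_local + [right_ghost]
    return [-padded[i - 1] + 2.0 * padded[i] - padded[i + 1]
            for i in range(1, len(padded) - 1)]

def distributed_matvec(parts):
    """Each sub-domain fetches one boundary value from each neighbour
    (the communication step), then works on its own data only."""
    result = []
    for p, x_local in enumerate(parts):
        left = parts[p - 1][-1] if p > 0 else 0.0
        right = parts[p + 1][0] if p < len(parts) - 1 else 0.0
        result.append(local_matvec(x_local, left, right))
    return result

def global_matvec(x):
    """Reference: the same product on the undivided vector."""
    return local_matvec(x, 0.0, 0.0)

x = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
parts = [x[0:2], x[2:4], x[4:6]]
y_parts = distributed_matvec(parts)
y_flat = [v for part in y_parts for v in part]
```

The flattened result agrees with the product computed on the undivided vector, which is the property the slide relies on.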

  6. Linear-algebra level parallelization • An SPMD model • Reuse of existing code for local linear algebra operations • New code is needed for the parallelization-specific tasks: grid partition (non-overlapping, overlapping) and inter-processor communication routines
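The difference between a non-overlapping and an overlapping grid partition can be sketched in one dimension. The helper `partition_1d` and its `overlap` parameter are hypothetical names for this illustration; the 49 grid points match one direction of the 49x49x41 grid used later in the talk, and the choice of 4 sub-domains and an overlap of 2 points is an assumption.

```python
# Sketch: 1D index partition, with and without overlap.

def partition_1d(n_points, n_subdomains, overlap=0):
    """Return (start, stop) index ranges, stop exclusive.
    'overlap' is the number of extra grid points added on each
    interior side of a sub-domain."""
    ranges = []
    for p in range(n_subdomains):
        start = n_points * p // n_subdomains
        stop = n_points * (p + 1) // n_subdomains
        # Extend into the neighbours, but never past the global grid.
        start = max(0, start - overlap)
        stop = min(n_points, stop + overlap)
        ranges.append((start, stop))
    return ranges

non_overlapping = partition_1d(49, 4)
overlapping = partition_1d(49, 4, overlap=2)
```

A 3D partition can be built from the same idea by applying the 1D split in each coordinate direction.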

  7. Object orientation • An add-on "toolbox" containing all the parallelization-specific code • The toolbox has many high-level routines and hides the low-level MPI details • The existing sequential libraries are slightly modified to include a "dummy" interface, thus incorporating "fake" inter-processor communication • A seamless coupling between the huge sequential libraries and the add-on toolbox
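The "dummy" interface idea can be sketched as follows. This is an illustration of the design pattern only: the class names `DummyComm` and `SimulatedParallelComm` and the method `global_sum` are invented for this example and are not the Diffpack API. The library routine is written once against the interface; the sequential build plugs in a communicator that does nothing, and the parallel toolbox supplies one that really communicates (here simulated by passing in the other processes' contributions).

```python
# Sketch: one library routine, two interchangeable communicators.

class DummyComm:
    """Sequential stand-in: with one process, the global sum
    is simply the local value."""
    def global_sum(self, local_value):
        return local_value

class SimulatedParallelComm:
    """Hypothetical parallel version. A real one would call MPI;
    here the other processes' values are given for illustration."""
    def __init__(self, other_values):
        self.other_values = other_values

    def global_sum(self, local_value):
        return local_value + sum(self.other_values)

def norm_squared(x_local, comm):
    """Library routine written once; works with either communicator."""
    return comm.global_sum(sum(v * v for v in x_local))

# Sequential run: the whole vector [3, 4] lives on one process.
seq = norm_squared([3.0, 4.0], DummyComm())
# "Parallel" run: this process owns [3]; another contributes 4*4 = 16.
par = norm_squared([3.0], SimulatedParallelComm([16.0]))
```

Both calls give the same result, which is what makes the coupling between sequential libraries and the toolbox seamless from the caller's point of view.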

  8. Diffpack • O-O software environment for scientific computation (C++) • Rich collection of PDE solution components: portable, flexible, extensible • http://www.nobjects.com • H. P. Langtangen, Computational Partial Differential Equations, Springer, 1999

  9. Straightforward parallelization • Develop a sequential simulator, without paying attention to parallelism • Follow the Diffpack coding standards • Use the add-on toolbox for parallel computing • Add a few new statements for transformation to a parallel simulator

  10. A Linux cluster • 48 Pentium-III 500 MHz processors (24 nodes) • 512 MB memory per node • One 3com905B network card per node • Fast Ethernet, 100 Mbit/s • 26-port Cisco Catalyst 2926 switch • Price: around $60,000

  11. Parallel simulation of 3D acoustic field • 3D nonlinear model

  12. 3D nonlinear acoustic field simulation • Comparison between Origin 2000 and Linux cluster • 1,030,301 grid points

  13. Incompressible Navier-Stokes • Numerical strategy: operator splitting • Calculation of an intermediate velocity in a predictor-corrector way • Solution of a Poisson equation • Correction of the intermediate velocity

  14. Incompressible Navier-Stokes • Explicit schemes for predicting and correcting the velocity • Implicit solution of the pressure by CG
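For readers unfamiliar with CG (the conjugate gradient method), here is a minimal unpreconditioned sketch. It assumes a 1D Poisson model matrix tridiag(-1, 2, -1) with 8 unknowns and a right-hand side of ones, chosen only to keep the example small; the actual pressure equation, discretization and solver settings used in the talk are not given in the slides. Note that the iteration uses only the three kernels listed earlier: vector updates, inner products and a matrix-vector product.

```python
# Sketch: conjugate gradient for a symmetric positive definite system.

def apply_poisson_1d(x):
    """Matrix-free product with tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=200):
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero start vector
    p = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

b = [1.0] * 8
x = conjugate_gradient(apply_poisson_1d, b)
residual = [bi - yi for bi, yi in zip(b, apply_poisson_1d(x))]
```

In a parallel setting, each `dot` call would involve a global reduction and each `apply_a` call a neighbour exchange, while the vector updates stay local.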

  15. 3D nonlinear water waves • Fully nonlinear 3D water waves • Primary unknowns:

  16. 3D nonlinear water waves • Global 3D grid: 49x49x41 • Global solver: CG + overlapping Schwarz preconditioner • Multigrid V-cycle as subdomain solver • CPU measurement of a total of 32 time steps • Parallel simulation on the Linux cluster

  17. Summary • OOP + MPI gives portable parallel software • Beowulf clusters are well suited to solving PDEs • Applicable to a wide range of PDEs • Performance: satisfactory speed-up • Some issues remain to be considered for further improvement
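The slides do not state which definition of speed-up was used. The conventional definitions, given here as an assumption, use the wall-clock time $T(P)$ of the same simulation on $P$ processors:

```latex
S(P) = \frac{T(1)}{T(P)}, \qquad
E(P) = \frac{S(P)}{P} = \frac{T(1)}{P \, T(P)}
```

Here $S(P)$ is the speed-up and $E(P)$ the parallel efficiency; ideal scaling corresponds to $S(P) = P$ and $E(P) = 1$.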
