On the Performance of PC Clusters in Solving Partial Differential Equations Xing Cai Åsmund Ødegård Department of Informatics University of Oslo Norway
Outline of the talk • Introduction • Beowulf clusters – cost effective approach to solving PDEs • Performance analysis of a Linux cluster • Numerical experiments & measurements
A generic finite element PDE solver • Time stepping t0, t1, t2… • Spatial discretization on computational grid • Solution of nonlinear problems • Solution of linearized problems • Iterative solution of Ax=b
An observation • The computation-intensive part is the iterative solution of Ax=b • A parallel finite element PDE solver needs to run the linear algebra kernels in parallel • vector addition • inner product of two vectors • matrix-vector product • Two types of inter-processor communication • The computation/communication ratio is high • Relatively tolerant of slow communication
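To make the role of the three kernels concrete, here is a minimal sequential sketch of the conjugate gradient method written only in terms of vector addition, inner product and matrix-vector product. The test matrix tridiag(-1, 2, -1) and the function names are illustrative assumptions, not taken from the talk or from Diffpack.

```python
# Sketch: conjugate gradients expressed through the three kernels named on the slide.
# The 1D Poisson-type matrix tridiag(-1, 2, -1) is an assumed test problem.

def axpy(alpha, x, y):
    """Vector addition kernel: return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def inner(x, y):
    """Inner product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(x):
    """Matrix-vector product with tridiag(-1, 2, -1), stored implicitly."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y

def conjugate_gradient(b, tol=1e-10, max_iter=1000):
    """Solve A x = b for the implicit matrix in matvec, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A*x for x = 0
    p = list(r)                 # first search direction
    rr = inner(r, r)
    for iteration in range(max_iter):
        if rr ** 0.5 < tol:
            return x, iteration
        Ap = matvec(p)
        alpha = rr / inner(p, Ap)
        x = axpy(alpha, p, x)       # x <- x + alpha*p
        r = axpy(-alpha, Ap, r)     # r <- r - alpha*A*p
        rr_new = inner(r, r)
        p = axpy(rr_new / rr, p, r) # p <- r + beta*p
        rr = rr_new
    return x, max_iter

b = [1.0] * 20
x, iterations = conjugate_gradient(b)
residual = axpy(-1.0, matvec(x), b)
residual_norm = inner(residual, residual) ** 0.5
```

Each iteration performs one matrix-vector product, two inner products and three vector updates, which is why parallelizing these kernels is enough to parallelize the solver.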
A natural parallelization of PDE solvers • The global solution domain is partitioned into many smaller sub-domains • One sub-domain works as a ”unit”, with its sub-matrices and sub-vectors • No need to create global matrices and vectors physically • The global linear algebra operations can be realized by local operations + inter-processor communication
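The idea that global operations reduce to local operations plus communication can be sketched in a single process, without real MPI. In the sketch below the sub-domains are contiguous slices of a 1D vector, the "all-reduce" is an ordinary sum over local results, and the "neighbour exchange" is a lookup of one boundary value. All names and the tridiag(-1, 2, -1) operator are assumptions made for illustration.

```python
# Sketch: sub-domains simulated inside one process; no MPI calls are made.

def local_matvec(x_local, ghost_left, ghost_right):
    """Apply tridiag(-1, 2, -1) to one sub-vector, given neighbour boundary values."""
    n = len(x_local)
    y = []
    for i in range(n):
        left = x_local[i - 1] if i > 0 else ghost_left
        right = x_local[i + 1] if i < n - 1 else ghost_right
        y.append(2.0 * x_local[i] - left - right)
    return y

def partition(x, num_parts):
    """Split a global vector into contiguous, non-overlapping sub-vectors."""
    n = len(x)
    bounds = [n * k // num_parts for k in range(num_parts + 1)]
    return [x[bounds[k]:bounds[k + 1]] for k in range(num_parts)]

def distributed_inner(parts_x, parts_y):
    """Local inner products followed by a global sum (stand-in for an all-reduce)."""
    local_sums = [sum(a * b for a, b in zip(px, py))
                  for px, py in zip(parts_x, parts_y)]
    return sum(local_sums)

def distributed_matvec(parts):
    """Fetch one boundary value from each neighbour, then multiply locally."""
    result = []
    for k, x_local in enumerate(parts):
        ghost_left = parts[k - 1][-1] if k > 0 else 0.0
        ghost_right = parts[k + 1][0] if k < len(parts) - 1 else 0.0
        result.append(local_matvec(x_local, ghost_left, ghost_right))
    return result

x_global = [float(i * i % 7) for i in range(12)]
parts = partition(x_global, 3)

# The global vector is only assembled here to check the result.
y_joined = [v for part in distributed_matvec(parts) for v in part]
y_reference = local_matvec(x_global, 0.0, 0.0)
inner_distributed = distributed_inner(parts, parts)
inner_reference = sum(v * v for v in x_global)
```

The two communication patterns differ: the inner product needs a global reduction over all sub-domains, while the matrix-vector product only needs values from neighbouring sub-domains.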
Linear-algebra level parallelization • An SPMD model • Reuse of existing code for local linear algebra operations • Need new code for the parallelization specific tasks • grid partition (non-overlapping, overlapping) • inter-processor communication routines
Object orientation • An add-on ”toolbox” containing all the parallelization specific codes • The ”toolbox” has many high-level routines, hides the low-level MPI details • The existing sequential libraries are slightly modified to include a ”dummy” interface, thus incorporating ”fake” inter-processor communications • A seamless coupling between the huge sequential libraries and the add-on toolbox
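One way to picture the "dummy" interface is a communicator object whose sequential version does nothing, so the same numerical code runs unchanged with or without parallelism. The sketch below is in Python rather than C++ and every class and method name in it is hypothetical; it does not reproduce the actual Diffpack toolbox interface.

```python
# Sketch of a "dummy" communication interface; all names are hypothetical.

class DummyComm:
    """Sequential stand-in: every 'global' operation is purely local."""

    def global_sum(self, value):
        return value


class SimulatedComm:
    """Stand-in for a parallel communicator.

    A real implementation would call an MPI reduction here; this one simply
    adds contributions that are assumed to come from other processes.
    """

    def __init__(self, other_contributions):
        self.other_contributions = other_contributions

    def global_sum(self, value):
        return value + sum(self.other_contributions)


def inner_product(x_local, y_local, comm):
    """Numerical code written once against the communicator interface."""
    local = sum(a * b for a, b in zip(x_local, y_local))
    return comm.global_sum(local)


sequential_result = inner_product([1.0, 2.0], [3.0, 4.0], DummyComm())
parallel_result = inner_product([1.0, 2.0], [3.0, 4.0], SimulatedComm([5.0]))
```

The design choice is that `inner_product` never needs to know whether it runs sequentially or in parallel; only the communicator object passed in changes.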
Diffpack • O-O software environment for scientific computation (C++) • Rich collection of PDE solution components - portable, flexible, extensible • http://www.nobjects.com • H.P.Langtangen, Computational Partial Differential Equations, Springer 1999
Straightforward parallelization • Develop a sequential simulator, without paying attention to parallelism • Follow the Diffpack coding standards • Use the add-on toolbox for parallel computing • Add a few new statements for transformation to a parallel simulator
A Linux cluster • 48 Pentium III 500 MHz processors (24 nodes) • 512 MB memory per node • One 3Com 905B network card per node • Fast Ethernet, 100 Mbit/s • 26-port Cisco Catalyst 2926 switch • Price: around $60,000
Parallel simulation of 3D acoustic field • 3D nonlinear model
3D nonlinear acoustic field simulation • Comparison between Origin 2000 and Linux cluster • 1,030,301 grid points
Incompressible Navier-Stokes • Numerical strategy: operator splitting • Calculation of an intermediate velocity in a predictor-corrector way • Solution of a Poisson equation • Correction of the intermediate velocity
Incompressible Navier-Stokes • Explicit schemes for predicting and correcting the velocity • Implicit solution of the pressure by CG
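The slides do not give the exact formulas, so the following is only a generic projection-type splitting of the kind these steps describe, with an assumed forward Euler predictor in place of the predictor-corrector velocity step, and with assumed symbols: velocity $\mathbf{u}$, pressure $p$, density $\rho$, kinematic viscosity $\nu$, time step $\Delta t$.

$$
\begin{aligned}
\mathbf{u}^{*} &= \mathbf{u}^{n} + \Delta t\left(-(\mathbf{u}^{n}\cdot\nabla)\mathbf{u}^{n} + \nu\nabla^{2}\mathbf{u}^{n}\right) && \text{(explicit intermediate velocity)}\\
\nabla^{2} p^{n+1} &= \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^{*} && \text{(Poisson equation for the pressure)}\\
\mathbf{u}^{n+1} &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla p^{n+1} && \text{(velocity correction)}
\end{aligned}
$$

Taking the divergence of the third line and inserting the second gives $\nabla\cdot\mathbf{u}^{n+1}=0$, so the corrected velocity is divergence free. Only the Poisson step leads to a linear system, which is where the CG solver enters.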
3D nonlinear water waves • Fully nonlinear 3D water waves • Primary unknowns:
3D nonlinear water waves • Global 3D grid: 49x49x41 • Global solver: CG + overlapping Schwarz prec. • Multigrid V-cycle as subdomain solver • CPU measurement of a total of 32 time steps • Parallel simulation on the Linux cluster
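CPU measurements of this kind are usually summarized as speed-up and parallel efficiency. The sketch below shows the standard definitions, S(P) = T(1)/T(P) and E(P) = S(P)/P; the timings in it are hypothetical placeholders chosen only to exercise the formulas and are not measurements from the talk.

```python
# Sketch: speed-up and efficiency from wall-clock or CPU timings.

def speedup(t_serial, t_parallel):
    """Speed-up S(P) = T(1) / T(P)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, num_procs):
    """Parallel efficiency E(P) = S(P) / P."""
    return speedup(t_serial, t_parallel) / num_procs

# Hypothetical timings in seconds, keyed by number of processors.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

table = [
    (p, speedup(timings[1], t), efficiency(timings[1], t, p))
    for p, t in sorted(timings.items())
]

for procs, s, e in table:
    print(f"P={procs:2d}  speed-up={s:5.2f}  efficiency={e:5.2f}")
```

An efficiency close to 1 means the communication overhead is small relative to the computation, which is the regime the earlier observation about a high computation/communication ratio refers to.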
Summary • OOP + MPI give portable parallel software • Beowulf clusters are well suited to solving PDEs • Applicable to a wide range of PDEs • Performance: satisfactory speed-up • Some issues need to be considered for further improvement