SOME EXPERIMENTS on GRID COMPUTING in COMPUTATIONAL FLUID DYNAMICS
Thierry Coupez (**), Alain Dervieux (*), Hugues Digonnet (**), Hervé Guillard (*), Jacques Massoni (***), Vanessa Mariotti (*), Youssef Mesri (*), Patrick Nivet (*), Steve Wornom (*)
Large scale computations and CFD
Turbulent flows: required number of mesh points N = Re^(9/4)
  Laboratory experiments: Re = 82 000
  Industrial devices: Re = 1 000 000
  Geophysical flows: Re = 10 000 000
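As a rough illustration of what this scaling implies (the evaluation below is ours; only the formula and the Reynolds numbers come from the slide):

```latex
% Illustrative evaluation of N = Re^{9/4} for an industrial-scale flow
\[
  N = Re^{9/4}, \qquad
  Re = 10^{6} \;\Longrightarrow\;
  N = \bigl(10^{6}\bigr)^{9/4} = 10^{13.5} \approx 3\times 10^{13}
  \text{ mesh points.}
\]
```

Such counts are far beyond the 1 M to 100 M meshes of the next slide, which is why large scale CFD keeps pushing on available computing architectures.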
Future of large scale computations in CFD
  2000: 1 M mesh, 1 Tflops
  2005: 10 M mesh, 10 Tflops
  2010: 100 M mesh, 100 Tflops
What kind of architecture for these computations?
  Super-clusters, e.g. the Tera10 machine of CEA/DAM, 4532 Intel Itanium processors?
  Grid architectures?
End-user requirements
  Transparent solution: the grid must be viewed as a single unified resource by the end-users
  No major code modifications: codes using Fortran/MPI and C/C++/MPI must run on the grid as they are
  Security
The MecaGrid project
  Started 11/2002
  Connects 3 sites in the PACA region
  Performs experiments in grid computing applied to multimaterial fluid dynamics
Set-up of the Grid
  The Marseille and CEMEF clusters use private IP addresses; only the front-end nodes are routable through the Internet
  Solution: create a VPN; the front-ends are connected by a tunnel through which packets are encrypted and transmitted
  Installation of the Globus middleware
  Message passing: MPICH-G2
The MecaGrid: a heterogeneous architecture of 162 processors
  CEMEF Sophia: N = 32 bi-processor nodes, Sp = 2.4 GHz, internal network Vpq = 100 Mb/s
  INRIA Sophia, pf cluster: N = 16 bi-processor nodes, Sp = 933 MHz, Vpq = 100 Mb/s
  INRIA Sophia, nina cluster: N = 19 bi-processor nodes, Sp = 2.4 GHz, Vpq = 1 Gb/s
  IUSTI Marseille: N = 32 mono-processor nodes, Sp = 2.4 GHz, Vpq = 100 Mb/s
  Inter-site links: nominal bandwidths of 10 Mb/s and 100 Mb/s
The MecaGrid: measured performances
  Measured bandwidths between clusters: 100 Mb/s, 7.2 Mb/s, 5 Mb/s and 3.7 Mb/s, far below the nominal internal bandwidths
  Limited stability of the external network
CFD and parallelism: the SPMD model
  The initial mesh is partitioned into sub-domains (1, 2, 3, ...)
  Each sub-domain runs its own instance of the solver on its own data
  The solver instances exchange interface data by message passing and assemble the global solution
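A minimal sketch of this SPMD pattern (not the AERO-3D or CIMlib code; the 1-D decomposition, sub-domain size and averaging stencil are placeholders chosen for illustration):

```cpp
// SPMD sketch: each MPI rank owns one sub-domain of a 1-D array plus two
// halo cells, exchanges halo values with its neighbours, then runs a
// local "solver" update. Purely illustrative.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nloc = 100;                    // local sub-domain size
    std::vector<double> u(nloc + 2, rank);   // + 2 halo cells

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 10; ++step) {
        // halo exchange: send own boundary values, receive neighbours'
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                     &u[nloc + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[nloc],     1, MPI_DOUBLE, right, 1,
                     &u[0],        1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // local "solver" update (a simple averaging stencil)
        std::vector<double> unew(u);
        for (int i = 1; i <= nloc; ++i)
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
        u.swap(unew);
    }

    if (rank == 0) std::printf("done on %d sub-domains\n", size);
    MPI_Finalize();
    return 0;
}
```

The two MPI_Sendrecv calls play the role of the "message passing" arrows in the slide: only interface data travels between sub-domains, everything else stays local.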
Code porting
  AERO-3D: finite volume code using Fortran 77/MPI, solving the 3D compressible Navier-Stokes equations with turbulence modeling (50 000 instructions); the code was rewritten in Fortran 90
  AEDIPH: finite volume code designed for multimaterial studies
  CIMlib: the C++/MPI finite element library of CEMEF, solving multimaterial incompressible flows
Test case: jet in cross flow
3D LES turbulence modeling, compressible flow, explicit solver
Results for 32 partitions, 100 time steps:

               Sophia clusters   Sophia1-Marseille   Sophia2-Marseille
  241K mesh    729 s             817 s               1181 s
    Com/work   9%                69%                 46%
  400K mesh    827 s             729 s               965 s
    Com/work   1%                13%                 6%
Test case 2: 3D dam break problem
  3D incompressible Navier-Stokes computation
  Level-set representation of the interface with Hamilton-Jacobi reinitialization
  Iterative implicit scheme using GMRES (MINRES) preconditioned with ILU
  600 time steps
3D dam break results
  500 K mesh, 2.5 M elements, 600 time steps; implicit code: 600 linear systems of size 2M x 2M solved
    On 3 x 4 processors spread over the 3 clusters: 60 h
    With optimization of the code for the grid: 37 h
  1.5 M mesh, 8.7 M elements, 600 time steps; implicit code: 600 linear systems of size 6M x 6M solved
    On 3 x 11 processors spread over the 3 clusters: 125 h
Provisional conclusions
  MecaGrid gives access to a large number of processors and the possibility to run larger applications than on an in-house cluster
  For sufficiently large applications it competes with an in-house cluster: no significant communication overhead
  However:
    fine tuning of the application codes is required to obtain good efficiency
    algorithmic developments are needed
Heterogeneous mesh partitioning
  The mapping problem: find the mesh partition that minimizes the CPU time
  Homogeneous case (cluster architecture): plain load balancing
  Heterogeneous case (grid): the partition must also take into account the different processor speeds and network bandwidths of the clusters (see the sketch below)
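A hypothetical sketch of the weighting idea behind heterogeneous partitioning (this is not the MecaGrid partitioner; the processor speeds and mesh size below are example values): give processor p a share of the elements proportional to its speed S_p instead of an equal share.

```cpp
// Heterogeneous load balancing sketch: target element counts proportional
// to processor speed. Speeds and mesh size are illustrative values only.
#include <cstdio>
#include <vector>

int main() {
    // assumed processor speeds in GHz for a small heterogeneous pool
    std::vector<double> speed = {2.4, 2.4, 0.933, 0.933};
    const long total_elements = 400000;   // e.g. the 400K test mesh

    double speed_sum = 0.0;
    for (double s : speed) speed_sum += s;

    for (std::size_t p = 0; p < speed.size(); ++p) {
        long target = static_cast<long>(total_elements * speed[p] / speed_sum);
        std::printf("proc %zu: target %ld elements (weight %.2f)\n",
                    p, target, speed[p] / speed_sum);
    }
    return 0;
}
```

A full heterogeneous partitioner would presumably also weight the sub-domain interfaces by the available network bandwidths, since the measured inter-site links of the MecaGrid are far slower than the intra-cluster networks.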
Algorithmic developments
  Iterative linear solvers: solve A x = b, with A sparse
    x ← x + P (b − A x),  P: preconditioning matrix
    P built from an incomplete LU factorization of A (A ≈ L U): ILU(0), ILU(1), ..., ILU(k)

                        ILU(0)   ILU(1)   ILU(2)   ILU(3)
    Normalized # iter   100      60       46       38
    CPU cluster         100      97       126      205
    CPU MecaGrid        100      60       65       87
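A minimal sketch of the iteration x ← x + P (b − A x), using a diagonal (Jacobi) preconditioner as a simple stand-in for the ILU(k) preconditioners quoted above; the 3x3 system is an arbitrary example, not taken from the slides.

```cpp
// Preconditioned Richardson iteration x <- x + P (b - A x) with P = diag(A)^{-1}.
// Dense storage and a tiny test system, for illustration only.
#include <cstdio>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[i].size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

int main() {
    // small diagonally dominant test system A x = b (exact solution x = 1,1,1)
    Mat A = {{4, -1, 0}, {-1, 4, -1}, {0, -1, 4}};
    Vec b = {3, 2, 3};
    Vec x(3, 0.0);

    for (int it = 0; it < 100; ++it) {
        Vec Ax = matvec(A, x);
        double res = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            double r = b[i] - Ax[i];     // residual component
            x[i] += r / A[i][i];         // apply P = diag(A)^{-1} to the residual
            res += r * r;
        }
        if (std::sqrt(res) < 1e-10) {
            std::printf("converged in %d iterations\n", it + 1);
            break;
        }
    }
    std::printf("x = %.6f %.6f %.6f\n", x[0], x[1], x[2]);
    return 0;
}
```

The table above suggests why the preconditioner choice shifts on the grid: a stronger preconditioner such as ILU(1) costs more per iteration but needs fewer iterations, hence fewer communication rounds over the slow inter-site links, which makes it the best choice on the MecaGrid even though ILU(0) remains competitive on a single cluster.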
Hierarchical mesh partitioner [diagram]: the initial mesh is split by a top-level partitioner, and each resulting part is then passed to a further partitioner local to its cluster.
Heterogeneous mesh partitioning: test case on 32 procs, mesh size 400 K

  Clusters                    CPU time (s)
  Sophia-MRS (hetero)         143.66
  Sophia1-Sophia2 (hetero)    180.8
  Sophia1-Sophia2 (homo)      349.42
  Sophia-MRS (homo)           579.505

Gain of more than 75%!
Conclusions
  The grid appears to be a viable alternative to specialized super-clusters for large scale CFD computations
  From the point of view of numerical analysis, grid architectures are a source of new questions:
    mesh and graph partitioning
    linear solvers
    communication and latency hiding schemes
    ...