An overview of the M3D code structure, matrix symmetrization, solver optimizations, and performance metrics. Topics include elliptic solvers, Poisson equation solving, runtime options, and timing and scaling results for M3D on several computational platforms.
Running M3D on Advanced Computing Architectures
Jin Chen, PPPL
M3D Summary
• Multilevel / 3D using potential stream function; MHD / 2-fluid / particle; tokamak / stellarator
• Semi-implicit time step applied
• 13-19 elliptic solver calls per time step
• Poisson equation with Neumann b.c.
• Matrices symmetrized / solvers optimized
• Higher order triangular elements
• Runs on
  NERSC: seaborg / jacquard
  NLCF: cheetah / ram / phoenix
  PPPL: fcc / mhd
Making M3D matrices symmetric
J. Chen, et al., Symmetric Solution in M3D, Computer Physics Communications 164, 468 (2004).
Compiler options
• Compile
  Level 0: make BOPT=O update
  Level 1: make -f Makefile.fsymm BOPT=O update
  Level 2: make -f Makefile.fsymm_opt BOPT=O update
Iteration counts compared (7067 -> 1311 -> 593)
gmres/ilu cg/hypre/jacobi
restart 30 200 200
fill-in level 0 0 3
• Diffpar-operator_4-ibc_1-ID_0  : its =    3    3    2    6
• poisD-operator_3               : its =  404  188   58    8
• Diffpar-operator_4-ibc_1-ID_0  : its =    5    5    5    5
• Diffpar-operator_4-ibc_0-ID_10 : its =   30   30    9  101
• Diffpar-operator_4-ibc_0-ID_10 : its =   30   30    9  101
• poisN-operator_1               : its = 3234  465   89    9
• Diffpar-operator_4-ibc_1-ID_0  : its =    5    5    5    5
• Diffpar-operator_4-ibc_0-ID_0  : its =    3    3    3    6
• poisD-operator_1               : its =  371  181   56    9
• Diffpar-operator_5-ibc_0-ID_0  : its =    2    2    2    4
• poisD-operator_2               : its =  358  182   56    8
• poisN-operator_1               : its = 2593  420   57    9
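Once the matrices are symmetric, a conjugate-gradient solver becomes an option alongside GMRES. The following is a minimal sketch of Jacobi-preconditioned CG, assuming a small symmetric positive definite test matrix (a 1D Dirichlet Laplacian, not an M3D operator). The function name `jacobi_pcg` is hypothetical, and the code is a plain NumPy illustration rather than the PETSc/hypre solvers used in the runs above.

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with a Jacobi (diagonal) preconditioner.

    Only valid for symmetric positive definite A.
    Returns the solution and the number of iterations used.
    """
    inv_diag = 1.0 / np.diag(A)       # Jacobi preconditioner: M^-1 = diag(A)^-1
    x = np.zeros_like(b)
    r = b - A @ x                     # initial residual
    z = inv_diag * r                  # preconditioned residual
    p = z.copy()                      # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)         # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # next A-conjugate direction
        rz = rz_new
    return x, max_iter

# Assumed test problem: 1D Laplacian with Dirichlet ends (symmetric positive definite).
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

x, its = jacobi_pcg(A, b)
print("iterations:", its)
print("residual norm:", np.linalg.norm(b - A @ x))
```

CG needs only one matrix-vector product and a few vector updates per iteration and has no restart parameter, which is one reason a symmetric formulation can be attractive.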
Strong Scaling in direction
                         m3d time    KSP time
• 1 node  (39481 eqs)    2242 sec    1237 sec
• 4 nodes ( 9871 eqs)    1010 sec     482 sec
• 8 nodes ( 4971 eqs)     734 sec     353 sec
• Note: seaborg. 1 node has 16 processors. The number of equations is counted on each processor. 100 time steps. 16/16/141/1/4/1-4-8. optimz/opt4_scaling_strong_16p/64p/128p/256p.wxo.
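The strong-scaling timings can be turned into speedup and parallel efficiency relative to the 1-node run. This is a small sketch using only the numbers on the slide; the function name `strong_scaling` and the choice of the 1-node run as baseline are assumptions made for illustration.

```python
# (nodes, m3d time in s, KSP time in s), copied from the strong-scaling slide
strong_runs = [(1, 2242.0, 1237.0), (4, 1010.0, 482.0), (8, 734.0, 353.0)]

def strong_scaling(runs):
    """Return (nodes, speedup, efficiency, KSP share of total time) per run.

    Speedup is measured against the first run; efficiency divides the
    speedup by the increase in node count.
    """
    base_nodes, base_total, _ = runs[0]
    rows = []
    for nodes, total, ksp in runs:
        speedup = base_total / total
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency, ksp / total))
    return rows

strong_rows = strong_scaling(strong_runs)
for nodes, speedup, eff, ksp_share in strong_rows:
    print(f"{nodes:2d} nodes: speedup {speedup:.2f}, "
          f"efficiency {eff:.2f}, KSP share {ksp_share:.2f}")
```

The KSP share column shows what fraction of the total run time is spent in the linear solves at each node count.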
Weak Scaling in direction
                         m3d time    KSP time
• 1 node   (7321 eqs)    129 sec     22 sec
• 4 nodes  (7261 eqs)    259 sec     42 sec
• 8 nodes  (7261 eqs)    321 sec     58 sec
• 16 nodes (7261 eqs)    322 sec     68 sec
• Note: seaborg. 10 time steps. Optimz/weaking_scaling_1.16_16_061/121/121/121_4/4/8/16_1/4/8/16_1node/4node/8node/16node.wxo
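For weak scaling the per-processor problem size stays roughly fixed, so the ideal run time is constant and efficiency is the baseline time divided by the measured time. The sketch below applies that definition to the slide's numbers; `weak_scaling` is a hypothetical helper name.

```python
# (nodes, m3d time in s, KSP time in s), copied from the weak-scaling slide
weak_runs = [(1, 129.0, 22.0), (4, 259.0, 42.0), (8, 321.0, 58.0), (16, 322.0, 68.0)]

def weak_scaling(runs):
    """Return (nodes, total-time efficiency, KSP-time efficiency) per run.

    Efficiency is the first run's time divided by the run's time,
    so 1.0 means perfect weak scaling.
    """
    _, base_total, base_ksp = runs[0]
    return [(nodes, base_total / total, base_ksp / ksp)
            for nodes, total, ksp in runs]

weak_rows = weak_scaling(weak_runs)
for nodes, eff_total, eff_ksp in weak_rows:
    print(f"{nodes:2d} nodes: total efficiency {eff_total:.2f}, "
          f"KSP efficiency {eff_ksp:.2f}")
```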
Most time saved …
Neumann b.c.
Weakly diagonally dominant matrix
1. Consistent system
2. Unique solution
Minimum length solution
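A pure Neumann Poisson matrix is singular: constants lie in its null space, so the system has solutions only if the right-hand side is consistent, and a unique one is picked by asking for the minimum-length (minimum-norm) solution. The following sketch illustrates this on an assumed 1D model matrix, not an M3D operator, using NumPy's least-squares routine in place of an iterative solver.

```python
import numpy as np

# Assumed model problem: 1D Laplacian with Neumann ends.
# Rows sum to zero, so the matrix is only weakly diagonally dominant and singular.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0

# Make the right-hand side consistent: remove its component along the
# null space (the constant vector), i.e. subtract the mean.
rng = np.random.default_rng(0)
b = rng.standard_normal(n)
b -= b.mean()

# lstsq returns the minimum-norm solution for a rank-deficient system.
x, *_ = np.linalg.lstsq(A, b, rcond=1e-10)

# Any constant shift is also a solution, but it is longer.
x_shifted = x + 1.0
print("residual (min-norm):", np.linalg.norm(A @ x - b))
print("residual (shifted): ", np.linalg.norm(A @ x_shifted - b))
print("norms:", np.linalg.norm(x), np.linalg.norm(x_shifted))
```

The minimum-length solution is the one with no component along the constant vector, which is why its entries sum to zero.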
Higher order triangles
2nd order / 3rd order
• Regular higher order
  J. Chen, et al., Solving Anisotropic Transport Equation on Misaligned Grids, LNCS 3516, pp. 1076-1079, 2005.
• Lumped higher order
  G. Cohen, et al., Higher order triangular finite elements with mass lumping for the wave equation, SIAM J. Numer. Anal. 38, 2047-2078 (2001).
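Mass lumping replaces the consistent finite-element mass matrix with a diagonal one. The sketch below shows only the simplest form of the idea, row-sum lumping for linear elements in 1D; this is an assumption chosen for brevity and is not the higher-order triangular construction of Cohen et al., which relies on specially designed elements. The helper names are hypothetical.

```python
import numpy as np

def assemble_mass_1d(n_elems, length=1.0):
    """Consistent mass matrix for linear elements on a uniform 1D mesh."""
    h = length / n_elems
    local = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])  # element mass matrix
    M = np.zeros((n_elems + 1, n_elems + 1))
    for e in range(n_elems):
        M[e:e + 2, e:e + 2] += local                          # scatter into global matrix
    return M

def lump_rows(M):
    """Row-sum lumping: move each row's total mass onto the diagonal."""
    return np.diag(M.sum(axis=1))

M = assemble_mass_1d(4)
M_lumped = lump_rows(M)

print("consistent total mass:", M.sum())
print("lumped diagonal:", np.diag(M_lumped))
```

A diagonal mass matrix is trivial to invert, which is the usual motivation for lumping; the total mass (the sum of all entries) is unchanged by the row-sum construction.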
Run M3D with ho options
• Compiler options
  make BOPT=O update
• Runtime options
  Regular 2nd order: -hoelement -horder 2 \
  Regular 3rd order: -hoelement -horder 3 \
  Lump 2nd order:    -hoelement -lump -horder 2 \
  Lump 3rd order:    -hoelement -lump -horder 3 \
Benchmark ho code: m3d/code/m2.F
c.. determine mesh
      call dmesh

c.. cjtest 1-dec-04 for linda start --
      call cvolea( one, sum )
      write(0,*)"TEST: cvolea sum for one = ", sum
      call cvol( one, sum )
      write(0,*)"TEST: cvol sum for one = ", sum
c.. cjtest 1-dec-04 for linda end --

      call rnetc

c... model test problem
      call ellip
      call circle
c     return

cLS   if(impp.eq.1.and.ioldinp.ne.1) call wread_mpp
Benchmark options
Compiler options to turn on
  -DELLIP in m3d/grid/Makefile
  -DHELMHOLTZ in m3d/interface/Makefile
  -mhd/driver/test.c: mh3d_test
Runtime options
  Ellip.F controlled by elist
  Circle.F controlled by clist
M3D operators
• iselect = 11 dudx: 1st order partial derivative
• iselect = 12 dudz: 1st order partial derivative
• iselect = 13 dxdphi: toroidal derivative
• iselect = 14 cvol: total toroidal volume
• iselect = 15 cvolea: toroidal volume contained in each
• iselect = 16 d2udxdz - d2udzdx: 2nd order derivative commute
• iselect = 17 gradsq: vector inner product
• iselect = 18 gcro: vector cross product
• iselect = 19 delsq: Laplacian and bdy line integral
• iselect = 20 div: divergence
Numerical Accuracy (2nd order, RMS)
operators                    Linear      Regular HO  Lump HO
• pure poiss                 .3133E-04   .1824E-10   .2778E-10
• star poiss                 .7480E-04   .1741E-07   .9668E-11
• dagg poiss                 .7689E-05   .1368E-07   .1316E-11
• Helmholtz pure poiss       .8375E-04   .5808E-06   .5921E-11
• Helmholtz star poiss       .2019E-03   .1122E-04   .1187E-10
• Helmholtz dagg poiss       .3648E-04   .1582E-06   .1542E-10
• pure poiss Neumann u_x     .3034E-02   .1049E-03   .1157E-03
•                    u_y     .2290E-02   .7860E-04   .8898E-04
• dxdr                       .2424E-03   .7718E-11   .4413E-13
• dxdz                       .9665E-03   .1709E-09   .2709E-13
• d2xdrdz - d2xdzdr          .9787E-03   .1705E-09   .5611E-11
• grad                       .4251E-03   .7760E-13   .4639E-13
• gcro                       .3830E-02   .6218E-09   .9326E-14
• delsq                      .6927E-03   .9690E-10   .1106E-09
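An RMS error of this kind compares a discrete operator applied to a known function against the exact result. The sketch below shows the measurement itself on an assumed stand-in: a periodic central-difference derivative of sin(x), not M3D's finite-element operators. All function names are hypothetical.

```python
import numpy as np

def rms_error(numerical, exact):
    """Root-mean-square difference between two arrays of nodal values."""
    return np.sqrt(np.mean((numerical - exact) ** 2))

def dudx_central(u, h):
    """Second-order central difference on a periodic grid (stand-in operator)."""
    return (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * h)

def derivative_rms(n):
    """RMS error of d/dx sin(x) on n periodic points over [0, 2*pi)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    h = x[1] - x[0]
    return rms_error(dudx_central(np.sin(x), h), np.cos(x))

e_coarse = derivative_rms(64)
e_fine = derivative_rms(128)
print("RMS error, n=64: ", e_coarse)
print("RMS error, n=128:", e_fine)
print("ratio:", e_coarse / e_fine)
```

For a second-order operator, halving the mesh spacing should reduce the RMS error by roughly a factor of four; the same ratio test can be used to confirm the order of any of the operators listed.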
Numerical Efficiency (2nd order)
operators                    Linear       Regular HO   Lump HO
• pure poiss                 11.505164    17.993580    15.881487
• star poiss                 11.936641    17.842965    15.577935
• dagg poiss                 11.487363    17.065694    15.590550
• Helmholtz pure poiss       11.593001    17.850698    15.764700
• Helmholtz star poiss       11.827986    17.617935    15.462633
• Helmholtz dagg poiss       11.127486    17.504207    15.329060
• pure poiss Neumann         11.800331    17.994744    15.368874
• dxdr                        0.325041     2.822974     0.443981
• dxdz                        0.467021     2.539099     0.419528
• d2xdrdz - d2xdzdr           0.560459     9.457784     2.098601
• grad                        0.680051     2.715444     0.961536
• gcro                        0.234130     2.418649     0.544330
• delsq (Laplacian)           0.355726     6.733015     0.554883
Poisson solver scales with # of eqs
Lump HO is used. GMRES/ILU.
Application of ho code to anisotropic transport on misaligned grids
M3D on X1
• m3dp.x              m3dp_vec.x
• m3dp_fsymm.x        m3dp_fsymm_vec.x ??
• m3dp_fsymm_opt.x    m3dp_fsymm_opt_vec.x ??
Optimizing M3D on X1 (cont'd)
• PETSc, matrix-vector product
• MatMult flops (16 MSP):
  Standard PETSc:  6.81 MFlops
  Optimized PETSc: 54.0 MFlops