Running M3D on Advanced Computing Architectures
Jin Chen, PPPL
M3D Summary
• Multilevel / 3D, using potential and stream functions; MHD / 2-fluid / particle models; tokamak / stellarator geometries
• Semi-implicit time step
• 13-19 elliptic solver calls per time step
• Poisson equation with Neumann b.c.
• Matrices symmetrized / solvers optimized
• Higher order triangular elements
• Runs on NERSC: seaborg / jacquard; NLCF: cheetah / ram / phoenix; PPPL: fcc / mhd
Making M3D matrices symmetric
J. Chen, et al., Symmetric Solution in M3D, Computer Physics Communications 164, 468 (2004).
Compiler options
• Compile level 0: make BOPT=O update
• Compile level 1: make -f Makefile.fsymm BOPT=O update
• Compile level 2: make -f Makefile.fsymm_opt BOPT=O update
Iteration counts comparison (7067 -> 1311 -> 593)

| Operator | gmres/ilu (restart 30, fill-in level 0) | gmres/ilu (restart 200, fill-in level 0) | gmres/ilu (restart 200, fill-in level 3) | cg/hypre/jacobi |
|---|---|---|---|---|
| Diffpar-operator_4-ibc_1-ID_0 | 3 | 3 | 2 | 6 |
| poisD-operator_3 | 404 | 188 | 58 | 8 |
| Diffpar-operator_4-ibc_1-ID_0 | 5 | 5 | 5 | 5 |
| Diffpar-operator_4-ibc_0-ID_10 | 30 | 30 | 9 | 101 |
| Diffpar-operator_4-ibc_0-ID_10 | 30 | 30 | 9 | 101 |
| poisN-operator_1 | 3234 | 465 | 89 | 9 |
| Diffpar-operator_4-ibc_1-ID_0 | 5 | 5 | 5 | 5 |
| Diffpar-operator_4-ibc_0-ID_0 | 3 | 3 | 3 | 6 |
| poisD-operator_1 | 371 | 181 | 56 | 9 |
| Diffpar-operator_5-ibc_0-ID_0 | 2 | 2 | 2 | 4 |
| poisD-operator_2 | 358 | 182 | 56 | 8 |
| poisN-operator_1 | 2593 | 420 | 57 | 9 |
Strong Scaling in direction

| Nodes | Equations per processor | m3d time | KSP (PETSc linear solver) time |
|---|---|---|---|
| 1 | 39481 | 2242 sec | 1237 sec |
| 4 | 9871 | 1010 sec | 482 sec |
| 8 | 4971 | 734 sec | 353 sec |

Notes: seaborg; 1 node has 16 processors; the number of equations is counted on each processor; 100 time steps; 16/16/141/1/4/1-4-8; optimz/opt4_scaling_strong_16p/64p/128p/256p.wxo.
Weak Scaling in direction

| Nodes | Equations per processor | m3d time | KSP (PETSc linear solver) time |
|---|---|---|---|
| 1 | 7321 | 129 sec | 22 sec |
| 4 | 7261 | 259 sec | 42 sec |
| 8 | 7261 | 321 sec | 58 sec |
| 16 | 7261 | 322 sec | 68 sec |

Notes: seaborg; 10 time steps; Optimz/weaking_scaling_1.16_16_061/121/121/121_4/4/8/16_1/4/8/16_1node/4node/8node/16node.wxo.
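The two scaling tables can be summarized with the usual efficiency measures, taking the 1-node run as the baseline. For strong scaling (fixed total problem) the speedup is S(N) = T(1)/T(N) and the efficiency is S(N)/N; for weak scaling (fixed work per processor) the ideal time is constant, so the efficiency is T(1)/T(N). A minimal sketch; the function names are hypothetical.

```c
#include <assert.h>

/* Strong scaling: fixed total problem size, N = number of nodes.
   Speedup S(N) = T(1) / T(N). */
double strong_speedup(double t1, double tn)
{
    assert(tn > 0.0);
    return t1 / tn;
}

/* Strong-scaling efficiency E(N) = S(N) / N (1.0 is ideal). */
double strong_efficiency(double t1, double tn, int n)
{
    assert(n > 0);
    return strong_speedup(t1, tn) / n;
}

/* Weak scaling: work per processor is fixed, so the ideal time is constant.
   Efficiency E(N) = T(1) / T(N) (1.0 is ideal). */
double weak_efficiency(double t1, double tn)
{
    assert(tn > 0.0);
    return t1 / tn;
}
```

Applied to the tables: for the m3d column, 4 nodes give S ≈ 2.22 (E ≈ 0.55) and 8 nodes give S ≈ 3.05 (E ≈ 0.38); the KSP time scales somewhat better (E ≈ 0.64 and 0.44). In the weak-scaling runs, the 16-node efficiency is about 0.40 for m3d and 0.32 for KSP.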
Most time saved … Neumann b.c.
• Weakly diagonally dominant matrix
• 1. Consistent system
• 2. Unique solution: the minimum-length solution
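With Neumann b.c. the discrete Laplacian annihilates constants, so the matrix is singular. Because it is symmetric, its range is the orthogonal complement of that null space: the system is consistent exactly when the right-hand side sums to zero, and since any two solutions differ by a constant, the minimum-length (minimum 2-norm) solution is the one with zero mean. A minimal sketch, using a 1D Neumann Laplacian as a stand-in for M3D's 2D operator; `neumann_apply`, `is_consistent` and `project_out_constant` are hypothetical names, not M3D routines.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* y = A x for the 1D Neumann Laplacian with rows [1 -1], [-1 2 -1], ..., [-1 1].
   Each row sums to zero, so the constant vector is in the null space. */
void neumann_apply(size_t n, const double *x, double *y)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        if (i > 0)     s += x[i] - x[i - 1];
        if (i + 1 < n) s += x[i] - x[i + 1];
        y[i] = s;
    }
}

static double mean(size_t n, const double *v)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s / (double)n;
}

/* A x = b is solvable iff b is orthogonal to the constants, i.e. mean(b) == 0. */
int is_consistent(size_t n, const double *b, double tol)
{
    return fabs(mean(n, b)) <= tol;
}

/* Remove the constant component. Applied to b it enforces consistency;
   applied to any solution x it yields the minimum 2-norm solution,
   because ||x + c*1|| is smallest at c = -mean(x). */
void project_out_constant(size_t n, double *v)
{
    double m = mean(n, v);
    for (size_t i = 0; i < n; i++)
        v[i] -= m;
}
```

For example, x = (0, -1, -2, -3, -4) solves A x = b with b = (1, 0, 0, 0, -1); removing its mean gives (2, 1, 0, -1, -2), which has the same residual and the smallest norm among all solutions.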
Higher order triangles: 2nd order, 3rd order
• Regular higher order: J. Chen, et al., Solving Anisotropic Transport Equation on Misaligned Grids, LNCS 3516, pp. 1076-1079, 2005.
• Lumped higher order: G. Cohen, et al., Higher order triangular finite elements with mass lumping for the wave equation, SIAM J. Numer. Anal. 38, pp. 2047-2078, 2001.
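A 2nd-order element interpolates with a quadratic polynomial on each triangle. A minimal sketch of the standard six-node P2 Lagrange basis on the reference triangle, assuming this is what the "regular" higher-order option corresponds to; the mass-lumped elements of Cohen et al. use a modified node set and are not shown. `p2_basis` is a hypothetical name.

```c
#include <assert.h>

/* Quadratic (P2) Lagrange basis on the reference triangle (0,0), (1,0), (0,1).
   Node order: the 3 vertices, then the midpoints of edges 0-1, 1-2, 2-0.
   Built from barycentric coordinates L0 = 1 - x - y, L1 = x, L2 = y. */
void p2_basis(double x, double y, double N[6])
{
    assert(N != 0);
    double L0 = 1.0 - x - y, L1 = x, L2 = y;
    N[0] = L0 * (2.0 * L0 - 1.0);   /* vertex (0, 0)          */
    N[1] = L1 * (2.0 * L1 - 1.0);   /* vertex (1, 0)          */
    N[2] = L2 * (2.0 * L2 - 1.0);   /* vertex (0, 1)          */
    N[3] = 4.0 * L0 * L1;           /* edge midpoint (0.5, 0)   */
    N[4] = 4.0 * L1 * L2;           /* edge midpoint (0.5, 0.5) */
    N[5] = 4.0 * L2 * L0;           /* edge midpoint (0, 0.5)   */
}
```

Each function equals 1 at its own node and 0 at the other five, and the six functions sum to 1 everywhere, so constants (and all quadratics) are reproduced exactly.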
Running M3D with higher order (ho) options
• Compiler options: make BOPT=O update
• Runtime options:
  • Regular 2nd order: -hoelement -horder 2
  • Regular 3rd order: -hoelement -horder 3
  • Lumped 2nd order: -hoelement -lump -horder 2
  • Lumped 3rd order: -hoelement -lump -horder 3
Benchmark ho code: m3d/code/m2.F (excerpt, lines 346-371)

```fortran
c.. determine mesh
      call dmesh

c.. cjtest 1-dec-04 for linda start --
      call cvolea( one, sum )
      write(0,*)"TEST: cvolea sum for one = ", sum
      call cvol( one, sum )
      write(0,*)"TEST: cvol sum for one = ", sum
c.. cjtest 1-dec-04 for linda end --

      call rnetc

c... model test problem
      call ellip
      call circle
c return

cLS if(impp.eq.1.and.ioldinp.ne.1) call wread_mpp
```
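The cjtest lines check the volume routines by integrating the constant `one`; presumably both printed sums should equal the total volume. A hypothetical 2D analogue of that sanity check (`integrate_one` is not an M3D routine) sums the triangle areas of a mesh, so the result can be compared with the known domain area.

```c
#include <assert.h>
#include <math.h>

/* Integral of the constant 1 over a 2D triangular mesh, i.e. the total area,
   computed as the sum of element areas. nodes holds (x, z) pairs;
   tris holds 3 node indices per triangle. */
double integrate_one(const double *nodes, const int *tris, int ntri)
{
    double sum = 0.0;
    for (int t = 0; t < ntri; t++) {
        const double *a = &nodes[2 * tris[3 * t + 0]];
        const double *b = &nodes[2 * tris[3 * t + 1]];
        const double *c = &nodes[2 * tris[3 * t + 2]];
        /* cross product of two edges = twice the signed area */
        double twice_area = (b[0] - a[0]) * (c[1] - a[1])
                          - (c[0] - a[0]) * (b[1] - a[1]);
        assert(twice_area != 0.0);   /* reject degenerate elements */
        sum += 0.5 * fabs(twice_area);
    }
    return sum;
}
```

For the unit square split into two triangles, the sum is exactly 1.0; a mismatch on a real mesh would point to a bad element or a wrong quadrature weight.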
Benchmark options
• Compiler options to turn on:
  • -DELLIP in m3d/grid/Makefile
  • -DHELMHOLTZ in m3d/interface/Makefile
  • mhd/driver/test.c: mh3d_test
• Runtime options:
  • Ellip.F is controlled by elist
  • Circle.F is controlled by clist
M3D operators
• iselect = 11: dudx, 1st order partial derivative
• iselect = 12: dudz, 1st order partial derivative
• iselect = 13: dxdphi, toroidal derivative
• iselect = 14: cvol, total toroidal volume
• iselect = 15: cvolea, toroidal volume contained in each
• iselect = 16: d2udxdz - d2udzdx, commutator of the mixed 2nd order derivatives
• iselect = 17: gradsq, vector inner product
• iselect = 18: gcro, vector cross product
• iselect = 19: delsq, Laplacian and boundary line integral
• iselect = 20: div, divergence
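Operators such as dudx and dudz (iselect = 11, 12) differentiate a finite-element field. On linear (P1) triangles the gradient of the interpolant is constant on each element and is exact when u itself is linear. A minimal sketch for one triangle in the (x, z) plane; `p1_gradient` is a hypothetical name, and M3D's Galerkin operators may assemble derivatives differently.

```c
#include <assert.h>

/* Gradient of the linear (P1) interpolant of u on one triangle.
   xs, zs: vertex coordinates; us: nodal values of u. */
void p1_gradient(const double xs[3], const double zs[3], const double us[3],
                 double *dudx, double *dudz)
{
    double x21 = xs[1] - xs[0], x31 = xs[2] - xs[0];
    double z21 = zs[1] - zs[0], z31 = zs[2] - zs[0];
    double u21 = us[1] - us[0], u31 = us[2] - us[0];
    double twice_area = x21 * z31 - x31 * z21;   /* signed, nonzero if valid */
    assert(twice_area != 0.0);
    *dudx = (u21 * z31 - u31 * z21) / twice_area;
    *dudz = (u31 * x21 - u21 * x31) / twice_area;
}
```

For u = 3x - 2z + 1 on any non-degenerate triangle this returns (3, -2) up to rounding; for curved u it returns only an element-wise constant approximation.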
Numerical Accuracy (2nd order, RMS error)

| Operator | Linear | Regular HO | Lump HO |
|---|---|---|---|
| pure poiss | .3133E-04 | .1824E-10 | .2778E-10 |
| star poiss | .7480E-04 | .1741E-07 | .9668E-11 |
| dagg poiss | .7689E-05 | .1368E-07 | .1316E-11 |
| Helmholtz pure poiss | .8375E-04 | .5808E-06 | .5921E-11 |
| Helmholtz star poiss | .2019E-03 | .1122E-04 | .1187E-10 |
| Helmholtz dagg poiss | .3648E-04 | .1582E-06 | .1542E-10 |
| pure poiss Neumann, u_x | .3034E-02 | .1049E-03 | .1157E-03 |
| pure poiss Neumann, u_y | .2290E-02 | .7860E-04 | .8898E-04 |
| dxdr | .2424E-03 | .7718E-11 | .4413E-13 |
| dxdz | .9665E-03 | .1709E-09 | .2709E-13 |
| d2xdrdz - d2xdzdr | .9787E-03 | .1705E-09 | .5611E-11 |
| grad | .4251E-03 | .7760E-13 | .4639E-13 |
| gcro | .3830E-02 | .6218E-09 | .9326E-14 |
| delsq | .6927E-03 | .9690E-10 | .1106E-09 |
Numerical Efficiency (2nd order)

| Operator | Linear | Regular HO | Lump HO |
|---|---|---|---|
| pure poiss | 11.505164 | 17.993580 | 15.881487 |
| star poiss | 11.936641 | 17.842965 | 15.577935 |
| dagg poiss | 11.487363 | 17.065694 | 15.590550 |
| Helmholtz pure poiss | 11.593001 | 17.850698 | 15.764700 |
| Helmholtz star poiss | 11.827986 | 17.617935 | 15.462633 |
| Helmholtz dagg poiss | 11.127486 | 17.504207 | 15.329060 |
| pure poiss Neumann | 11.800331 | 17.994744 | 15.368874 |
| dxdr | 0.325041 | 2.822974 | 0.443981 |
| dxdz | 0.467021 | 2.539099 | 0.419528 |
| d2xdrdz - d2xdzdr | 0.560459 | 9.457784 | 2.098601 |
| grad | 0.680051 | 2.715444 | 0.961536 |
| gcro | 0.234130 | 2.418649 | 0.544330 |
| delsq (Laplacian) | 0.355726 | 6.733015 | 0.554883 |
Poisson solver scaling with the number of equations (Lump HO elements; GMRES/ILU)
Application of ho code to anisotropic transport on misaligned grids
M3D on X1
• m3dp.x / m3dp_vec.x
• m3dp_fsymm.x / m3dp_fsymm_vec.x ??
• m3dp_fsymm_opt.x / m3dp_fsymm_opt_vec.x ??
Optimizing M3D on X1 (cont'd)
• PETSc matrix-vector product
• MatMult performance on 16 MSPs: standard PETSc 6.81 MFlops; optimized PETSc 54.0 MFlops
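PETSc's default sparse format (AIJ) is compressed sparse row (CSR), and MatMult computes y = A x. A minimal sketch of the plain CSR kernel and of how an MFlops figure follows from it: each stored nonzero costs one multiply and one add, so one product is 2 * nnz flops. This is a scalar reference loop, not the optimized X1 version (the slides do not say what that version changed); `csr_matvec` and `matvec_mflops` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* y = A x for an m-row CSR matrix: row i's nonzeros are
   val[rowptr[i] .. rowptr[i+1]-1] in columns col[rowptr[i] .. rowptr[i+1]-1]. */
void csr_matvec(size_t m, const size_t *rowptr, const size_t *col,
                const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[col[k]];   /* 1 multiply + 1 add per nonzero */
        y[i] = sum;
    }
}

/* Achieved rate in MFlops for one product over nnz = rowptr[m] nonzeros
   that took `seconds` of wall time. */
double matvec_mflops(size_t nnz, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)nnz / seconds / 1.0e6;
}
```

By this measure, the optimized PETSc MatMult on the X1 runs about 54.0 / 6.81 ≈ 7.9 times faster than the standard one.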