Running M3D on Advanced Computing Architectures

Presentation Transcript


  1. Running M3D on Advanced Computing Architectures Jin Chen PPPL

  2. M3D Summary
     • Multilevel 3D code using a potential/stream-function formulation; MHD, two-fluid, and particle models; tokamak and stellarator geometries
     • Semi-implicit time stepping
     • 13-19 elliptic solver calls per time step
     • Poisson equation with Neumann b.c.
     • Matrices symmetrized, solvers optimized
     • Higher-order triangular elements
     • Runs on NERSC: seaborg, jacquard; NLCF: cheetah, ram, phoenix; PPPL: fcc, mhd

  3. M3D code structure

  4. Making M3D matrices symmetric: J. Chen, et al., "Symmetric Solution in M3D," Computer Physics Communications 164, 468 (2004).

  5. Compiler options
     • Level 0: make BOPT=O update
     • Level 1: make -f Makefile.fsymm BOPT=O update
     • Level 2: make -f Makefile.fsymm_opt BOPT=O update

  6. runtime options

  7. Iteration count comparison (7067 -> 1311 -> 593)
     Solvers: gmres/ilu, cg/hypre/jacobi
     Restart: 30, 200, 200; fill-in level: 0, 0, 3
     Iteration counts (its) for each of the four solver settings:
     Diffpar-operator_4-ibc_1-ID_0  : its =    3    3    2    6
     poisD-operator_3               : its =  404  188   58    8
     Diffpar-operator_4-ibc_1-ID_0  : its =    5    5    5    5
     Diffpar-operator_4-ibc_0-ID_10 : its =   30   30    9  101
     Diffpar-operator_4-ibc_0-ID_10 : its =   30   30    9  101
     poisN-operator_1               : its = 3234  465   89    9
     Diffpar-operator_4-ibc_1-ID_0  : its =    5    5    5    5
     Diffpar-operator_4-ibc_0-ID_0  : its =    3    3    3    6
     poisD-operator_1               : its =  371  181   56    9
     Diffpar-operator_5-ibc_0-ID_0  : its =    2    2    2    4
     poisD-operator_2               : its =  358  182   56    8
     poisN-operator_1               : its = 2593  420   57    9

  8. Improved M3D time components

  9. Strong Scaling in direction
     Nodes   Eqs per processor   M3D time   KSP time
     1       39481               2242 sec   1237 sec
     4        9871               1010 sec    482 sec
     8        4971                734 sec    353 sec
     Note: seaborg; 1 node has 16 processors; the number of equations is counted on each processor; 100 time steps. Settings: 16/16/141/1/4/1-4-8. Output: optimz/opt4_scaling_strong_16p/64p/128p/256p.wxo.
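Speedup and parallel efficiency follow directly from the strong-scaling numbers on this slide. The sketch below uses the standard definitions S = T(1)/T(p) and E = S/p, with p counted in nodes relative to the 1-node run; the function and variable names are illustrative only and are not part of M3D.

```python
# Strong-scaling speedup and efficiency from the seaborg numbers above.
# Standard definitions: S = T_base / T_p, E = S / (nodes / base_nodes).

STRONG = {  # nodes: (M3D time in s, KSP time in s), 100 time steps
    1: (2242.0, 1237.0),
    4: (1010.0, 482.0),
    8: (734.0, 353.0),
}


def speedup(t_base, t_p):
    """How many times faster the run on more nodes is."""
    return t_base / t_p


def efficiency(t_base, t_p, nodes, base_nodes=1):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(t_base, t_p) / (nodes / base_nodes)


base_m3d, base_ksp = STRONG[1]
for nodes, (t_m3d, t_ksp) in sorted(STRONG.items()):
    print(f"{nodes} nodes: M3D S={speedup(base_m3d, t_m3d):.2f} "
          f"E={efficiency(base_m3d, t_m3d, nodes):.2f}, "
          f"KSP S={speedup(base_ksp, t_ksp):.2f} "
          f"E={efficiency(base_ksp, t_ksp, nodes):.2f}")
```

From 1 to 8 nodes this gives a speedup of about 3.05 for the whole code (E about 0.38) and about 3.50 for the KSP solve (E about 0.44), so on these runs the solver phase scales somewhat better than the code as a whole.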

  10. Weak Scaling in direction
     Nodes   Eqs per processor   M3D time   KSP time
     1       7321                129 sec     22 sec
     4       7261                259 sec     42 sec
     8       7261                321 sec     58 sec
     16      7261                322 sec     68 sec
     Note: seaborg; 10 time steps. Output: Optimz/weaking_scaling_1.16_16_061/121/121/121_4/4/8/16_1/4/8/16_1node/4node/8node/16node.wxo
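For weak scaling the work per processor is held fixed, so the ideal run time is constant and the efficiency is simply E = T(1)/T(p). A minimal sketch using the numbers on this slide; the 1-node run has slightly more equations per processor (7321 vs 7261, under 1%), a difference ignored here. Names are illustrative only.

```python
# Weak-scaling efficiency from the seaborg numbers above (10 time steps,
# roughly 7300 equations per processor at every node count).
# Ideal weak scaling keeps time constant, so E = T_base / T_p.

WEAK = {  # nodes: (M3D time in s, KSP time in s)
    1: (129.0, 22.0),
    4: (259.0, 42.0),
    8: (321.0, 58.0),
    16: (322.0, 68.0),
}


def weak_efficiency(t_base, t_p):
    """1.0 means perfect weak scaling; smaller means time grew with node count."""
    return t_base / t_p


base_m3d, base_ksp = WEAK[1]
for nodes, (t_m3d, t_ksp) in sorted(WEAK.items()):
    print(f"{nodes:2d} nodes: M3D E={weak_efficiency(base_m3d, t_m3d):.2f}, "
          f"KSP E={weak_efficiency(base_ksp, t_ksp):.2f}")
```

At 16 nodes this gives E of about 0.40 for M3D and about 0.32 for KSP; most of the loss occurs between 1 and 8 nodes, and the M3D time barely changes from 8 to 16 nodes (321 s to 322 s).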

  11. Most time saved … Neumann b.c.
     • Neumann b.c. gives a weakly diagonally dominant, singular matrix
     1. Consistent system
     2. Unique solution: the minimum-length solution
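The two steps on this slide can be illustrated on a small model problem. The sketch below is a 1-D stand-in (an assumption for illustration, not M3D's finite-element matrix or its PETSc solver): the Neumann Laplacian's null space is the constant vector, so removing the mean of the right-hand side makes the system consistent, unpreconditioned conjugate gradients solves it, and removing the mean of the result selects the minimum 2-norm solution among all solutions, which differ only by a constant. All names are hypothetical.

```python
import math


def neumann_matvec(x):
    """y = A x for the 1-D Laplacian with homogeneous Neumann ends.

    Rows are [1, -1], [-1, 2, -1], ..., [-1, 1]: symmetric, weakly
    diagonally dominant, and singular, since A times a constant is 0.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        if i > 0:
            y[i] += x[i] - x[i - 1]
        if i < n - 1:
            y[i] += x[i] - x[i + 1]
    return y


def remove_mean(v):
    """Project v onto the complement of the null space span{1}."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def solve_neumann_min_norm(b, rtol=1e-12, maxit=1000):
    # Step 1: consistent system. b must be orthogonal to the null space.
    b = remove_mean(b)
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    stop = rtol * math.sqrt(rr)
    for _ in range(maxit):
        if math.sqrt(rr) <= stop:
            break
        ap = neumann_matvec(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    # Step 2: unique solution. Removing the constant component gives the
    # minimum 2-norm solution (and cleans up any rounding drift).
    return remove_mean(x)


b = [1.0, 0.0, 2.0, -1.0, 3.0, 0.5, 0.0, 1.5]
x = solve_neumann_min_norm(b)
print([round(v, 6) for v in x])
```

Starting CG from zero on a consistent right-hand side keeps every iterate mean-free in exact arithmetic, so the final projection mainly guards against floating-point drift.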

  12. Higher order triangles: 2nd order, 3rd order
     • Regular higher order: J. Chen, et al., "Solving Anisotropic Transport Equation on Misaligned Grids," LNCS 3516, pp. 1076-1079, 2005.
     • Lumped higher order: G. Cohen, et al., "Higher order triangular finite elements with mass lumping for the wave equation," SIAM J. Numer. Anal. 38, pp. 2047-2078, 2001.
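In Cohen et al.'s setting (explicit time stepping for the wave equation), mass lumping replaces the consistent mass matrix with a diagonal one so that each step needs no mass-matrix solve. As a minimal illustration only, the sketch below applies classical row-sum lumping to a linear (P1) triangle; it is not the higher-order lumping scheme referenced on this slide, and the function names are hypothetical. For quadratic Lagrange triangles, row-sum lumping gives zero mass at the vertices, which is why the higher-order lumped elements enrich the element with extra interior nodes and a matching quadrature rule instead.

```python
def triangle_area(p0, p1, p2):
    """Area of a triangle from its three (x, y) vertices."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def p1_mass_matrix(area):
    """Consistent mass matrix of a linear triangle: area/12 * [[2,1,1],[1,2,1],[1,1,2]]."""
    return [[area / 12.0 * (2.0 if i == j else 1.0) for j in range(3)]
            for i in range(3)]


def row_sum_lump(m):
    """Diagonal lumped mass: each diagonal entry is the sum of its row."""
    return [sum(row) for row in m]


area = triangle_area((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
m = p1_mass_matrix(area)
lumped = row_sum_lump(m)
print(area, lumped)  # each lumped entry equals area / 3 up to rounding
```

Row-sum lumping preserves the total mass (the sum of all entries equals the element area), which is the property any lumping scheme must keep.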

  13. 2nd order meshes with p__from Linda

  14. Run M3D with higher-order (ho) options
     • Compiler options: make BOPT=O update
     • Runtime options:
         Regular 2nd order:  -hoelement -horder 2 \
         Regular 3rd order:  -hoelement -horder 3 \
         Lump 2nd order:     -hoelement -lump -horder 2 \
         Lump 3rd order:     -hoelement -lump -horder 3 \
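The four runtime variants differ only in whether -lump is present and in the value passed to -horder. The hypothetical helper below assembles exactly the flag strings listed on this slide; only the flag spellings come from the slide, and the executable name, launcher, and any other runtime options are not shown here.

```python
from typing import List


def ho_runtime_options(order: int, lump: bool = False) -> List[str]:
    """Return the higher-order element flags listed on the slide.

    Hypothetical helper: only the flag spellings come from the slide.
    """
    if order not in (2, 3):
        raise ValueError("the slide lists only 2nd and 3rd order elements")
    flags = ["-hoelement"]
    if lump:
        flags.append("-lump")
    flags += ["-horder", str(order)]
    return flags


for order in (2, 3):
    for lump in (False, True):
        kind = "Lump" if lump else "Regular"
        print(f"{kind} order {order}: {' '.join(ho_runtime_options(order, lump))}")
```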

  15. Benchmark ho code: m3d/code/m2.F (excerpt, lines 346-371)

      c.. determine mesh
            call dmesh

      c.. cjtest 1-dec-04 for linda start --
            call cvolea( one, sum )
            write(0,*)"TEST: cvolea sum for one = ", sum
            call cvol( one, sum )
            write(0,*)"TEST: cvol sum for one = ", sum
      c.. cjtest 1-dec-04 for linda end --

            call rnetc

      c... model test problem
            call ellip
            call circle
      c     return

      cLS   if(impp.eq.1.and.ioldinp.ne.1) call wread_mpp
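The test block integrates the constant one with both cvolea and cvol and prints the sums; any correct volume integral of 1 must return the measure of the domain, so the two results should agree. The sketch below shows the same sanity check in a 2-D analog, which is an assumption for illustration: it integrates 1 element by element over a triangulated unit square rather than computing M3D's toroidal volume. All names are hypothetical.

```python
def triangle_area(a, b, c):
    """Area of triangle abc from (x, y) vertex tuples."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def unit_square_mesh(n):
    """Split the unit square into n*n cells, each cut into two triangles."""
    h = 1.0 / n
    tris = []
    for i in range(n):
        for j in range(n):
            p00 = (i * h, j * h)
            p10 = ((i + 1) * h, j * h)
            p11 = ((i + 1) * h, (j + 1) * h)
            p01 = (i * h, (j + 1) * h)
            tris.append((p00, p10, p11))
            tris.append((p00, p11, p01))
    return tris


def integrate_constant(value, tris):
    """Integrate a constant with one-point quadrature (exact for constants)."""
    per_element = [value * triangle_area(*t) for t in tris]
    return per_element, sum(per_element)


per_element, total = integrate_constant(1.0, unit_square_mesh(8))
print("TEST: element sum for one =", total)  # should be 1.0 up to rounding
```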

  16. Benchmark options
     • Compiler options to turn on: -DELLIP in m3d/grid/Makefile; -DHELMHOLTZ in m3d/interface/Makefile; mhd/driver/test.c: mh3d_test
     • Runtime options: Ellip.F is controlled by elist; Circle.F is controlled by clist

  17. M3D elliptic solvers

  18. M3D operators
     • iselect = 11  dudx                 1st order partial derivative
     • iselect = 12  dudz                 1st order partial derivative
     • iselect = 13  dxdphi               toroidal derivative
     • iselect = 14  cvol                 total toroidal volume
     • iselect = 15  cvolea               toroidal volume contained in each
     • iselect = 16  d2udxdz - d2udzdx    commutator of the 2nd order mixed derivatives
     • iselect = 17  gradsq               vector inner product
     • iselect = 18  gcro                 vector cross product
     • iselect = 19  delsq                Laplacian and boundary line integral
     • iselect = 20  div                  divergence
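The iselect = 16 operator evaluates d2u/dxdz - d2u/dzdx, which vanishes for any smooth u, so its discrete value is a pure accuracy check (it appears as d2xdrdz - d2xdzdr in the accuracy table). The sketch below is an analog under a stated assumption: it uses central finite differences on a uniform grid, where the two orderings agree up to rounding, rather than M3D's finite-element operators on triangles. Those need not commute exactly, which is presumably why the accuracy table reports nonzero RMS values for this row. Names are hypothetical.

```python
import math


def d_dx(u, h):
    """Central difference in x (first index); drops the two x-boundary rows."""
    return [[(u[i + 1][j] - u[i - 1][j]) / (2.0 * h) for j in range(len(u[0]))]
            for i in range(1, len(u) - 1)]


def d_dz(u, h):
    """Central difference in z (second index); drops the two z-boundary columns."""
    return [[(row[j + 1] - row[j - 1]) / (2.0 * h) for j in range(1, len(row) - 1)]
            for row in u]


h = 0.05
n = 41
# Test function u = sin(x) cos(z) on grid points x = i*h, z = j*h.
u = [[math.sin(i * h) * math.cos(j * h) for j in range(n)] for i in range(n)]

dxz = d_dz(d_dx(u, h), h)   # d/dz (du/dx) at interior points
dzx = d_dx(d_dz(u, h), h)   # d/dx (du/dz) at interior points

commutator = max(abs(a - c) for ra, rc in zip(dxz, dzx) for a, c in zip(ra, rc))
# Exact mixed derivative is -cos(x) sin(z); dxz[i-1][j-1] sits at grid point (i, j).
exact_err = max(abs(dxz[i - 1][j - 1] + math.cos(i * h) * math.sin(j * h))
                for i in range(1, n - 1) for j in range(1, n - 1))
print("max |d2u/dxdz - d2u/dzdx| =", commutator)
print("max error vs exact -cos(x) sin(z) =", exact_err)
```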

  19. Numerical Accuracy (2nd order, RMS)
     Operator                    Linear      Regular HO   Lump HO
     pure poiss                  .3133E-04   .1824E-10    .2778E-10
     star poiss                  .7480E-04   .1741E-07    .9668E-11
     dagg poiss                  .7689E-05   .1368E-07    .1316E-11
     Helmholtz pure poiss        .8375E-04   .5808E-06    .5921E-11
     Helmholtz star poiss        .2019E-03   .1122E-04    .1187E-10
     Helmholtz dagg poiss        .3648E-04   .1582E-06    .1542E-10
     pure poiss Neumann, u_x     .3034E-02   .1049E-03    .1157E-03
     pure poiss Neumann, u_y     .2290E-02   .7860E-04    .8898E-04
     dxdr                        .2424E-03   .7718E-11    .4413E-13
     dxdz                        .9665E-03   .1709E-09    .2709E-13
     d2xdrdz - d2xdzdr           .9787E-03   .1705E-09    .5611E-11
     grad                        .4251E-03   .7760E-13    .4639E-13
     gcro                        .3830E-02   .6218E-09    .9326E-14
     delsq                       .6927E-03   .9690E-10    .1106E-09
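The slide does not define its RMS normalization; a common choice, assumed here, is the root-mean-square difference between computed and exact nodal values. The sketch below implements that definition and computes the error-reduction factors for a few rows copied from the table. Names are hypothetical.

```python
import math


def rms_error(numeric, exact):
    """Root-mean-square difference between computed and exact nodal values."""
    if len(numeric) != len(exact) or not numeric:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(numeric, exact)) / len(numeric))


# Error reduction from linear to higher-order elements, using rows from the table.
ACCURACY = {  # operator: (Linear, Regular HO, Lump HO)
    "pure poiss": (0.3133e-04, 0.1824e-10, 0.2778e-10),
    "Helmholtz pure poiss": (0.8375e-04, 0.5808e-06, 0.5921e-11),
    "delsq": (0.6927e-03, 0.9690e-10, 0.1106e-09),
}
for name, (lin, reg, lump) in ACCURACY.items():
    print(f"{name}: Linear/Regular HO = {lin / reg:.1e}, Linear/Lump HO = {lin / lump:.1e}")
```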

  20. Numerical Efficiency (2nd order)
     Operator                    Linear      Regular HO   Lump HO
     pure poiss                  11.505164   17.993580    15.881487
     star poiss                  11.936641   17.842965    15.577935
     dagg poiss                  11.487363   17.065694    15.590550
     Helmholtz pure poiss        11.593001   17.850698    15.764700
     Helmholtz star poiss        11.827986   17.617935    15.462633
     Helmholtz dagg poiss        11.127486   17.504207    15.329060
     pure poiss Neumann          11.800331   17.994744    15.368874
     dxdr                         0.325041    2.822974     0.443981
     dxdz                         0.467021    2.539099     0.419528
     d2xdrdz - d2xdzdr            0.560459    9.457784     2.098601
     grad                         0.680051    2.715444     0.961536
     gcro                         0.234130    2.418649     0.544330
     Delsq(Laplacian)             0.355726    6.733015     0.554883
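Because the slide gives no units for these timings, ratios against the linear-element column are the safest way to compare them. The sketch below computes those relative costs for a few rows copied from the table; names are hypothetical.

```python
EFFICIENCY = {  # operator: (Linear, Regular HO, Lump HO); units not given on the slide
    "pure poiss": (11.505164, 17.993580, 15.881487),
    "pure poiss Neumann": (11.800331, 17.994744, 15.368874),
    "d2xdrdz - d2xdzdr": (0.560459, 9.457784, 2.098601),
    "Delsq(Laplacian)": (0.355726, 6.733015, 0.554883),
}


def relative_cost(name, column):
    """Cost of the given element type relative to linear elements."""
    index = {"Linear": 0, "Regular HO": 1, "Lump HO": 2}[column]
    row = EFFICIENCY[name]
    return row[index] / row[0]


for name in EFFICIENCY:
    print(f"{name}: Regular HO x{relative_cost(name, 'Regular HO'):.2f}, "
          f"Lump HO x{relative_cost(name, 'Lump HO'):.2f}")
```

For the Poisson solves the lumped elements cost about 1.38 times the linear time against about 1.56 for regular HO; for the derivative operators the gap is much wider, e.g. delsq costs about 18.9 times the linear time with regular HO but about 1.56 times with lumped HO.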

  21. Poisson solver scaling with the number of equations (Lump HO elements, GMRES/ILU)

  22. Application of ho code to anisotropic transport on misaligned grids

  23. M3D on X1
     • m3dp.x, m3dp_vec.x
     • m3dp_fsymm.x, m3dp_fsymm_vec.x ??
     • m3dp_fsymm_opt.x, m3dp_fsymm_opt_vec.x ??

  24. Optimizing M3D on X1, cont'd
     • PETSc matrix-vector product (MatMult)
     • MatMult rate on 16 MSPs: standard PETSc 6.81 MFlops; optimized PETSc 54.0 MFlops
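PETSc's default sparse format (AIJ) stores the matrix row by row in a compressed sparse row (CSR) layout, and MatMult computes y = A x. The plain-Python sketch below shows what a CSR matrix-vector product does, plus an MFlops helper that assumes the common convention of one multiply and one add per stored nonzero. It is not PETSc's code, and the slide does not say what the optimized X1 version changed; a known concern on vector processors such as the X1 is that the inner loop over a single row is short. The slide's figures correspond to a MatMult rate about 7.9 times higher (54.0 / 6.81). Names are hypothetical.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in compressed sparse row (CSR) form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def mflops(nnz, seconds):
    """Rate assuming one multiply and one add per stored nonzero."""
    return 2.0 * nnz / seconds / 1.0e6


# 3x3 tridiagonal example: [[4,-1,0],[-1,4,-1],[0,-1,4]]
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
print(csr_matvec(indptr, indices, data, [1.0, 1.0, 1.0]))  # [3.0, 2.0, 3.0]
```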
