This study focuses on optimizing M3D calculations for petascale computers, including strong and weak scaling tests and strategies for improving data copy efficiency.
M3D Scaling studies
Jin Chen
M3D Group
M3D meeting, Sept 08, 2006
M3D meeting, Sept 29, 2006
Motivation
To prepare for petascale calculations:
Total M3D Time = 726.879607
M3d-Par 13392 2.0653e+02
KSPSolve 241 2.7470e+02
• KSPSolve solves the linear systems arising from the finite-element discretization of elliptic equations
• m3d2par and Par2m3d copy data back and forth between the Fortran data distribution and the PETSc data layout
Their efficiencies are critical for optimization on petascale computers.
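To make the motivation concrete, the quoted timings can be turned into shares of the total run time. This is only a sketch: it assumes the two numbers after each event name are a call count and a time in the same units as the total (the usual layout of a PETSc log summary), which the slide does not state explicitly. The helper name `time_fraction` is hypothetical.

```python
# Timings copied from the slide. Assumption: each event line reads
# "name, call count, time", with time in the same units as the total.
total_time = 726.879607
events = {
    "M3d-Par": (13392, 2.0653e+02),
    "KSPSolve": (241, 2.7470e+02),
}


def time_fraction(event_time, total):
    """Return the share of the total run time spent in one event."""
    return event_time / total


for name, (count, seconds) in events.items():
    share = time_fraction(seconds, total_time)
    print(f"{name}: {count} calls, {seconds:.2f}, {100 * share:.1f}% of total")

# Combined share of the data copy and the linear solve.
combined = sum(seconds for _, seconds in events.values()) / total_time
print(f"combined: {100 * combined:.1f}% of total")
```

Under that reading, the data copy accounts for roughly 28% of the run and the linear solve for roughly 38%, so together they cover about two thirds of the total time.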
Outline
• 3D (r,θ,φ) strong scaling
• 3D (r,θ,φ) weak scaling
• 1D (φ) weak scaling
• 2D (r,θ) weak scaling
• Solutions
3D (r,θ,φ) strong scaling
nstp=20. Configuration Parameters:
A    32     32     32
B    8      8      8
C    436    436    436
D    2      2      2
E    5      5      5
F    11     17     23
3D (r,θ,φ) weak scaling
nstp=20. Configuration Parameters:
A    16     32     64
B    4      8      16
C    283    356    398
D    1      2      2
E    4      5      6
F    4      7      11
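For reference, the two kinds of test are usually summarized by a parallel efficiency. The sketch below uses the standard textbook definitions; the function names are hypothetical and the timings in the example calls are placeholders, not M3D measurements.

```python
def strong_scaling_efficiency(t_base, p_base, t_p, p):
    """Strong scaling: total problem size is fixed.

    Ideal run time falls in proportion to p_base / p, so the
    efficiency is (t_base * p_base) / (t_p * p).
    """
    return (t_base * p_base) / (t_p * p)


def weak_scaling_efficiency(t_base, t_p):
    """Weak scaling: work per processor is fixed.

    Ideal run time stays constant as processors are added, so the
    efficiency is t_base / t_p.
    """
    return t_base / t_p


# Placeholder numbers for illustration only (not measured data).
strong = strong_scaling_efficiency(t_base=100.0, p_base=64, t_p=30.0, p=256)
weak = weak_scaling_efficiency(t_base=100.0, t_p=125.0)
print(f"strong scaling efficiency: {strong:.3f}")
print(f"weak scaling efficiency:   {weak:.3f}")
```

An efficiency of 1.0 corresponds to ideal scaling in both cases; values below 1.0 indicate overhead such as communication or data copying.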
1D (φ) weak scaling
nstp=100. Configuration Parameters:
Changing:
A       B
16      4
32      8
64      16
128     32
256     64
512     128
1024    256
2048    512
Fixing: C=201, D=1, E=4, F=4
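The 1D sweep follows a simple pattern: A and B both double at each step, so their ratio stays at 4 while C, D, E and F are held fixed. A small sketch that regenerates the sweep and checks the pattern (the slides do not define A through F, so the names are used here only as labels):

```python
# Regenerate the 1D (phi) weak scaling sweep from the slide.
# A and B are doubled together; C, D, E, F stay fixed.
a_values = [16 * 2**k for k in range(8)]   # 16, 32, ..., 2048
b_values = [4 * 2**k for k in range(8)]    # 4, 8, ..., 512
fixed = {"C": 201, "D": 1, "E": 4, "F": 4}

sweep = []
for a, b in zip(a_values, b_values):
    config = {"A": a, "B": b, **fixed}
    sweep.append(config)
    print(config, "A/B =", a // b)
```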
2D (r,θ) weak scaling
nstp=100. Configuration Parameters:
Fixing: A=16, B=4
Changing:
C      D    E     F
283    1    4     4
356    2    5     7
398    2    6     11
427    2    7     15
446    2    8     19
459    2    9     23
469    2    10    27
479    2    11    31
487    2    12    35
496    2    13    39
501    2    14    43
505    2    15    47
Strategy to improve M3D-PAR data copy
• Reduce toroidal ghost changes (m3d2par, m3d2part)
• Use a different poloidal partition to reduce poloidal ghost changes: 2 times faster on Seaborg
Problems fixed on Jaguar
• Runtime memory limitation
  Solution: use only 1 processor per node
  yod -SN m3dp_fsymm_opt.x …
• Code crashes when the number of processors increases from 2048 to 3076 or 4096
  module load gmalloc
  link -gmalloc as the last library to build m3dp.x
• Long waits when debugging code
  We need dedicated time to fix bugs that appear only on large numbers of processors.
• Fortran static array (stack)
  yod -SN -stack 500M m3dp_fsymm_opt.x …
BGL
• I got an account.