Porting Chemistry Applications to Abe: Lessons Learned
Dodi Heryadi
Advanced Application Support Group
Outline
• A very brief overview of Abe
• Chemistry Applications on Abe
• Porting an OpenMP code
• Porting an MPI code
• Debugging on Abe
Imaginations unbound
A very brief overview of a multi-core system
Figure: a simplified diagram of a system with two CPU sockets, main memory, and PCI expansion boards such as an MPI network card for InfiniBand or Myrinet, an Ethernet card, or a graphics card. An example is tungsten.ncsa.uiuc.edu.
Figure: the core layout, with L2 cache shown in blue, for the two CPU sockets containing quad-core Intel Xeon processors on Abe at NCSA [abe.ncsa.uiuc.edu].
Comparing Abe and Tungsten
• Abe: 8 cores per node
• Tungsten: 2 cores per node
    #SUs for Abe = 8 * #Nodes * Wall_Time
    #SUs for Tungsten = 2 * #Nodes * Wall_Time
• For the same #Nodes and Wall_Time, a job running on Abe is charged four times as much as one running on Tungsten
• Applications running on Abe should therefore ideally be at least four times faster than the same applications running on Tungsten
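The charging formulas above can be checked with a short calculation. This is a minimal sketch: the function name `service_units`, the node count, and the wall time are illustrative assumptions, and the wall time is assumed to be in hours (the slide does not state the unit).

```python
# Sketch of the SU formulas from the slide:
#   SUs = cores_per_node * nodes * wall_time
# Assumption: wall_time is in hours; the slide does not state the unit.

ABE_CORES_PER_NODE = 8        # from the slide
TUNGSTEN_CORES_PER_NODE = 2   # from the slide


def service_units(cores_per_node, nodes, wall_time):
    """Return the service units charged for one job."""
    return cores_per_node * nodes * wall_time


# Hypothetical job: same node count and wall time on both systems.
nodes, wall_time = 5, 2.0
abe = service_units(ABE_CORES_PER_NODE, nodes, wall_time)
tungsten = service_units(TUNGSTEN_CORES_PER_NODE, nodes, wall_time)
print(abe, tungsten, abe / tungsten)

# Wall time on Abe at which the charge matches the Tungsten job.
break_even = tungsten / (ABE_CORES_PER_NODE * nodes)
print(break_even)
```

With these inputs the Abe job costs four times as much, and it breaks even only if it finishes in a quarter of the Tungsten wall time.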
Chemistry Applications on Abe: Available and Planned
Quantum Chemistry
• Gaussian (OpenMP)
• GAMESS (MPI)
• NWChem (Global Arrays with MPI)
• Molpro (Global Arrays and OpenMP)
Classical Molecular Dynamics
• Amber (MPI)
• Gromacs (MPI)
• CHARMM (MPI)
• NAMD (Charm++ with MPI)
Ab initio Molecular Dynamics
• CPMD (MPI)
• VASP (MPI)
• WIEN2k (MPI)
Porting an OpenMP Code: Gaussian
• Perhaps the most widely used computational chemistry package in the world
• Well known for consuming most of the available computing resources at supercomputer centers
• Gaussian users have been migrating from Tungsten since its retirement
Very Brief Overview of the Gaussian Code
• Under development since the 1970s (over 1 million lines of code, mostly in Fortran with some C)
• Memory is allocated in one big chunk (through malloc)
Parallelization of Gaussian
• Older version (Gaussian 98): DMP: Linda; SMP: fork, shmget
• New version (Gaussian 03): Linda; OpenMP; hybrid
Porting Gaussian 03 to Abe
• Supports the PGI compilers for EM64T
• Used the makefile for IA64 (with some modifications)
Initial Gaussian 03 Benchmarks (Valinomycin Force Calculations): Wall time (seconds)
Initial Gaussian 03 Benchmarks (Valinomycin Force Calculations): Speed-Up
Improving Gaussian 03 Performance with Cache Blocking
• Reordered memory accesses to increase temporal locality
• Used a block size of 2 MB (the size of the L2 cache per core)
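Cache blocking in general can be illustrated with a tiled matrix multiply. The sketch below is not Gaussian's code, and the slides do not show which loops were blocked; `matmul_blocked` and its `block` parameter are hypothetical names. Python is used only to show the loop structure, so no speed-up should be expected from this version itself.

```python
# Illustrative sketch of cache blocking (loop tiling), NOT Gaussian's code.
# The computation is split into tiles so that each tile of the operands is
# reused many times while it is still in cache (temporal locality).

L2_BYTES_PER_CORE = 2 * 1024 * 1024          # 2 MB, from the slide
DOUBLES_PER_BLOCK = L2_BYTES_PER_CORE // 8   # 8-byte doubles in one 2 MB block


def matmul_blocked(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) tile by tile."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):            # tile row of c
        for kk in range(0, n, block):        # tile along the inner dimension
            for jj in range(0, n, block):    # tile column of c
                # Work inside one tile; min() handles a ragged last tile.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
b = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]
print(matmul_blocked(a, b, 3, block=2))
```

In a compiled language the tile size would be chosen so that the tiles being worked on fit in the 2 MB of L2 cache available per core.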
Gaussian 03 Benchmarks on Abe: Before and After Cache Blocking
Porting an MPI code: Amber
"Amber" refers to two things:
• a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs)
• a package of molecular simulation programs which includes source code and demos
(http://amber.scripps.edu/)
Porting Amber to Abe
• One of the first applications ported to Abe
• Tested with three different MPI implementations: VMI, MVAPICH, and OpenMPI
• Performance with VMI and MVAPICH was comparable
• Performance with OpenMPI was the worst
Amber Benchmarks: cellulose fiber solvated in TIP3P water in a periodic box (408K atoms), wall time (in seconds)
Debugging on Abe with the gdbwhere.pl and ssh_pbs.pl commands
(http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/CommonDoc/gdbwhere.html)
“… The gdbwhere.pl command will run a gdb backtrace [(gdb) where] for the running processes on a machine [state R from the ps command] …”
Debugging on Abe
Case: the job is running, but no output is written
1. Check the job status
    [dodi@honest2 lev]$ qstat -u dodi

    abem5.ncsa.uiuc.edu:
                                                                 Req'd  Req'd   Elap
    Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
    -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
    456689.abem5.ncsa.ui dodi     normal   nwchem.234     --     5   1     -- 02:00 R 00:12
Debugging on Abe: 2. Find the compute node(s) on which the job is running
    [dodi@honest2 lev]$ qstat -f 456689
    Job Id: 456689.abem5.ncsa.uiuc.edu
        Job_Name = nwchem.23415
        Job_Owner = dodi@abe1197
        job_state = R
        queue = normal
        Error_Path = honest1.ncsa.uiuc.edu:/u/ncsa/dodi/scratch-global/lev/lithium
            _xtal_2x2x2.err
        exec_host = abe0236/7+abe0236/6+abe0236/5+abe0236/4+abe0236/3+abe0236/2+ab
            e0236/1+abe0236/0+abe0228/7+abe0228/6+abe0228/5+abe0228/4+abe0228/3+ab
            e0228/2+abe0228/1+abe0228/0+abe0191/7+abe0191/6+abe0191/5+abe0191/4+ab
            e0191/3+abe0191/2+abe0191/1+abe0191/0+abe0180/7+abe0180/6+abe0180/5+ab
            e0180/4+abe0180/3+abe0180/2+abe0180/1+abe0180/0+abe0125/7+abe0125/6+ab
            e0125/5+abe0125/4+abe0125/3+abe0125/2+abe0125/1+abe0125/0
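Because qstat wraps the exec_host value across lines, picking out the node names by eye is error-prone. The following is a minimal sketch of one way to reduce the value to a list of distinct nodes; the helper name `nodes_from_exec_host` is hypothetical and not part of any NCSA or PBS tool.

```python
from typing import List


def nodes_from_exec_host(exec_host: str) -> List[str]:
    """Return the distinct node names in an exec_host value, in order.

    The value is a '+'-separated list of node/core entries. qstat -f wraps
    long values, so all whitespace is removed first to rejoin split names.
    """
    joined = "".join(exec_host.split())
    nodes = []
    for entry in joined.split("+"):
        host = entry.split("/")[0]
        if host and host not in nodes:
            nodes.append(host)
    return nodes


# Shortened sample in the wrapped form shown above.
sample = """abe0236/7+abe0236/6+ab
    e0236/1+abe0236/0+abe0228/7+abe0228/6+ab
    e0228/2+abe0191/0+abe0180/1+abe0125/0"""
print(nodes_from_exec_host(sample))
```

For the job above this yields five nodes, which agrees with the NDS column reported by qstat -u.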
Debugging on Abe: 3. ssh to one of the compute nodes and run top
    [dodi@honest2 lev]$ ssh abe0236
    [dodi@abe0236 ~]$ top
    top - 10:56:38 up 35 days, 11:30,  2 users,  load average: 7.99, 6.47, 4.74
    Tasks: 346 total,  10 running, 336 sleeping,   0 stopped,   0 zombie
    Cpu(s): 99.6% us,  0.4% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
    Mem:  16269968k total,  5745372k used, 10524596k free,     6292k buffers
    Swap:  8393952k total,      736k used,  8393216k free,  4331528k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     8610 dodi      25   0 1714m  89m 5004 R  100  0.6   6:49.59 nwchem
     8611 dodi      25   0 1714m  89m 4960 R  100  0.6   6:49.30 nwchem
     8613 dodi      25   0 1714m  89m 4908 R  100  0.6   6:49.54 nwchem
     8615 dodi      25   0 1714m  89m 4832 R  100  0.6   6:48.69 nwchem
     8614 dodi      25   0 1714m  89m 4952 R  100  0.6   6:49.84 nwchem
     8617 dodi      25   0 1895m 538m 301m R  100  3.4   6:49.44 nwchem
     8612 dodi      25   0 1714m  88m 5016 R  100  0.6   6:48.98 nwchem
     8616 dodi      25   0 1714m  88m 4908 R   99  0.6   6:48.88 nwchem
     8747 dodi      16   0  6420 1388  876 R    1  0.0   0:00.97 top
     8283 dodi      16   0  6000 1456  660 S    0  0.0   0:00.11 tcsh
     8307 dodi      16   0  5288  936  328 S    0  0.0   0:00.00 pbs_demux
     8424 dodi      16   0  6020 1456  644 S    0  0.0   0:00.11 456689.abem.SC
     8585 dodi      16   0 39784 6800 1136 S    0  0.0   0:00.02 python2.3
Debugging on Abe: 4. Debug with gdbwhere.pl and ssh_pbs.pl
    ssh_pbs.pl 456689 "~consult/debug/gdbwhere.pl" > mygdb.out &
(http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/CommonDoc/gdbwhere.html)
mygdb.out
    abe0236: PROCESS ID: 8610
    Using host libthread_db library "/usr/local/lib64/tls/libthread_db.so.1".
    [Thread debugging using libthread_db enabled]
    [New Thread 182920302432 (LWP 8610)]
    [New Thread 1084229984 (LWP 8623)]
    0x0000002a95e0fee0 in PMPI_Comm_rank ()
       from /usr/local/mvapich2-0.9.8p2patched-intel-ofed-1.2/lib/libmpich.so
    #0  0x0000002a95e0fee0 in PMPI_Comm_rank ()
       from /usr/local/mvapich2-0.9.8p2patched-intel-ofed-1.2/lib/libmpich.so
    #1  0x00000000023b433f in armci_util_spin (n=1140850688, notused=0x7fbfffc094) at message.c:225
    #2  0x000000000238e3b4 in armci_util_wait_int ()
    #3  0x00000000023b3a51 in armci_smp_bcast (x=0x44000000, n=-1073758060, root=1) at message.c:565
    #4  0x00000000023b3c83 in armci_msg_bcast (buf=0x44000000, len=-1073758060, root=1) at message.c:682
    #5  0x00000000021cfb68 in ga_brdcst_ ()
    #6  0x000000000093b65d in rtdb_broadcast ()
    #7  0x000000000093bb38 in rtdb_get ()
    #8  0x000000000093b0fa in rtdb_get_ ()
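With many processes in mygdb.out, it can help to reduce each backtrace to its chain of function names and compare the chains across processes. This is a rough sketch that assumes frames look like the ones shown above; `frame_functions` is a hypothetical helper, not part of gdbwhere.pl.

```python
import re
from typing import List

# Matches gdb frame lines such as
#   "#1  0x00000000023b433f in armci_util_spin (n=..., ...) at message.c:225"
# The address part is optional because gdb omits it for some frames.
FRAME = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def frame_functions(backtrace: str) -> List[str]:
    """Return the function names in a backtrace, innermost frame first."""
    return [match.group(2) for match in FRAME.finditer(backtrace)]


# Three frames copied from the output above.
sample = """#0  0x0000002a95e0fee0 in PMPI_Comm_rank ()
   from /usr/local/mvapich2-0.9.8p2patched-intel-ofed-1.2/lib/libmpich.so
#1  0x00000000023b433f in armci_util_spin (n=1140850688, notused=0x7fbfffc094) at message.c:225
#2  0x000000000238e3b4 in armci_util_wait_int ()"""
print(" <- ".join(frame_functions(sample)))
```

If every process reports the same chain ending in a wait or spin routine, the job is probably stuck in communication rather than in computation.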
message.c
    . . .
    /*\ busy wait
     *  n represents number of time delay units
     *  notused is useful to fool compiler by passing address of sensitive variable
    \*/
    #define DUMMY_INIT 1.0001
    double _armci_dummy_work=DUMMY_INIT;
    void armci_util_spin(int n, void *notused)
    {
    int i;
        for(i=0; i<n; i++)
            if(armci_msg_me()>-1) _armci_dummy_work *=DUMMY_INIT;
        if(_armci_dummy_work>(double)armci_msg_nproc())_armci_dummy_work=DUMMY_INIT;
    }

    /***************************Barrier Code*************************************/

    void armci_msg_barr_init(){
    "message.c" 2017 lines --10%--                      225,20        10%