Getting Started on Topsail Charles Davis ITS Research Computing February 10, 2010
Outline • History of Topsail • Structure of Topsail • File Systems on Topsail • Compiling on Topsail • Topsail and LSF
Initial Topsail Cluster • Initially: 1040 CPU Dell Linux Cluster • 520 dual socket, single core nodes • Infiniband interconnect • Intended for capability research • Housed in ITS Franklin machine room • Fast and efficient for large computational jobs
Topsail Upgrade 1 • Topsail upgraded to 4,160 CPU • Replaced blades with dual socket, quad core • Intel Xeon 5345 (Clovertown) processors • Quad-core with 8 CPU/node • Increased the number of processors, but decreased individual processor speed (was 3.6 GHz, now 2.33 GHz) • Decreased energy usage and the resources needed for cooling • Summary: slower clock speed, better memory bandwidth, less heat • Benchmarks tend to run at the same speed per core • Topsail shows a net ~4X improvement • Of course, this number is VERY application dependent
Topsail – Upgraded blades • 52 chassis: basis of node names • Each holds 10 blades -> 520 blades total • Nodes = cmp-chassis#-blade# • Old compute blades: Dell PowerEdge 1855 • 2 single-core Intel Xeon EM64T 3.6 GHz procs • 800 MHz FSB • 2 MB L2 cache per socket • Intel NetBurst microarchitecture • New compute blades: Dell PowerEdge 1955 • 2 quad-core Intel 2.33 GHz procs • 1333 MHz FSB • 4 MB L2 cache per socket • Intel Core 2 microarchitecture
Topsail Upgrade 2 • Most recent Topsail upgrade (Feb/Mar ‘09) • Refreshed much of the infrastructure • Improved IBRIX filesystem • Replaced and improved Infiniband cabling • Moved cluster to ITS-Manning building • Better cooling and UPS
Current Topsail Architecture • Login node: 8 CPU @ 2.3 GHz Intel EM64T, 12 GB memory • Compute nodes: 4,160 CPU @ 2.3 GHz Intel EM64T, 12 GB memory per node • Shared disk: 39 TB IBRIX parallel file system • Interconnect: Infiniband 4x SDR • 64-bit Linux operating system
Multi-Core Computing • Processor Structure on Topsail • 500+ nodes • 2 sockets/node • 1 processor/socket • 4 cores/processor (Quad-core) • 8 cores/node • http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html
Multi-Core Computing • The trend in High Performance Computing is towards multi-core or many core computing. • More cores at slower clock speeds for less heat • Now, dual and quad core processors are becoming common. • Soon 64+ core processors will be common • And these may be heterogeneous!
The Heat Problem (figure taken from Jack Dongarra, UT)
More Parallelism (figure taken from Jack Dongarra, UT)
Infiniband Connections • Connections come in single (SDR), double (DDR), and quad (QDR) data rates • Topsail is SDR • Single data rate is 2.5 Gbit/s in each direction per link • Links can be aggregated: 1x, 4x, 12x • Topsail is 4x • Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data rate is four-fifths of the raw rate; single, double, and quad data rates therefore carry 2, 4, or 8 Gbit/s per 1x link • Data rate for Topsail is 8 Gbit/s (4x SDR: 4 × 2.5 Gbit/s raw × 8/10 = 8 Gbit/s)
Infiniband Benchmarks • Point-to-point (PTP) intranode communication on Topsail for various MPI send types • Peak bandwidth: 1288 MB/s • Minimum latency (1-way): 3.6 µs
Infiniband Benchmarks • Scaled aggregate bandwidth for MPI broadcast on Topsail • Note the good scaling throughout the tested range (24 to 1,536 cores)
Login to Topsail • Use ssh to connect: • ssh topsail.unc.edu • On Windows, use the SSH Secure Shell client • For interactive programs that need X-Windows display forwarding: • ssh -X topsail.unc.edu • ssh -Y topsail.unc.edu • Off-campus users (i.e., domains outside of unc.edu) must use a VPN connection
Topsail File Systems • 39 TB IBRIX parallel file system • Split into home and scratch space • Home: /ifs1/home/my_onyen • Scratch: /ifs1/scr/my_onyen • Mass storage: /ifs1/home/my_onyen/ms • Only home is backed up
File System Limits • 500 GB total limit per user • Home: 15 GB limit (home is backed up) • Scratch: • No limit except the 500 GB total • Not backed up • Periodically cleaned • Few installed packages/programs
Compiling on Topsail • Modules • Serial programming • Intel Compiler Suite for Fortran 77, Fortran 90, C, and C++: recommended by Research Computing • GNU • Parallel programming • MPI • OpenMP • Must use the Intel Compiler Suite • Compiler flag: -openmp • Must set OMP_NUM_THREADS in the submission script (see the sketch below)
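A minimal sketch of an OpenMP build and submission. The program name, core count, and script name are illustrative, and the span[ptile] request (covered on the bsub options slide later) is used here only to keep all threads on one node:
icc -openmp -o omp_code omp_code.c    # build with the Intel OpenMP flag
# run.omp: illustrative OpenMP submission script
#BSUB -n 8
#BSUB -R "span[ptile=8]"
#BSUB -o out.%J
#BSUB -e err.%J
export OMP_NUM_THREADS=8              # set the thread count in the script, as noted above
./omp_code
# submit with: bsub < run.omp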
Compiling Modules • Module commands: • module: list the available module sub-commands • module avail: list the available modules • module add: load a module for the current session • module list: list the modules currently loaded • module clear: unload all loaded modules for the current session • To load a module permanently, add the module command to your shell startup files (see the sketch below)
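For example, a sketch of making a module load permanent for bash users. The startup file is an assumption (csh/tcsh users would edit their own startup file), and the module name is taken from the MPI compile slide later in the deck:
# append the load command to ~/.bashrc so the module is loaded at every login
echo 'module load hpc/mvapich-intel-11' >> ~/.bashrc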
Available Compilers • Intel: ifort, icc, icpc • GNU: gcc, g++, gfortran • Libraries: BLAS/LAPACK • MPI: • mpicc/mpiCC • mpif77/mpif90 • The mpiXX commands are just wrappers around the Intel or GNU compilers • They add the locations of the MPI libraries and include files • Provided as a convenience (see the sketch below)
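A minimal sketch of serial compiles with the compilers listed above; the source file names are illustrative. The last line is an assumption based on MPICH-derived wrappers (such as MVAPICH's) and prints the underlying compiler command that mpicc would run:
ifort -o myprog myprog.f90    # Intel Fortran
icc -o myprog myprog.c        # Intel C
gcc -o myprog myprog.c        # GNU C
mpicc -show                   # show the wrapped compiler and MPI flags (MPICH-style wrappers)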
Test MPI Compile • Copy cpi.c to scratch directory: • cp /ifs1/scr/cdavis/Topsail/cpi.c /ifs1/scr/my_onyen/. • Add Intel module: • module load hpc/mvapich-intel-11 • Confirm Intel module: • which mpicc • Compile code: • mpicc -o cpi cpi.c
MPI/OpenMP Training • Courses are taught throughout the year by Research Computing: http://learnit.unc.edu/workshops • Next courses: • MPI: summer • OpenMP: March 3rd
Running Programs on Topsail • Upon ssh to Topsail, you are on the Login node. • Programs SHOULD NOT be run on Login node. • Submit programs to one of 4,160 Compute nodes. • Submit jobs using Load Sharing Facility (LSF).
Job Scheduling Systems • Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc. • Many types of schedulers • Load Sharing Facility (LSF) – Used by Topsail • IBM LoadLeveler • Portable Batch System (PBS) • Sun Grid Engine (SGE)
Load Sharing Facility (LSF) • [Diagram: job flow from the submission host (bsub) through the master host (MLIM, MBD) to an execution host (LIM, SBD, RES) running the user job] • LIM: Load Information Manager • MLIM: Master LIM • MBD: Master Batch Daemon • SBD: Slave Batch Daemon • RES: Remote Execution Server
Submitting a Job to LSF • For a compiled MPI job: • bsub -n "< number CPUs >" -o out.%J -e err.%J -a mvapich mpirun ./mycode • bsub: the LSF command that submits the job to compute nodes • bsub -o and bsub -e: job output and error are saved to files (%J is replaced by the job ID) in the submission directory
Queue System on Topsail • Topsail uses queues to distribute jobs • Specify the queue with -q in bsub: • bsub -q week … • No -q specified = default queue (week) • Queues vary depending on the size and required time of jobs • See a listing of queues with: • bqueues
Topsail Queues • Most jobs do not scale very well beyond 128 CPUs.
Submission Scripts • Easier to write a submission script that can be edited for each job submission • Example script file, run.hpl:
#BSUB -n "< number CPUs >"
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode
• Submit with: bsub < run.hpl
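A filled-in sketch of run.hpl; the core count, queue, and program name are illustrative assumptions:
#BSUB -q week
#BSUB -n 32
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode
# submit it, then check on it:
# bsub < run.hpl
# bjobs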
More bsub options • bsub -x: NO LONGER USED! • Gave exclusive use of a node • Was used extensively when first testing code • bsub -n 4 -R span[ptile=4] • Forces all 4 processors to be on the same node • Similar to -x (see the quoting note below) • bsub -J job_name • See the man pages for a complete description: • man bsub
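A hedged note: in many shells the brackets in the resource string must be quoted, so the node-packing request above might be typed as follows (program name illustrative):
bsub -q week -n 4 -R "span[ptile=4]" -o out.%J -e err.%J -a mvapich mpirun ./mycode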
Performance Test • Gromacs MD simulation of bulk water • Simulation setups: • Case 1: -n 8 -R span[ptile=1] (one process per node, 8 nodes) • Case 2: -n 8 -R span[ptile=8] (all 8 processes on one node) • Simulation times (1 ns MD): • Case 1: 1445 sec • Case 2: 1255 sec • Packing the job onto a single node improved speed by only ~13%
Following a Job After Submission • bjobs • bjobs -l JobID • Shows the current status of the job • bhist • bhist -l JobID • More detailed information about the job's history • bkill • bkill -r JobID • Ends the job prematurely (see the example session below)
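An illustrative session using the commands above (the job ID 12345 is hypothetical):
bjobs                # list your current jobs
bjobs -l 12345       # full status of one job
bhist -l 12345       # detailed history of the job
bkill 12345          # end the job prematurely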
Submit Test MPI Job • Submit the test MPI program on Topsail • bsub -q week -n 4 -o out.%J -e err.%J -a mvapich mpirun ./cpi • Follow submission: bjobs • Output stored in out.%J file
Pre-Compiled Programs on Topsail • Some applications are precompiled for all users in: • /ifs1/apps • Amber, Gaussian, Gromacs, NetCDF, NWChem, R • Add an application to your path using the module commands: • module avail: shows the available applications • module add: adds a specific application • Once the module command is used, the executable is on your path
Test Gaussian Job on Topsail • Add the Gaussian application to path: • module add apps/gaussian-03e01 • module list • Copy input com file: • cp /ifs1/scr/cdavis/Topsail/water.com . • Check that the executable has been added to path: • echo $PATH • Submit job: • bsub -q week -n 4 -e err.%J -o out.%J g03 water.com
Common Error 1 • If the job immediately dies, check the err.%J file • The err.%J file has the error: • Can't read MPIRUN_HOST • Problem: the MPI environment settings were not correctly applied on the compute node • Solution: include mpirun in the bsub command (see the contrast below)
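For contrast, a sketch of the failing and working command forms, using the cpi program from the MPI test as the example:
# fails with "Can't read MPIRUN_HOST": mpirun was left out
bsub -q week -n 4 -o out.%J -e err.%J -a mvapich ./cpi
# works: mpirun is included, so the MPI environment is set up on the compute node
bsub -q week -n 4 -o out.%J -e err.%J -a mvapich mpirun ./cpi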
Common Error 2 • The job immediately dies after submission • The err.%J file is blank • Problem: ssh passwords and keys were not correctly set up at the initial login to Topsail • Solution: • cd ~/.ssh/ • mv id_rsa id_rsa-orig • mv id_rsa.pub id_rsa.pub-orig • Log out of Topsail • Log back in to Topsail and accept all defaults
Interactive Jobs • To run long shell scripts on Topsail, use the int queue • bsub -q int -Ip /bin/bash • This bsub command provides a prompt on a compute node • Programs or shell scripts can then be run interactively from the compute node • The Totalview debugger can also be run interactively on Topsail (see the example session below)
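An illustrative interactive session (the script name is hypothetical):
bsub -q int -Ip /bin/bash    # request an interactive shell on a compute node
./my_long_script.sh          # run the script from the compute-node prompt
exit                         # end the interactive job when finished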
Further Help with Topsail • More details about using Topsail can be found in the Getting Started on Topsail help document: • http://help.unc.edu/?id=6214 • http://keel.isis.unc.edu/wordpress/ (ON CAMPUS) • For assistance with Topsail, please contact the ITS Research Computing group • Email: research@unc.edu • For immediate assistance, see the manual pages on Topsail: • man <command>