
Getting Started on Topsail


Presentation Transcript


  1. Getting Started on Topsail Charles Davis ITS Research Computing February 10, 2010

  2. Outline • History of Topsail • Structure of Topsail • File Systems on Topsail • Compiling on Topsail • Topsail and LSF

  3. Initial Topsail Cluster • Initially: 1040 CPU Dell Linux Cluster • 520 dual socket, single core nodes • Infiniband interconnect • Intended for capability research • Housed in ITS Franklin machine room • Fast and efficient for large computational jobs

  4. Topsail Upgrade 1 • Topsail upgraded to 4,160 CPU • replaced blades with dual socket, quad core • Intel Xeon 5345 (Clovertown) Processors • Quad-Core with 8 CPU/node • Increased number of processors, but decreased individual processor speed (was 3.6 GHz, now 2.33) • Decreased energy usage and necessary resources for cooling system • Summary: slower clock speed, better memory bandwidth, less heat • Benchmarks tend to run at the same speed per core • Topsail shows a net ~4X improvement • Of course, this number is VERY application dependent

  5. Topsail – Upgraded blades • 52 Chassis: Basis of node names • Each holds 10 blades -> 520 blades total • Nodes = cmp-chassis#-blade# • Old Compute Blades: Dell PowerEdge 1855 • 2 single core Intel Xeon EM64T 3.6 GHz procs • 800 MHz FSB • 2MB L2 Cache per socket • Intel NetBurst MicroArchitecture • New Compute Blades: Dell PowerEdge 1955 • 2 quad core Intel 2.33 GHz procs • 1333 MHz FSB • 4MB L2 Cache per socket • Intel Core 2 MicroArchitecture

  6. Topsail Upgrade 2 • Most recent Topsail upgrade (Feb/Mar ‘09) • Refreshed much of the infrastructure • Improved IBRIX filesystem • Replaced and improved Infiniband cabling • Moved cluster to ITS-Manning building • Better cooling and UPS

  7. Current Topsail Architecture • Login node: 8 CPU @ 2.3 GHz Intel EM64T, 12 GB memory • Compute nodes: 4,160 CPU @ 2.3 GHz Intel EM64T, 12 GB memory • Shared disk: 39TB IBRIX Parallel File System • Interconnect: Infiniband 4x SDR • 64-bit Linux Operating System

  8. Multi-Core Computing • Processor Structure on Topsail • 500+ nodes • 2 sockets/node • 1 processor/socket • 4 cores/processor (Quad-core) • 8 cores/node • http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html

  9. Multi-Core Computing • The trend in High Performance Computing is towards multi-core or many core computing. • More cores at slower clock speeds for less heat • Now, dual and quad core processors are becoming common. • Soon 64+ core processors will be common • And these may be heterogeneous!

  10. The Heat Problem Taken From: Jack Dongarra, UT

  11. More Parallelism Taken From: Jack Dongarra, UT

  12. Infiniband Connections • Connection comes in single (SDR), double (DDR), and quad (QDR) data rates. • Topsail is SDR. • Single data rate is 2.5 Gbit/s in each direction per link. • Links can be aggregated - 1x, 4x, 12x. • Topsail is 4x. • Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data rate is four-fifths the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s per link, respectively. • Data rate for Topsail is 8 Gbit/s (4x SDR).
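The 8B/10B arithmetic above can be checked directly; this is a minimal sketch (the helper name is illustrative, not from the talk):

```shell
# Effective InfiniBand data rate: raw signaling rate * 8/10 (8B/10B encoding),
# multiplied by the number of aggregated lanes.
effective_gbit() {
  # $1 = raw Gbit/s per lane, $2 = number of lanes (1, 4, or 12)
  awk -v raw="$1" -v lanes="$2" 'BEGIN { printf "%g", raw * lanes * 8 / 10 }'
}

effective_gbit 2.5 1; echo " Gbit/s  (SDR 1x)"
effective_gbit 2.5 4; echo " Gbit/s  (SDR 4x - Topsail)"
effective_gbit 10 1;  echo " Gbit/s  (QDR 1x)"
```

This reproduces the slide's numbers: 2 Gbit/s for SDR per link and 8 Gbit/s for Topsail's 4x SDR fabric.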

  13. Topsail Network Topology

  14. Infiniband Benchmarks • Point-to-point (PTP) intranode communication on Topsail for various MPI send types • Peak bandwidth: • 1288 MB/s • Minimum Latency (1-way): • 3.6 µs

  15. Infiniband Benchmarks • Scaled aggregate bandwidth for MPI Broadcast on Topsail • Note good scaling throughout the tested range (from 24-1536 cores)

  16. Login to Topsail • Use ssh to connect: • ssh topsail.unc.edu • Windows users: SSH Secure Shell client • For using interactive programs with X-Windows display: • ssh –X topsail.unc.edu • ssh –Y topsail.unc.edu • Off-campus users (i.e. domains outside of unc.edu) must use a VPN connection

  17. Topsail File Systems • 39TB IBRIX Parallel File System • Split into Home and Scratch Space • Home: /ifs1/home/my_onyen • Scratch: /ifs1/scr/my_onyen • Mass Storage • Only Home is backed up • /ifs1/home/my_onyen/ms

  18. File System Limits • 500GB total limit per user • Home: 15GB limit (backed up) • Scratch: • No limit except the 500GB total • Not backed up • Periodically cleaned • Few installed packages/programs

  19. Compiling on Topsail • Modules • Serial Programming • Intel Compiler Suite for Fortran77, Fortran90, C and C++ - recommended by Research Computing • GNU • Parallel Programming • MPI • OpenMP • Must use Intel Compiler Suite • Compiler flag: -openmp • Must set OMP_NUM_THREADS in submission script
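For the OpenMP case, a submission script might look like the following sketch (the program name my_openmp_code and the core counts are illustrative; the talk only specifies the -openmp flag and OMP_NUM_THREADS):

```shell
# Hypothetical OpenMP job script. All threads must share one node,
# so ptile is set equal to the core count (8 cores/node on Topsail).
#BSUB -n 8
#BSUB -R "span[ptile=8]"
#BSUB -o out.%J
#BSUB -e err.%J
export OMP_NUM_THREADS=8
./my_openmp_code
```

The code itself would first be compiled with the Intel suite, e.g. icc -openmp mycode.c -o my_openmp_code.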

  20. Compiling Modules • Module commands • module – list available module commands • module avail – list available modules • module add – add a module temporarily • module list – list modules currently loaded • module clear – clear all loaded modules • Modules can also be added permanently via startup files
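As a sketch of the startup-file approach (assuming a bash login shell; the module name is the one used in the compile example later in this talk):

```shell
# ~/.bashrc: load commonly used modules automatically at each login
module add hpc/mvapich-intel-11
module list   # confirm what is loaded
```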

  21. Available Compilers • Intel – ifort, icc, icpc • GNU – gcc, g++, gfortran • Libraries - BLAS/LAPACK • MPI: • mpicc/mpiCC • mpif77/mpif90 • mpixx is just a wrapper around the Intel or GNU compiler • Adds location of MPI libraries and include files • Provided as a convenience

  22. Test MPI Compile • Copy cpi.c to scratch directory: • cp /ifs1/scr/cdavis/Topsail/cpi.c /ifs1/scr/my_onyen/. • Add Intel module: • module load hpc/mvapich-intel-11 • Confirm Intel module: • which mpicc • Compile code: • mpicc –o cpi cpi.c

  23. MPI/OpenMP Training • Courses are taught throughout year by Research Computing http://learnit.unc.edu/workshops • Next course: • MPI – Summer • OpenMP – March 3rd

  24. Running Programs on Topsail • Upon ssh to Topsail, you are on the Login node. • Programs SHOULD NOT be run on Login node. • Submit programs to one of 4,160 Compute nodes. • Submit jobs using Load Sharing Facility (LSF).

  25. Job Scheduling Systems • Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc. • Many types of schedulers • Load Sharing Facility (LSF) – Used by Topsail • IBM LoadLeveler • Portable Batch System (PBS) • Sun Grid Engine (SGE)

  26. Load Sharing Facility (LSF) • [Architecture diagram: a job submitted with bsub on the submission host passes through the Batch API to the Master Batch Daemon on the master host, which dispatches it to a Slave Batch Daemon on an execution host; the SBD starts the user job via the Remote Execution Server, while Load Information Managers on each host report load to the Master LIM.] • LIM – Load Information Manager • MLIM – Master LIM • MBD – Master Batch Daemon • SBD – Slave Batch Daemon • RES – Remote Execution Server

  27. Submitting a Job to LSF • For a compiled MPI job: • bsub -n "< number CPUs >" -o out.%J -e err.%J -a mvapich mpirun ./mycode • bsub – LSF command that submits a job to the compute nodes • bsub –o and bsub –e • Job output and error are saved to files in the submission directory (%J expands to the job ID)

  28. Queue System on Topsail • Topsail uses queues to distribute jobs. • Specify queue with –q in bsub: • bsub –q week … • No –q specified = default queue (week) • Queues vary depending on size and required time of jobs • See listing of queues: • bqueues

  29. Topsail Queues • Most jobs do not scale very well over 128 CPUs.

  30. Submission Scripts • Easier to write a submission script that can be edited for each job submission. • Example script file – run.hpl:
#BSUB -n "< number CPUs >"
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode
• Submit with: bsub < run.hpl

  31. More bsub options • bsub –x – NO LONGER USED!!!! • Gave exclusive use of a node • Was used extensively when first testing code • bsub –n 4 –R span[ptile=4] • Forces all 4 processors to be on the same node • Similar to –x • bsub –J job_name – name the job • See the man pages for a complete description • man bsub

  32. Performance Test • Gromacs MD simulation of bulk water • Simulation setups: • Case 1: -n 8 -R span[ptile=1] (8 nodes, 1 core each) • Case 2: -n 8 -R span[ptile=8] (1 node, all 8 cores) • Simulation times (1ns MD): • Case 1: 1445 sec • Case 2: 1255 sec • Using a single node improved speed by 13%
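The 13% figure follows from the two timings on the slide; a quick check of the arithmetic:

```shell
# Speedup of the packed run (ptile=8) relative to the spread run (ptile=1),
# using the simulation times from the slide.
awk -v spread=1445 -v packed=1255 \
  'BEGIN { printf "%.0f%% faster\n", (spread - packed) / spread * 100 }'
```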

  33. Following Job After Submission • bjobs • bjobs –l JobID • Shows current status of a job • bhist • bhist –l JobID • More detailed information regarding job history • bkill • bkill –r JobID • Ends a job prematurely

  34. Submit Test MPI Job • Submit the test MPI program on Topsail • bsub –q week –n 4 –o out.%J –e err.%J –a mvapich mpirun ./cpi • Follow submission: bjobs • Output stored in out.%J file

  35. Pre-Compiled Programs on Topsail • Some applications are precompiled for all users: • /ifs1/apps • Amber, Gaussian, Gromacs, NetCDF, NWChem, R • Add a module to your path using the module commands: • module avail – shows available applications • module add – adds a specific application • Once the module command is used, the executable is added to your path

  36. Test Gaussian Job on Topsail • Add Gaussian Application to path: • module add apps/gaussian-03e01 • module list • Copy input com file: • cp /ifs1/scr/cdavis/Topsail/water.com . • Check that executable has been added to path: • echo $PATH • Submit job: • bsub –q week –n 4 –e err.%J –o out.%J g03 water.com

  37. Common Error 1 • If a job immediately dies, check the err.%J file • err.%J file has the error: • Can't read MPIRUN_HOST • Problem: MPI environment settings were not correctly applied on the compute node • Solution: Include mpirun in the bsub command

  38. Common Error 2 • Job immediately dies after submission • err.%J file is blank • Problem: ssh passwords and keys were not correctly set up at initial login to Topsail • Solution: • cd ~/.ssh/ • mv id_rsa id_rsa-orig • mv id_rsa.pub id_rsa.pub-orig • Logout of Topsail • Login to Topsail and accept all defaults

  39. Interactive Jobs • To run long shell scripts on Topsail, use the int queue • bsub –q int –Ip /bin/bash • This bsub command provides a prompt on a compute node • Can run a program or shell script interactively from the compute node • The TotalView debugger can also be run interactively on Topsail

  40. Further Help with Topsail • More details about using Topsail can be found on the Getting Started on Topsail help document • http://help.unc.edu/?id=6214 • http://keel.isis.unc.edu/wordpress/ - ON CAMPUS • For assistance with Topsail, please contact the ITS Research Computing group • Email: research@unc.edu • For immediate assistance, see manual pages on Topsail: • man <command>
