
Computing Workshop for Users of NCAR’s SCD machines



  1. Computing Workshop for Users of NCAR’s SCD machines
  Christiane Jablonowski (cjablono@ucar.edu), NCAR ASP/SCD, 31 January 2006
  ML Mesa Lab, Chapman Room; video conference facilities: FL EOL Atrium and CG1 3150

  2. Overview
  • Current machine architectures at NCAR (SCD)
  • Some basics on parallel computing
  • Batch queuing systems at NCAR
  • GAU resources & how to obtain a GAU account
  • Insights into GAU charges
  • The Mass Storage System
  • How to monitor the GAUs
  • Some practical tips on benchmarks, debugging tools, restarts…
  • ???

  3. Computer architectures
  • SCD’s machines are UNIX-based parallel computing architectures
  • Two types:
  • Hybrid (shared and distributed memory) machines like bluesky (IBM Power4), bluevista (IBM Power5) and lightning (AMD Opteron system running Linux)
  • Shared memory system like tempest (SGI, 128 CPUs), predominantly used for post-processing jobs

  4. Parallel Programming
  • Parallel machines require parallel programming techniques in the user application:
  • MPI (Message Passing Interface) for distributed memory systems, can also be used on shared memory systems
  • OpenMP for shared memory systems
  • Hybrid (MPI & OpenMP) programming technique common on the IBMs at NCAR
  • Pure MPI parallelization often the fastest option, computational domain is split into pieces that can communicate over the network (via messages)
  • OpenMP: parallelization of (mostly) loops via compiler directives
  • Parallelization provided in CAM/CCSM/WRF

  5. Most common: Hybrid hardware architectures
  • Combined shared and distributed memory architecture:
  • Shared-memory symmetric multiprocessor (SMP) nodes; processors on a node have direct access to memory
  • Nodes are connected via the network (distributed memory)

  6. MPI example
  Processors communicate via messages

  7. MPI Example
  • Initialize & finalize MPI in your program via function/subroutine calls to the MPI library. Examples include: MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
  • Example from previous page in C notation (unoptimized)
  • Important to note: such an operation (computing a global sum) is very common, therefore MPI provides a highly optimized function, also called a ‘reduction operation’, MPI_Reduce(…), that can replace the example above
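
  The C code on this slide appears only as an image in the original transcript. The following is a minimal sketch, assuming the slide showed the usual unoptimized global-sum pattern: each rank computes a partial value and rank 0 collects the pieces with point-to-point messages. The one-line MPI_Reduce replacement mentioned above is included as a comment.

    /* Unoptimized global sum: rank 0 gathers partial results one by one. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, p;
        double partial, global, tmp;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        partial = (double) rank;        /* stand-in for a local computation */

        if (rank == 0) {
            global = partial;
            for (p = 1; p < size; p++) {   /* receive every other rank's piece */
                MPI_Recv(&tmp, 1, MPI_DOUBLE, p, 0, MPI_COMM_WORLD, &status);
                global += tmp;
            }
            printf("global sum = %f\n", global);
        } else {
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        /* Optimized equivalent, one collective call:
           MPI_Reduce(&partial, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                      MPI_COMM_WORLD);                                   */

        MPI_Finalize();
        return 0;
    }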

  8. Example: domain decompositions for MPI
  Each color represents a processor

  9. OpenMP Example
  Parallel loops via compiler directives (here: in Fortran notation)
  Before the program is called, set: setenv OMP_NUM_THREADS #proc
  Add compiler directives in your code:

    !$OMP PARALLEL DO
    DO i = 1, n
      a(i) = b(i) + c(i)
    END DO
    !$OMP END PARALLEL DO

  Assume n=1000 & #proc=4: the loop will be split into 4 ‘threads’ that run in parallel with loop indices 1…250, 251…500, 501…750, 751…1000

  10. SCD’s machines
  • Bluesky (web page)
  • ‘Oldest’ machine at NCAR (2002)
  • Lots of user experience at NCAR, easy access to help
  • CAM/CCSM/WRF are set up for this architecture (Makefiles)
  • Batch queuing system LoadLeveler, short interactive runs possible
  • Batch queues are listed under http://www.cisl.ucar.edu/computers/bluesky/queue.charge.html
  • Lots of additional software available: e.g. math libraries, graphics packages, Totalview debugger

  11. SCD’s machines
  • Bluevista (web page)
  • Newest machine on the floor (Jan. 2006)
  • CAM/CCSM/WRF are (probably) set up for this architecture
  • Batch queuing system LSF (Load Sharing Facility)
  • Queue names different from bluesky: premium, regular, economy, standby, debug, share
    http://www.cisl.ucar.edu/computers/bluevista/queue.charge.html
  • Some additional software available: e.g. math libraries, Totalview debugger

  12. SCD’s machines
  • Lightning (web page)
  • Linux cluster
  • Compilers different from the IBMs: Portland Group or Pathscale
  • Batch queuing system LSF
  • Same queue names as bluevista
  • Some support software
  • Tempest (web page)
  • for data post-processing with yet another batch queuing system, NQS
  • Lots of support software
  • Interactive use possible

  13. Example of a LoadLeveler job script
  Parallel job with 32 MPI processes in the com_rg32 queue (regular queue on a 32-way node); 32 MPI processes per 32-way node:

    #@ class            = com_rg32
    #@ node             = 1
    #@ tasks_per_node   = 32
    #@ output           = out.$(jobid)
    #@ error            = out.$(jobid)
    #@ job_type         = parallel
    #@ wall_clock_limit = 00:20:00
    #@ network.MPI      = csss,not_shared,us
    #@ node_usage       = not_shared
    #@ account_no       = 54042108
    #@ ja_report        = yes
    #@ queue
    …
    setenv OMP_NUM_THREADS 1
    …

  Submit the job via: llsubmit job_script

  14. Example of a LoadLeveler job script
  Hybrid parallel job with 8 MPI processes and 4 OpenMP threads in the com_ec32 (economy) queue; 8 MPI processes per 32-way node, each spawning 4 OpenMP threads, so all 32 processors of the node are used:

    #@ class            = com_ec32
    #@ node             = 1
    #@ tasks_per_node   = 8
    #@ output           = out.$(jobid)
    #@ error            = out.$(jobid)
    #@ job_type         = parallel
    #@ wall_clock_limit = 00:20:00
    #@ network.MPI      = csss,not_shared,us
    #@ node_usage       = not_shared
    #@ account_no       = 54042108
    #@ ja_report        = yes
    #@ queue
    …
    setenv OMP_NUM_THREADS 4
    …

  Submit the job via: llsubmit job_script

  15. Example of an LSF job script (lightning)
  Parallel job with 8 MPI processes (on 4 2-way nodes):

    #! /bin/csh
    ##
    #BSUB -a 'mpich_gm'
    #BSUB -P 54042108
    #BSUB -q regular
    #BSUB -W 00:30
    #BSUB -x
    #BSUB -n 8
    #BSUB -R "span[ptile=2]"
    #BSUB -o fvcore_amr.out.%J
    #BSUB -e fvcore_amr.err.%J
    #BSUB -J test0.path
    ##
    mpirun.lsf -v ./dycore

  Annotations from the slide: -a 'mpich_gm' is the select option on lightning; -q regular picks the regular queue; -W 00:30 sets a 30-minute wallclock limit; -n 8 requests 8 MPI processes in total; span[ptile=2] places 2 MPI processes per node; -J sets the name of the job (listed in the SCD Portal).
  Submit the job via: bsub < job_script

  16. Example of an LSF job script (bluevista)
  Parallel job with 8 MPI processes (on 1 8-way node):

    #! /bin/csh
    ##
    #BSUB -a poe
    #BSUB -P 54042108
    #BSUB -q economy
    #BSUB -W 00:30
    #BSUB -x
    #BSUB -n 8
    #BSUB -R "span[ptile=8]"
    #BSUB -o fvcore_amr.out.%J
    #BSUB -e fvcore_amr.err.%J
    #BSUB -J test0.path
    ##
    mpirun.lsf -v ./dycore

  Annotations from the slide: -a poe selects ‘poe’ on bluevista; -q economy picks the economy queue; -x requests exclusive use (not shared); span[ptile=8] allows up to 8 MPI processes on a node.
  Submit the job via: bsub < job_script

  17. More information on SCD’s machines
  • Web page: SCD’s Support and Consulting services
  • SCD’s customer support: sometimes you even get help on the weekends or in the evenings
  • Email: consult1@ucar.edu
  • Phone: 303 497 1278
  • Walk-in support at the Mesa Lab
  • Check out SCD’s Daily Bulletin (scheduled machine downtimes, etc.)
  • Subscribe to the hpcstatus mailing list (short e-mails about machine status, system updates)

  18. GAU resources
  • ASP has a monthly allocation of 3850 GAUs (General Accounting Units)
  • A GAU is a measure of compute time on the supercomputers maintained by NCAR’s Scientific Computing Division (SCD): http://www.cisl.ucar.edu/
  • Access to these machines requires
  • an SCD login account (dbs@ucar.edu or 303-497-1225)
  • a GAU account (for ASP: contact Maura, otherwise contact your division / apply for a university account)
  • an ssh environment
  • and a crypto card (for secure access)
  • SCD contacts: Dick Valent & Mike Page (here today), Juli Rew, Siddhartha Ghosh, Ginger Caldwell (GAUs)

  19. GAU resources
  • GAUs: a ‘use it or lose it’ strategy
  • In ASP: we share the resource among the ASP postdocs & graduate fellows
  • Distribution is flexible and will be discussed occasionally, e.g. monthly, either via meetings or e-mail discussions: asp-gau-users@asp.ucar.edu
  • GAUs are also charged for
  • storing files in the Mass Storage System (MSS)
  • file transfers from the MSS to other machines

  20. ASP GAU account
  • ASP GAU account number: 54042108 (also the project number)
  • Needs to be specified in the batch job scripts
  • The ASP account number is not your default account number
  • Therefore: everybody needs a second (default) GAU account:
  • divisional GAU account
  • so-called University account (small request form for 1500 GAUs: http://www.cisl.ucar.edu/resources/compServ.html); these GAUs do not expire every month, one-time allocation
  • The second GAU account should be used for the accumulating MSS charges
  • automatic when using CAM / CCSM’s MSS option

  21. GAU charges on SCD’s supercomputers
  • You are charged GAUs for how much time you use a processor (on bluesky, bluevista, lightning, tempest)
  • On bluesky, there are actually two formulas:
  • Shared-node usage: GAUs charged = CPU hours used × computer factor × class charging factor
  • Dedicated-node usage: GAUs charged = wallclock hours used × number of nodes used × number of processors in that node × computer factor × class charging factor
  Slides on GAU charges: modified from an earlier presentation by George Bryan, NCAR MMM
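
  For illustration, plugging in the factors defined on the next slides: a shared-node job that accumulates 10 CPU hours on bluesky (computer factor 0.24) in the regular queue (class charging factor 1.0) costs 10 × 0.24 × 1.0 = 2.4 GAUs. A worked dedicated-node example follows on slide 26.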

  22. “Number of nodes used” and “Number of processors in that node”
  • Self-explanatory (?)
  • Bluesky: 76 8-way nodes (8 processors each), 25 32-way nodes (32 processors each)
  • Bluevista: 78 8-way nodes
  • Lightning: 128 2-way nodes

  23. “CPU hours used” and “Wallclock hours used”
  • A measure of how long you “used” a processor
  • NOTE: this includes all the time you were allocated the use of a processor, whether you actually used it or not
  • Example: you used two 8-processor nodes on bluesky. The job started at 1:00 PM and finished at 2:30 PM. You are charged for 1.5 hrs

  24. “Computer factor”
  • A measure of how powerful a computer is
  • Bluesky: 0.24
  • Bluevista: 0.5
  • Lightning: 0.34
  • This “levels the playing field”

  25. “Class charging factor”
  • Tied to the queuing system: “How quickly do you want your results, and how much are you willing to pay for it?”
  • Current setting on all SCD supercomputers:
  • Premium = 1.5 (highest priority, fastest turnaround)
  • Regular = 1.0
  • Economy = 0.5
  • Standby = 0.1 (lowest priority, slow turnaround)

  26. Example
  • Recall dedicated-node usage on bluesky:
  • GAUs charged = wallclock hours used × number of nodes used × number of processors in that node × computer factor × class charging factor
  • 1.5 hours using two 8-processor nodes
  • Bluesky regular queue
  • GAUs used = 1.5 × 2 × 8 × 0.24 × 1.0 = 5.76 GAUs
  • In the premium queue, this would be 8.64 GAUs
  • In the standby queue, this would be 0.576 GAUs

  27. Recommendations: Queuing systems
  • Check the queue before you submit any job:
  • If the queue is not busy, try using the standby or economy queues
  • The queue tends to be “emptier” evenings, weekends, and holidays
  • A job will start sooner when you specify a wallclock limit in the job script (the scheduler tries to ‘squeeze in’ short jobs)
  • The fewer processors you request, the sooner you start
  • Use the premium queue sparingly
  • Short debug jobs (there is also a special debug queue on lightning)
  • When that conference paper is due

  28. Recommendations: # of processors vs. run times
  • If you use more processors, you might wait longer in the queue, but usually the actual runtime of your job is reduced
  • Caveat: it usually costs more GAUs
  • Example: you run the same job twice:
  • Using 8 processors, the job ran in 24 hours
  • Using 64 processors, the job ran in 4 hours
  • The 1st example used 46 GAUs
  • The 2nd example used 61 GAUs

  29. The Mass Storage System
  • MSS: Mass Storage System (disks and cartridges) for your big data sets
  • The MSS is connected to the SCD machines, sometimes also to divisional computers
  • MSS users have directories like mss:/LOGIN_NAME/
  • Quick online reference (mss commands): http://www.cisl.ucar.edu/docs/mss/mss-commandlist.html
  • You are charged GAUs for using the MSS
  • The GAU equation for the MSS is more complicated ...

  30. MSS Charges
  • GAUs charged = 0.0837 × R + 0.0012 × A + N × (0.1195 × W + 0.2050 × S)
  • where:
  • R = gigabytes read
  • W = gigabytes created or written
  • A = number of disk drive or tape cartridge accesses
  • S = data stored, in gigabyte-years
  • N = number of copies of the file: 1 if economy reliability selected; 2 if standard reliability selected
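
  For illustration (made-up numbers): writing a 10 GB file with standard reliability (N = 2) in a single tape access and keeping it stored for one year costs roughly 0.0012 × 1 + 2 × (0.1195 × 10 + 0.2050 × 10 × 1) ≈ 6.5 GAUs; reading it back once later adds about 0.0837 × 10 ≈ 0.8 GAUs.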

  31. Recommendations: The MSS
  • MSS charges seem small, but they add up!
  • Examples: FY04 MSS usage
  • ACD: 24,000 of 60,000 GAUs
  • CGD: 94,500 of 181,000 GAUs
  • HAO: 22,000 of 122,000 GAUs
  • MMM: 34,000 of 139,000 GAUs
  • RAP: 32,000 of 35,000 GAUs

  32. Recommendations: The MSS
  • Recommendation for ASP users:
  • use an account in your home division or your so-called ‘university’ account (1500 GAUs for postdocs, you need to apply) for MSS charges; leave ASP GAUs for supercomputing

  33. GAU Usage Strategy: 30-day and 90-day averages
  • The allocation actually works through 30-day and 90-day averages
  • Limits: 120% for 30-day use, 105% for 90-day use
  • It is helpful to spread usage out evenly
  • How to check GAU usage:
  • Type “charges” on the command line of a supercomputer
  • Check the “daily summary” output (next page)
  • SCD Portal: look for the link on SCD’s main page: http://www.cisl.ucar.edu/

  34. Web page: http://www.cisl.ucar.edu/dbsg/dbs/ASP/

    ASP 30 Day Percent = 57.0 %    ASP 90 Day Percent = 48.3 %
    30 Day Allocation  = 3850      90 Day Allocation  = 11550
    30 Day Use         = 2193      90 Day Use         = 5575
    90 DAY ST: 01-NOV-05    30 DAY ST: 31-DEC-05    LAST DAY: 29-JAN-06

    ASP GAUs Used by Day
    01-NOV-05    9.36
    03-NOV-05     .03
    04-NOV-05  141.45
    …
    22-JAN-06     .04
    23-JAN-06   44.29
    24-JAN-06  170.83
    25-JAN-06  120.30
    26-JAN-06   91.67
    27-JAN-06   41.97
    28-JAN-06   15.59
    29-JAN-06   16.95

  35. What happens when we use too many GAUs?
  • Your jobs will be thrown into a very low priority: the dreaded hold queue
  • It will be hard to get work done
  • But jobs will still run
  • ASP users: you can use more than 3850 GAUs / month
  • Experience says it’s better to use too many than not enough

  36. What happens when we use too many/too few GAUs?
  Too many:
  • Recommendation: when the 30- and 90-day averages are running high, use the economy or standby queue ... conserve GAUs
  • But don’t worry about going over
  Too few:
  • ASP’s allocation will be cut in the long run if the 3850 GAUs per month allocation is not used

  37. How to catch up when behind
  • Be wasteful:
  • Use the premium queue
  • Use more processors than you need
  • Have fun:
  • Try something you always wanted to do, but never had the resources

  38. How to conserve GAUs
  • Be frugal:
  • Use the economy and standby queues
  • Use fewer processors
  • Use divisional GAUs (if possible) or your ‘university’ GAU account

  39. How to share & monitor GAUs in ASP
  • Communicate!
  • Occasionally, we (ASP postdocs) use the e-mail list asp-gau-users@asp.ucar.edu to announce a ‘busy’ GAU period
  • Keep watching the ASP GAU usage on the web page http://www.cisl.ucar.edu/dbsg/dbs/ASP/ or in the SCD Portal
  • Look for the SCD Portal link on the SCD page: http://www.cisl.ucar.edu/

  40. SCD Portal
  • Online tool that helps you monitor the GAU charges and the current machine status (e.g. batch queues); the display can be customized
  • Information on the machine status requires a setup command on roy.scd.ucar.edu via crypto-card access: just enter ‘scdportalkey hostname’ (e.g. lightning) after logging on with the crypto card
  • At this time (Jan/31/2006) the GAU charges on bluevista are not itemized; they will be included in the next release in Spring 2006

  41. Other IBM resources
  • Sources of information on the IBM machine bluesky (from the command line); batchview also works on bluevista & lightning
  • batchview for an overview of jobs with their rankings
  • llq for a list of all submitted jobs, no ranking
  • spinfo: queue limits, memory quotas on the home file system and the temporary file system /ptmp
  • Useful IBM LoadLeveler keywords in the script:
  • #@ account_no = 54042108 -> ASP account
  • #@ ja_report = yes -> job report (see example on the next page)
  • Useful LoadLeveler commands: llsubmit script_file, llcancel job_id

  42. Example: IBM Job Report
  If selected, one email per job is sent to you at midnight. Output on the IBM machines, here blackforest (meanwhile decommissioned):

    Job Accounting - Summary Report
    ===============================
    Operating System                : blackforest AIX51
    User Name (ID)                  : cjablono (7568)
    Group Name (ID)                 : ncar (100)
    Account Name                    : 54042108
    Job Name                        : bf0913en.26921
    Job Sequence Number             : bf0913en.26921
    Job Starts                      : 12/20/04 17:56:33
    Job Ends                        : 12/20/04 23:26:34
    Elapsed Time (Wall-Clock * #CPU): 633632 s
    Number of Nodes (not_shared)    : 8
    Number of CPUs                  : 32
    Number of Steps                 : 1

  43. IBM Job Report (continued)

    Charge Components
    Wall-clock Time                    : 5:30:01
    Wall-clock CPU hours               : 176.00889 hrs
    Multiplier for com_ec Queue        : 0.50
    Charge before Computer Factor      : 88.00444 GAUs
    Multiplier for computer blackforest: 0.10
    Charged against Allocation         : 8.80044 GAUs

    Project GAUs Allocated             : 5000.00 GAUs
    Project GAUs Used, as of 12/16/04  : 1889.20 GAUs
    Division GAUs 30-Day Average       : 103.3%
    Division GAUs 90-Day Average       : 58.6%

  44. How to increase the efficiency
  • Get a feel for the GAUs for long jobs: benchmark the application on the target machine
  • Run a short but relevant test problem and measure the run time (wall clock time) via MPI commands (function MPI_WTIME) or UNIX timing commands like time or timex (output formats are shell-script dependent); see the sketch after this list
  • Vary the number of processors to assess the scaling
  • If the application scales poorly, avoid using a large number of processors (a waste of GAUs); instead use a smaller number with numerous restarts
  • Make sure your job fits into the queue (finishes before the max. time is up)
  • Use compiler options, especially the optimization options
  • In case of programming problems: the Totalview debugger can save you days, weeks or even months; on the IBMs, compile your program with the compiler options -g -qfullpath -d
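
  A minimal sketch, in C, of the MPI timing approach: MPI_Wtime is the C counterpart of the Fortran MPI_WTIME function mentioned above, and the sleep(1) call is a hypothetical stand-in for the test problem.

    #include <stdio.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        double t0, t1, local, maxtime;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);   /* start all ranks together */
        t0 = MPI_Wtime();

        sleep(1);                      /* stand-in for the test problem */

        t1 = MPI_Wtime();
        local = t1 - t0;

        /* the slowest rank determines the job's wall-clock time */
        MPI_Reduce(&local, &maxtime, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("wall-clock time: %.3f s\n", maxtime);

        MPI_Finalize();
        return 0;
    }

  Repeating such a measurement for several processor counts gives the scaling behavior mentioned above.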

  45. Restarts
  • Restart files are important for long simulations
  • Queue limits are up to 6 wallclock hours (a hard limit, the job fails afterwards); then a restart becomes necessary
  • Get information on the queue limits (SCD web page) and select the job’s integration time accordingly
  • Restarts are built into CAM/CCSM/WRF and only need to be activated
  • Restarts for other user applications will probably have to be programmed by hand; see the sketch after this list
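
  A minimal sketch of a hand-rolled checkpoint/restart pattern (the file name restart.bin and the scalar state are hypothetical placeholders; CAM/CCSM/WRF ship their own restart machinery, as noted above):

    #include <stdio.h>

    #define NSTEPS 100000   /* total time steps */
    #define NCHECK   1000   /* checkpoint interval */

    int main(void)
    {
        int step = 0;
        double state = 0.0;             /* stand-in for the model state */
        FILE *fp;

        /* resume from the last checkpoint, if one exists */
        if ((fp = fopen("restart.bin", "rb")) != NULL) {
            fread(&step,  sizeof step,  1, fp);
            fread(&state, sizeof state, 1, fp);
            fclose(fp);
        }

        for (; step < NSTEPS; step++) {
            state += 1.0;               /* stand-in for one time step */

            if ((step + 1) % NCHECK == 0) {   /* periodic checkpoint */
                int next = step + 1;          /* first step after a restart */
                if ((fp = fopen("restart.bin", "wb")) != NULL) {
                    fwrite(&next,  sizeof next,  1, fp);
                    fwrite(&state, sizeof state, 1, fp);
                    fclose(fp);
                }
            }
        }
        return 0;
    }

  If the job is killed when the queue’s wallclock limit expires, at most NCHECK steps of work are lost and the job can simply be resubmitted.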

  46. Questions?
