Basic High Performance Computing Kenton McHenry
XSEDE • Extreme Science and Engineering Discovery Environment • http://www.xsede.org • Collection of networked supercomputers • PSC Blacklight • NCSA Forge • SDSC Gordon • SDSC Trestles • NICS Kraken • TACC Lonestar • TACC Ranger • Purdue Steele
XSEDE • Extreme Science and Engineering Discovery Environment • http://www.xsede.org • Collection of networked supercomputers • Supported by NSF • Follow up to TeraGrid • NCSA Ember • …
Allocations • Startups • Around 30,000 CPU hours • For experimentation • Can apply at any time of the year • Only 1 such allocation per user • Research • 1 million+ CPU hours • Requires a research plan • Can apply only during certain periods of the year • Very competitive • Humanities-related work makes up a very small fraction of the allocations awarded
ECS • Extended Collaborative Support Services • Dedicated time from XSEDE support staff • Ask for it in the allocation request • Must be justified
Logging In • Linux • SSH • ember.ncsa.illinois.edu • Head node vs. worker nodes
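For example, logging in to the head node from a Linux machine is a single SSH command (the username below is a placeholder for your own login):

$ ssh myusername@ember.ncsa.illinois.edu

Code is compiled and jobs are submitted from the head node; the actual computation runs on worker nodes assigned by the scheduler.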
Space • Local scratch • Temporary space during a program's execution • Cleared as soon as the process finishes • Global scratch • Temporary user space • Untouched files are cleared periodically (e.g. after weeks) • Mass store • Long-term storage • Tapes
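As a rough sketch of how these tiers work together (assuming the scheduler exposes the job's local scratch directory through an environment variable such as $TMPDIR, which varies by system; all file and program names are illustrative), a job might stage input from global scratch, work locally, and copy results back before its local space is cleared:

# inside a job script (illustrative only)
cp ~/scratch-global/census/1940/batch1/input.tar $TMPDIR   # stage in from global scratch
cd $TMPDIR
tar xf input.tar
./process input/                                           # temporary files stay in local scratch
cp results.tar ~/scratch-global/census/1940/batch1/        # stage results back to global scratch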
Executing Code • Naively or Embarrassingly Parallel • Problem allows for a number of independent tasks that can be executed separately from one another • No special steps (e.g. via MPI or MapReduce) needed to synchronize tasks or merge results
Executing Code • Step 1: Write your code on a non-HPC resource • For the Census project this involved months of research and development • Structure it to have only a command-line interface • Support flags for: • Setting input data (either folder or database) • Setting output location (either folder or database) • Customizing the execution and/or selecting a desired step (see the sketch below) • We had 3 steps
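As a sketch of what such an interface might look like when invoked (the program name, flags, and paths are hypothetical, not the actual Census tooling):

$ ./census-pipeline --input  ~/scratch-global/census/1940/batch1 \
                    --output ~/scratch-global/census/1940/batch1/segmentation \
                    --step   segmentation

Keeping everything behind command-line flags means the same program can be run by hand during development and driven by batch scripts on the HPC resource.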
Executing Code • Step 1: Write your code on a non-HPC resource • Step 2: Organize data • Perhaps subfolders for each job • Move to global scratch space to avoid GridFS bottlenecks
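One possible layout (directory names are only illustrative, modeled on the paths that appear in the job output later in these slides) is a subfolder per job under global scratch:

$ mkdir -p ~/scratch-global/census/1940/batch1
$ cp -r /home/myuser/census/1940/batch1/* ~/scratch-global/census/1940/batch1/
$ ls ~/scratch-global/census/1940/batch1
00889/  00890/  00891/  ...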
Executing Code • Step 1: Write your code on a non-HPC resource • Step 2: Organize data • Step 3: Create scripts to execute jobs • Scripts • Portable Batch System (PBS) • [Example]
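A minimal sketch of what one of these PBS scripts could look like (the resource requests, account name, and program invocation are assumptions loosely based on the job shown on the following slides; the exact resource-request syntax differs between PBS variants, so check your site's documentation):

#!/bin/bash
#PBS -N S-00889                  # job name
#PBS -A gf7                      # account to charge
#PBS -q normal                   # queue
#PBS -l ncpus=12                 # processors requested
#PBS -l walltime=04:00:00        # maximum run time

cd $PBS_O_WORKDIR                # start in the directory the job was submitted from
./census-pipeline --input  ~/scratch-global/census/1940/batch1/00889 \
                  --output ~/scratch-global/census/1940/batch1/segmentation/00889 \
                  --step   segmentation

Each input subfolder gets its own script (00889.pbs, 00890.pbs, ...), which is what makes the one-line submission loop on the next slide possible.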
Executing Code • Step 1: Write your code on a non-HPC resource • Step 2: Organize data • Step 3: Create scripts to execute jobs • Step 4: Run scripts
Execute
$ qsub 00889.pbs
This job will be charged to account: abc
267950.ember
$ for f in *.pbs; do qsub $f; done
Monitor
$ qstat
Job id            Name              User              Time Use  S  Queue
----------------  ----------------  ----------------  --------  -  -----
267794.ember      v15               ccguser           75:11:48  R  gridchem
267795.ember      v16               ccguser           75:09:20  R  gridchem
267796.ember      v17               ccguser           75:13:01  R  gridchem
267870.ember      c4-ts1-freq       ccguser           279:03:2  R  gridchem
267872.ember      c5-ts1-freq       ccguser           351:17:0  R  gridchem
267873.ember      c5-ts1-ccsd       ccguser           228:50:0  R  gridchem
267897.ember      c3-ts1-ccsdt      ccguser           267:04:0  R  gridchem
267912.ember      FSDW103lnpvert    kpatten           2178:07:  R  normal
267943.ember      jobDP12           haihuliu          1506:40:  R  normal
267944.ember      PF31              haihuliu          920:44:4  R  normal
267945.ember      jobDP8            haihuliu          1351:11:  R  normal
267946.ember      FLOOArTSre2.com   ccguser           91:32:30  R  gridchem
267947.ember      FLOOArTSre3.com   ccguser           86:29:35  R  gridchem
267949.ember      vHLBIHl1O5        ccguser           01:23:03  R  normal
267950.ember      S-00889           kooper            00:00:00  R  normal
Results
$ qstat -f 267950.ember
Job Id: 267950.ember
    Job_Name = S-00889
    Job_Owner = kooper@ember.ncsa.illinois.edu
    resources_used.cpupercent = 396
    resources_used.cput = 00:02:26
    resources_used.mem = 4981600kb
    resources_used.ncpus = 12
    resources_used.vmem = 62051556kb
    resources_used.walltime = 00:01:02
    job_state = R
    queue = normal
    server = ember
    Account_Name = gf7
    Checkpoint = n
    ctime = Wed May 30 11:11:33 2012
    Error_Path = ember.ncsa.illinois.edu:/u/ncsa/kooper/scratch-global/census/1940/batch1/segmentation/S-00889.e267950
    exec_host = ember-cmp1/1*6+ember-cmp1/2*6
    exec_vnode = (ember-cmp1[11]:ncpus=6:mem=32505856kb)+(ember-cmp1[12]:ncpus=6:mem=27262976kb)
Questions? Image and Spatial Data Analysis Group http://isda.ncsa.illinois.edu