Using hpc
Instructor: Seung Hun An, DCS Lab, School of EECSE, Seoul National University
What is hpc
• System
  • IBM RS/6000 SP, AIX 4.3.3
  • 9 nodes with 16 processors per node (144 processors in total)
  • 144 GB memory, 3 TB storage
• LoadLeveler & POE
  • LoadLeveler is recommended
• hpc.snu.ac.kr
  • Connect by telnet, ssh, or rsh
  • Tera Term (a Windows terminal client) is available at http://hpc.snu.ac.kr/download/ttermp23.zip
System Settings & Usage
• Bourne-family shells
  • ksh (default), bash
  • Use export instead of setenv (setenv is csh syntax)
• General steps
  • Edit the command (cmd) file
  • Compile the source file
  • Submit the job to LoadLeveler (llsubmit)
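In ksh or bash, environment variables are set with export rather than the csh-style setenv. A minimal sketch; the variable name MY_DATA_DIR and its value are hypothetical examples:

```sh
# csh/tcsh syntax, not available in ksh or bash:
#   setenv MY_DATA_DIR /u/dcslab/data
# Bourne-family syntax (ksh, bash): assign, then export to child processes
MY_DATA_DIR=/u/dcslab/data
export MY_DATA_DIR

# Equivalent one-line form, accepted by both ksh and bash
export MY_DATA_DIR=/u/dcslab/data

# A child process now inherits the variable
sh -c 'echo "child sees: $MY_DATA_DIR"'
```

To make the setting permanent, the export line can go in ~/.profile, which ksh reads at login.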
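Command file
#!/bin/ksh
# LoadLeveler reads the lines starting with "# @" as job keywords; other "#" lines are comments.
# @ job_type = parallel
# @ executable = ~/KISA/LLL/execution
# @ input = /dev/null
# Output file names: $(Cluster) is the job number, $(Process) the step number
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /u/dcslab
# @ notify_user = shahn@arirang.snu.ac.kr
# Job class (see llclass)
# @ class = gold
# @ step_name = LLL
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "R6000") && (OpSys == "AIX43")
# 15 MPI tasks spread over 4 nodes
# @ node = 4
# @ total_tasks = 15
# MPI over the css0 switch adapter: shared usage, User Space mode, high communication level
# @ network.MPI = css0,shared,US,high
# @ queue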
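When the node or task count changes between runs, the command file can be generated from a script instead of being edited by hand. A minimal sketch; write_cmd_file is a hypothetical helper, and it keeps only a subset of the keywords in the file above. The LoadLeveler variables $(Executable), $(Cluster) and $(Process) are escaped so the shell leaves them for LoadLeveler to expand:

```sh
# Hypothetical helper: write a LoadLeveler command file for a parallel job.
# Usage: write_cmd_file OUTFILE EXECUTABLE NODES TASKS CLASS
write_cmd_file() {
    outfile=$1 exe=$2 nodes=$3 tasks=$4 class=$5
    # Unquoted heredoc: $exe, $nodes, ... are expanded by the shell,
    # while \$(...) is written literally for LoadLeveler.
    cat > "$outfile" <<EOF
#!/bin/ksh
# @ job_type = parallel
# @ executable = $exe
# @ input = /dev/null
# @ output = \$(Executable).\$(Cluster).\$(Process).out
# @ error = \$(Executable).\$(Cluster).\$(Process).err
# @ class = $class
# @ node = $nodes
# @ total_tasks = $tasks
# @ network.MPI = css0,shared,US,high
# @ queue
EOF
}

# Same node/task/class values as the example command file
write_cmd_file lll.cmd "$HOME/KISA/LLL/execution" 4 15 gold
```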
Running example
[sp01: ~/KISA/LLL] $ mpcc parallel_allswap.c
[sp01: ~/KISA/LLL] $ mv a.out execution
[sp01: ~/KISA/LLL] $ llsubmit lll.cmd
llsubmit: The job "sp01.8681" has been submitted.
[sp01: ~/KISA/LLL] $ llstatus
Name   Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch   OpSys
sp01   Avail   12   11   Idle    0    18.49  3     R6000  AIX43
sp02   Avail   1    1    Run     4    55.60  325   R6000  AIX43
sp03   Avail   0    0    Run     21   18.02  9999  R6000  AIX43
sp04   Avail   0    0    Run     16   12.23  9999  R6000  AIX43
sp05   Avail   0    0    Run     21   21.23  9999  R6000  AIX43
sp06   Avail   0    0    Run     16   9.00   9999  R6000  AIX43
sp07   Avail   0    0    Run     22   22.04  7200  R6000  AIX43
sp08   Avail   0    0    Run     6    2.02   9999  R6000  AIX43
sp09   Avail   0    0    Run     17   13.05  9999  R6000  AIX43
R6000/AIX43     9 machines  13 jobs  123 running
Total Machines  9 machines  13 jobs  123 running
The Central Manager is defined on sp02
All machines on the machine_list are present.
[sp01: ~/KISA/LLL] $
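The compile, rename and submit steps can be wrapped in one script that also captures the job id from the llsubmit confirmation line, so it can be passed to llq or llcancel later. A minimal sketch; job_id_from_llsubmit and build_and_submit are hypothetical helpers, and the parser assumes the message has exactly the form shown in the example:

```sh
# Hypothetical helper: read llsubmit output on stdin and print the job id, e.g.
#   llsubmit: The job "sp01.8681" has been submitted.   ->   sp01.8681
job_id_from_llsubmit() {
    sed -n 's/^llsubmit: The job "\([^"]*\)" has been submitted\.$/\1/p'
}

# Hypothetical wrapper for the three commands above (needs mpcc and llsubmit).
build_and_submit() {
    mpcc parallel_allswap.c || return 1
    mv a.out execution || return 1
    # 2>&1 so the id is found even if the message is printed on stderr
    llsubmit lll.cmd 2>&1 | job_id_from_llsubmit
}

# Parse the sample confirmation line from the example
echo 'llsubmit: The job "sp01.8681" has been submitted.' | job_id_from_llsubmit
```

On the example line the parser prints sp01.8681.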
llq
[sp01: ~/KISA/LLL] $ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------
sp01.8615.0              mrdlab1    9/14 03:04  R  50  long         sp02
sp01.8648.0              spscs      9/15 16:16  R  50  silver       sp05
sp01.8649.0              spscs      9/15 16:16  R  50  silver       sp07
sp02.1291.0              flowsys1   9/15 17:00  R  50  silver       sp04
sp01.8652.0              seongkim   9/15 22:37  R  50  gold         sp06
sp01.8663.0              shinkj     9/16 12:11  R  50  gold         sp04
sp01.8665.0              janggrp    9/16 12:28  R  50  gold         sp09
sp01.8666.0              janggrp    9/16 12:28  R  50  gold         sp03
sp01.8671.0              biosys     9/16 15:26  R  50  silver       sp03
sp01.8678.0              hpcb0011   9/16 16:53  R  50  silver       sp03
sp01.8679.0              microsys   9/16 17:25  R  50  silver       sp08
sp01.8680.0              microsys   9/16 17:25  R  50  silver       sp08
sp01.8681.0              dcslab     9/16 19:06  ST 50  gold         sp08
13 job steps in queue, 0 waiting, 1 pending, 12 running, 0 held
The job just submitted (sp01.8681.0) is in state ST (Starting) on sp08; R means Running.
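With many users sharing the machine, it can help to summarise the llq listing by status instead of reading it line by line. A minimal sketch; llq_summary is a hypothetical helper, and it assumes the column layout shown above (status in the fifth whitespace-separated field, after a date and a time):

```sh
# Hypothetical helper: count llq job steps per status code (R, ST, I, ...).
# Job-step lines start with an id like sp01.8615.0; the status is field 5
# (Id, Owner, date, time, ST). Header, dash and summary lines are skipped.
llq_summary() {
    awk '$1 ~ /^[^.]+\.[0-9]+\.[0-9]+$/ { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# On the live system, e.g. for one owner:  llq -u dcslab | llq_summary
printf '%s\n' \
  'sp01.8615.0   mrdlab1   9/14 03:04 R  50 long   sp02' \
  'sp01.8648.0   spscs     9/15 16:16 R  50 silver sp05' \
  'sp01.8681.0   dcslab    9/16 19:06 ST 50 gold   sp08' | llq_summary
```

For the three sample lines this prints "R 2" followed by "ST 1".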
llclass & llcancel
• llclass
[sp01: ~/KISA/LLL] $ llclass
Name      MaxJobCPU   MaxProcCPU  Free   Max    Description
          d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
gold      -1          -1          52     112    Serial & parallel batch job
silver    -1          -1          68     112    Serial & parallel batch job
long      -1          -1          12     16     Long time job
general   -1          -1          16     16     Test or interactive job
(-1 means no CPU time limit)
• llcancel
  • Cancels one or more jobs in the LoadLeveler queue
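Before choosing the class keyword in the command file, the llclass listing can be scanned for the class with the most free slots. A minimal sketch; best_class is a hypothetical helper that assumes the column layout shown above (free slots in the fourth field, maximum slots in the fifth):

```sh
# Hypothetical helper: read llclass output on stdin and print the class with
# the most free slots. Data rows look like:
#   Name MaxJobCPU MaxProcCPU FreeSlots MaxSlots Description...
# Header rows are skipped because their 4th/5th fields are not numbers.
best_class() {
    awk 'NF >= 5 && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
             if (!found || $4 + 0 > best_free + 0) {
                 best = $1; best_free = $4; found = 1
             }
         }
         END { if (found) print best, best_free }'
}

printf '%s\n' \
  'Name     MaxJobCPU  MaxProcCPU Free  Max   Description' \
  '         d+hh:mm:ss d+hh:mm:ss Slots Slots' \
  'gold     -1         -1         52    112   Serial & parallel batch job' \
  'silver   -1         -1         68    112   Serial & parallel batch job' \
  'long     -1         -1         12    16    Long time job' \
  'general  -1         -1         16    16    Test or interactive job' | best_class
```

For the listing above this prints "silver 68". Free slots are only one criterion: long and general are meant for long-running and test/interactive jobs, so the job type should decide the class first. To cancel the job submitted in the running example, pass its id to llcancel, e.g. llcancel sp01.8681.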