
HPC XC cluster Slurm and HPC-LSF




  1. HPC XC cluster Slurm and HPC-LSF Yu-Sheng Guo Account Support Consultant HP Taiwan

  2. SLURM

  3. What is SLURM? • SLURM : Simple Linux Utility for Resource Management • Arbitrates requests by managing a queue of pending work • Allocates access to compute nodes within a cluster • Launches parallel jobs and manages them (I/O, signals, limits, etc.) • NOT a comprehensive cluster administration or monitoring package • NOT a sophisticated scheduling system • An external entity can manage the SLURM queues via plugin

  4. SLURM Design Criteria • Simple • Scheduling complexity external to SLURM (LSF, PBS) • Open source: GPL • Fault-tolerant • For SLURM daemons and its jobs • Secure • Restricted user access to compute nodes • System administrator friendly • Simple configuration file • Scalable to thousands of nodes

  5. SLURM in a Nutshell [Diagram: users submit work, either to an external scheduler (LSF, LCRM) or directly to SLURM; SLURM allocates nodes (Node 0 through Node 7) and starts and manages the jobs (Job 1 through Job 4)]

  6. SLURM Architecture • Two daemons • slurmctld - controller, optional backup • slurmd - per node daemon • Five user commands • scontrol - administration tool, get/set configuration • sinfo - reports general system information • squeue - reports job and job step information • srun - submit/initiate job or job step • scancel - signal or cancel a job or job step • Configuration file • /hptc_cluster/slurm/etc/slurm.conf

  7. SLURM Architecture [Diagram: one slurmd daemon per node; a cluster-wide control daemon, slurmctld (primary, with optional backup); the user and administrator tools (srun, sinfo, squeue, scontrol, scancel) communicate with the daemons]

  8. SLURM : slurmctld • Orchestrates SLURM activities across entire cluster (with optional backup) • Components • Job Manager - manages queue of pending jobs • Node Manager - node state information • Partition Manager - allocates nodes

  9. SLURM : slurmd • Daemon executing on each compute node • Performs actions as directed by slurmctld and srun • Components • Machine Status • Job Status • Remote Execution • Stream Copy (stdin, stdout, and stderr) • Job Control (signal)

  10. SLURM : sinfo • Displays node and partition information • Options permit you to filter, sort, and output information in almost any way desired
  Display partition and node state:
  $ sinfo
  PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
  lsf*      up    infinite      1 drain HPC13*
  lsf*      up    infinite      3 alloc HPC[1-3]
  lsf*      up    infinite     72 idle  HPC[4-12,14-76]
  Asterisk after partition name indicates default partition; asterisk after node state indicates it is not responding
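  A few hedged examples of the filtering just mentioned, assuming the standard sinfo flags of this SLURM generation (the node list is illustrative):
  $ sinfo -p lsf            # show only the lsf partition
  $ sinfo -t idle           # show only idle nodes
  $ sinfo -N -n HPC[1-4]    # node-oriented listing, restricted to specific nodes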

  11. SLURM : squeue • Displays job and job step information • Options permit you to filter, sort, and output information in almost any way desired
  Display running and pending jobs:
  $ squeue
  JOBID PARTITION NAME     USER ST TIME     NODES NODELIST
  5860  lsf       hptclsf@ g03  R  4:54:37      1 HPC2
  5680  lsf       hptclsf@ g03  R  20:59:18     1 HPC1
  5677  lsf       hptclsf@ g03  R  22:42:54     1 HPC3
  R = Running, PD = Pending; TIME is days:hours:minutes:seconds
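  Hedged examples of squeue filtering, again assuming standard flags (the user name and job id are taken from the output above):
  $ squeue -u g03           # jobs belonging to user g03
  $ squeue -t PD            # pending jobs only
  $ squeue -j 5860          # one specific job id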

  12. SLURM : srun • User tool to initiate jobs and job steps • Run jobs interactively • Allocate resources • Submit batch jobs • Attach to a currently running job • Launch a set of parallel tasks (job step) • Options to specify resource requirements: partition, processor count, node count, minimum memory per node, minimum processor count per node, specific nodes to use or avoid, whether nodes can be shared, etc.
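  A sketch of these srun modes, assuming the old-style srun of this SLURM generation in which -b (batch) and -A (allocate) are still srun options; the script name and node list are illustrative:
  $ srun -p lsf -N 4 -n 8 hostname                     # interactive: 8 tasks across 4 nodes in the lsf partition
  $ srun -b myjob.sh                                   # submit myjob.sh as a batch job
  $ srun -A -p lsf -n 4                                # allocate resources only (what LSF does on XC)
  $ srun --mincpus=2 --mem=1024 -w HPC[1-4] hostname   # per-node minimums and an explicit node list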

  13. SLURM : scancel • Send an arbitrary signal to a job and/or job step • By default, sends SIGKILL, terminating the job • Filters can be used to specify user, program name, partition, job state, etc.
  Cancel job id 12345:
  $ scancel 12345
  Cancel all jobs belonging to user user1 interactively:
  $ scancel --interactive --user=user1
  Cancel job id=13601 name=summer partition=lsf [y/n]? y
  Cancel job id=13777 name=NewJob partition=lsf [y/n]? n

  14. SLURM : scontrol • Administrative tool to set and get configuration information • Can be useful to users who want to see full state information without fancy filtering or formatting
  $ scontrol show partition lsf
  PartitionName=lsf TotalNodes=76 TotalCPUs=304 RootOnly=NO
  Default=YES Shared=FORCE State=UP MaxTime=UNLIMITED Hidden=NO
  MinNodes=1 MaxNodes=UNLIMITED AllowGroups=(null)
  Nodes=HPC[1-76] NodeIndices=0,75,-1
  $ scontrol ping
  Slurmctld(primary/backup) at HPC94/HPC95 are UP/UP

  15. SLURM : Configuration File • Location : /hptc_cluster/slurm/etc/slurm.conf • Define per-node resource configuration • NodeName=HPC[1-76] Procs=4 • Define partition resources • PartitionName=lsf RootOnly=No Shared=FORCE Default=Yes Nodes=HPC[1-76]
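  A minimal sketch of what such a slurm.conf might look like as a whole, assuming a SLURM release of this era; the NodeName and PartitionName lines come from this slide, the control-node names from the scontrol ping output earlier, and the remaining keys and paths are illustrative:
  # /hptc_cluster/slurm/etc/slurm.conf (sketch)
  ControlMachine=HPC94                          # primary slurmctld
  BackupController=HPC95                        # optional backup slurmctld
  SlurmUser=slurm                               # illustrative
  StateSaveLocation=/hptc_cluster/slurm/state   # illustrative path
  NodeName=HPC[1-76] Procs=4
  PartitionName=lsf RootOnly=No Shared=FORCE Default=Yes Nodes=HPC[1-76]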

  16. LSF-HPC

  17. Why LSF? • Powerful workload management tools • Engineering Dept uses 7am-7pm M-F; QA Dept uses nights and weekends • Development work must run 20% of the time; production work must run 80% of the time • 1-, 2-, and 4-hour jobs, all with a random mix of task counts; LSF will efficiently schedule these jobs to ensure maximum use of all available resources • Grid support; LSF MultiCluster capability • Capable of managing all company resources: supports a mix of all types of systems (XC and non-XC hosts) under a single LSF cluster environment

  18. Why LSF-HPC? • Reduced maintenance and less overhead than standard LSF as the cluster size scales out • The whole XC cluster is condensed down and represented as one node in LSF • Simplifies basic cluster information by consolidating redundant information in a homogeneous cluster • A high-performance cluster solution • Focuses on scheduling and leaves parallel and serial launching to SLURM • Compute nodes have one SLURM compute daemon instead of 4+ LSF daemons • No monitoring overhead – all CPU cycles left for the job

  19. LSF daemons • LIM: Load Information Manager • Manages licensing and resource information • RES: Remote Execution Server • Controls job execution • SBATCHD: Slave Batch Daemon • Manages job state; enforces load thresholds • MBATCHD: Master Batch Daemon • Responds to user queries; manages queues and scheduler • MBSCHD: Scheduling Daemon • Makes all job scheduling decisions • PIM: Process Information Manager • Process level resource monitoring and accounting

  20. The Implementation • LSF daemons run on only one node in XC • This one LSF Execution Host “represents” the whole XC cluster • lshosts, bhosts, lsload output encapsulates information from all the nodes in the cluster into one “host” • LSF access on the other XC nodes is made available through LSF’s float client licensing • Every node can submit jobs • SLURM provides resource information • One ‘lsf’ partition is reserved for use by LSF • LSF gathers resource information from SLURM

  21. SLURM – LSF Integration

  22. SLURM – LSF Integration [Diagram: slurmctld runs on a resource management node and LSF on a second resource management node; each of the eight compute nodes runs slurmd; users work from a login node]

  23. Launching a Job • LSF creates resource allocations for each job via SLURM • Support for specific topology requests is available through a SLURM external scheduler • LSF then dispatches the job locally, and the job makes use of the allocation in several ways • ‘srun’ commands distribute the tasks • HP-MPI’s ‘mpirun -srun’ command for MPI jobs • The job can use its own launch mechanism • LINDA applications (e.g., Gaussian) stick with ssh on XC

  24. A Sample User Job • Consider a user job ‘./script’ which looks like:
  #!/bin/sh
  hostname
  srun hostname
  mpirun -srun ./hellompi
  • Submit with: bsub -n 4 -ext "SLURM[nodes=4]" ./script • Should also include output redirection (‘-o job.out’) • This job requests 4 processors (-n 4) across 4 nodes (nodes=4) • SLURM will run one task per node
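  The same submission written out with the recommended output redirection; ‘job.out’ is just an illustrative file name:
  $ bsub -n 4 -o job.out -ext "SLURM[nodes=4]" ./script
  # -n 4              : request 4 processors
  # -o job.out        : collect the job's output in job.out
  # -ext "SLURM[...]" : topology request passed through to SLURM's external scheduler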

  25. Submitting a Job to LSF [Diagram: from the login node, the user runs ‘bsub -n 4 … ./script’; the request goes to LSF on the resource management node]

  26. Creating the Allocation in SLURM [Diagram: LSF asks slurmctld for an allocation with ‘srun -A -p lsf -n 4 …’]

  27. Creating the Allocation in SLURM [Diagram: slurmctld allocates four compute nodes to job 123, and SLURM_JOBID=123 is returned to LSF]

  28. Dispatching the User Job Locally [Diagram: with SLURM_JOBID=123 and SLURM_NPROCS=4 set in the environment, LSF executes ./script locally]

  29. The First Line of the User Script [Diagram: the plain ‘hostname’ command runs locally and prints the local host name (n36)]

  30. Making Use of the Allocation [Diagram: ‘srun hostname’ launches hostname on each of the four nodes allocated to job 123]

  31. Gathering I/O [Diagram: srun gathers the output from each allocated node (n24, n25, n26, n27) back to the job]

  32. MPI Making Use of the Allocation [Diagram: ‘mpirun -srun ./hellompi’ launches hellompi on each of the four nodes allocated to job 123]

  33. Gathering MPI I/O [Diagram: the output of each MPI rank (“I’m rank 1 of 4” through “I’m rank 4 of 4”) is gathered back to the job]

  34. Cleaning Up [Diagram: when the job completes, the SLURM allocation is released with ‘scancel 123’]

  35. Troubleshooting LSF on XC

  36. Basic Troubleshooting Tips • If LSF commands fail: • ‘controllsf show’ to determine which node is the LSF Execution Host • Log in and check LSF logs in /opt/hptc/lsf/top/log (fix obvious complaints) • Test LSF commands on this node (may be a firewall issue or a float client licensing issue) • Use ‘ps’ to confirm the LSF daemons are running on this node (check logs and restart daemons) • Execute ‘lmstat -a’ on the XC head node to confirm that OEM licensing is up and running • Ensure the license file is in the proper location • (Re)start the license service on the head node
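  The checks above, assembled into one hedged command sequence (the execution-host name is a placeholder; the command names are taken from this and earlier slides):
  $ controllsf show                            # which node is the LSF Execution Host?
  $ ssh <execution-host>                       # log in to that node
  $ ls /opt/hptc/lsf/top/log                   # look for obvious complaints in the LSF logs
  $ lsid; bhosts                               # test LSF commands on this node
  $ ps -ef | egrep 'lim|res|sbatchd|mbatchd'   # are the LSF daemons running?
  $ lmstat -a                                  # on the head node: is OEM licensing up?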

  37. Basic Troubleshooting Tips • XC LSF Execution Host closed • Check that SLURM is working on the LSF Execution Host • ‘scontrol ping’, ‘sinfo’ commands • Troubleshoot SLURM • Verify that an ‘lsf’ partition exists in SLURM • Lower-case ‘lsf’ as the partition name • Contains nodes that are up and available for use (IDLE state) • Partition is root-only to prevent non-LSF use • Problems with LSF jobs • Compare SLURM’s job knowledge with LSF’s • ‘squeue’ versus ‘bjobs’ • ‘scontrol show job’ vs. ‘bhist -l’
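  A hedged sketch of the side-by-side comparison, using only commands named on this slide (<jobid> is a placeholder):
  $ scontrol ping                 # is slurmctld (primary/backup) responding?
  $ sinfo -p lsf                  # does the lsf partition exist, with IDLE nodes?
  $ squeue                        # SLURM's view of the jobs
  $ bjobs                         # LSF's view of the same jobs
  $ scontrol show job <jobid>     # detailed SLURM job record
  $ bhist -l <jobid>              # detailed LSF job history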

  38. HP XC Cluster = A simple, scalable, and flexible Linux cluster solution
