Using Clusters: User Perspective
Pre-cluster scenario
• So many different computers: prithvi, apah, tejas, vayu, akash, agni, aatish, falaq, narad, qasid …
• Different S/W on each of them
• Different H/W capabilities
• The desired one may be down
• Only a few are in the top bracket, so response may be slow
Cluster
• Only one machine in place of so many computers
• Same S/W everywhere
• Same H/W
• A few systems being down is no problem
• One can use the m/c as an interactive server, batch server, sequential m/c, or parallel m/c
User Interface to Cluster
• Just as the OS sits between the m/c and the user, this interface sits between the user and a group of m/cs
• Users ↔ Interface ↔ m/cs
Components
• Queuing: collecting user jobs/requests in the form of batch jobs
• Scheduling: selecting user jobs to run and the m/cs to run them on
• Monitoring: implementing usage policy and tracking job and m/c status
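The three components can be illustrated with a toy sketch. This is not PBS code and not how the PBS scheduler actually decides; it is a minimal first-fit example, assuming made-up job names, node names, and CPU counts, that shows jobs waiting in a queue, a scheduler picking a node for each, and a status line for jobs that cannot be placed.

```shell
#!/usr/bin/env bash
# Toy illustration of queuing, scheduling and monitoring (not PBS code).
# All job names, node names and CPU counts below are hypothetical.

queue=("jobA:4" "jobB:2" "jobC:8")   # queuing: jobs wait as name:cpus
node_names=("node1" "node2")
node_free=(4 2)                      # free CPUs on each node

schedule() {
  local entry name need i placed
  for entry in "${queue[@]}"; do     # scheduling: take jobs in FIFO order
    name=${entry%%:*}
    need=${entry##*:}
    placed=no
    for i in "${!node_names[@]}"; do # pick the first node with enough CPUs
      if [ "${node_free[$i]}" -ge "$need" ]; then
        node_free[$i]=$(( node_free[$i] - need ))
        echo "$name -> ${node_names[$i]}"
        placed=yes
        break
      fi
    done
    # monitoring: report jobs that have to keep waiting
    [ "$placed" = yes ] || echo "$name -> queued (not enough free CPUs)"
  done
}

result=$(schedule)
echo "$result"
```

With these sample numbers the third job stays queued, which mirrors the "Not enough of the right type of nodes available" situation a real scheduler can report.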
Portable Batch System (PBS)
• Two components: user commands and system daemons
• A GUI equivalent of the user commands is also available
• User commands cover tasks such as submit, monitor, modify, and delete
• Daemons:
• Server: manages the resources of the whole cluster
• Scheduler: selects the executor and its resources
• Executor: the node and processor selected by the scheduler
Running a job
• 1. Create a file containing PBS directives (which must come first) followed by OS commands:
#PBS -l ncpus=4
./a.out
• 2. Submit the job: qsub [options] <file_containing_OS/PBS commands>
• The -I option creates an interactive session
• The -q option selects the queue
Checking the status of a job
• tracejob job_number
9/05/2006 20:19:36 S Job Queued at request of santh@hncn17, owner = santh@hncn17, job name = SCR_LB70-m5stat, queue = workq
9/05/2006 20:19:36 S Job Modified at request of Scheduler@hncn17
9/05/2006 20:19:36 S enqueuing into workq, state 1 hop 1
9/05/2006 20:19:36 A queue=workq
9/05/2006 22:39:36 L Considering job to run
9/05/2006 22:39:36 L Not enough of the right type of nodes available
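The third field of each tracejob line is a one-letter code for the log the entry came from (here S for the server, A for accounting, L for the scheduler). As a rough sketch of how such output might be summarised, the snippet below counts entries per code and pulls out only the scheduler's messages. It works on a subset of the sample lines above pasted into a variable; in practice one would pipe the output of tracejob into the same awk commands.

```shell
#!/usr/bin/env bash
# A subset of the sample tracejob lines shown above.
log='9/05/2006 20:19:36 S Job Modified at request of Scheduler@hncn17
9/05/2006 20:19:36 S enqueuing into workq, state 1 hop 1
9/05/2006 20:19:36 A queue=workq
9/05/2006 22:39:36 L Considering job to run
9/05/2006 22:39:36 L Not enough of the right type of nodes available'

# Count entries per source code (the third whitespace-separated field).
counts=$(printf '%s\n' "$log" |
  awk '{ n[$3]++ } END { for (k in n) print k, n[k] }' | sort)
echo "$counts"

# Keep only the scheduler entries, dropping the date, time and code fields.
sched=$(printf '%s\n' "$log" |
  awk '$3 == "L" { $1 = $2 = $3 = ""; sub(/^ +/, ""); print }')
echo "$sched"
```

The scheduler lines are usually the ones to read first when a job sits in the queue longer than expected.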
• Modifying a job: qalter -l walltime=20:00 job_identifier
• Deleting a job: qdel 17
• Sending signals: qsig -s signal job_identifier
• Moving jobs between queues is possible
• Parallel jobs are run through the command mpirun
• Checkpointing is possible
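Put together, the life of a job might look like the sketch below: write a job file, submit it, change its wall-time limit, and delete it. This is a minimal example under stated assumptions, not a site recipe: the queue name workq is borrowed from the tracejob output, the temporary file name comes from mktemp, and the PBS commands run only if qsub is actually installed on the machine.

```shell
#!/usr/bin/env bash
# Write a job file: PBS directives first, then the OS commands.
jobfile=$(mktemp)
cat > "$jobfile" <<'EOF'
#!/bin/sh
#PBS -l ncpus=4
#PBS -q workq
./a.out
EOF

if command -v qsub >/dev/null 2>&1; then
  jobid=$(qsub "$jobfile")            # qsub prints the new job identifier
  qalter -l walltime=20:00 "$jobid"   # change the requested wall time
  qdel "$jobid"                       # remove the job again
else
  # No PBS on this machine: just show where the job file was written.
  echo "qsub not found; job file left at $jobfile"
fi
```

Capturing the identifier printed by qsub avoids having to look it up before calling qalter, qsig, or qdel.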
• The three daemons are pbs_server, pbs_mom, and pbs_sched (the scheduler)
• A compute node runs only pbs_mom
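One quick way to see which role a host plays is to check which of these daemons are running on it. The sketch below assumes pgrep is available and takes the daemon names as arguments, since the scheduler binary name (pbs_sched is assumed here) may differ between installations; check_daemons is a hypothetical helper, not a PBS command.

```shell
#!/usr/bin/env bash
# Report, for each daemon name given, whether a process with exactly
# that name is running on this host.
check_daemons() {
  local d
  for d in "$@"; do
    if pgrep -x "$d" >/dev/null 2>&1; then
      echo "$d: running"
    else
      echo "$d: not running"
    fi
  done
}

report=$(check_daemons pbs_server pbs_sched pbs_mom)
echo "$report"
```

On a compute node one would expect only the pbs_mom line to say "running".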