
Job Submission



Presentation Transcript


  1. Job Submission Andrew Pangborn & Myles Maxfield Service Oriented Cyberinfrastructure Lab, http://grid.rit.edu

  2. The Grid • <Insert some structural picture of grid>?

  3. The Problem • At one end are computing resources managed by batch queuing systems and other middleware • At the other end are end-users and their jobs/applications • Need software and protocols for submitting jobs to the computing resources

  4. Job Submission • More motivation stuff?

  5. Batch Queuing Systems • Jobs are submitted directly to the batch queuing system • One or more queues • Priorities • Two common architectures: client/server and dynamic offloading • User credentials (delegation) • Jobs have states (e.g., Pending, Running)

  6. Batch Queuing Systems • Important examples: • Portable Batch System • TORQUE • Xgrid • Sun Grid Engine • Load Sharing Facility • Condor

  7. Portable Batch System (PBS) • Originally developed for NASA • Client/server architecture • Server: pbs_server • Execution daemon on each compute node: pbs_mom • Works with MPI through built-in shell variables (e.g., $PBS_NODEFILE)
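PBS hands the job's allocation to MPI through environment variables that it sets inside every job. $PBS_NODEFILE names a file that lists one hostname per allocated processor, and $PBS_O_WORKDIR holds the directory qsub was run from.

The sketch below is a minimal MPI job script built on those variables. It assumes an MPICH-style mpirun that accepts -machinefile. The job name, resource request, node names, and ./my_mpi_program are hypothetical. The script prints the launch line instead of running it.

```sh
#!/bin/sh
# Hypothetical PBS job script for an MPI program. qsub reads the #PBS
# lines as options; to the shell they are ordinary comments.
#PBS -N mpitest
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:10:00

# $PBS_NODEFILE lists one hostname per allocated processor slot,
# so its line count is the number of MPI processes to start.
slot_count() {
    wc -l < "$1" | tr -d ' '
}

# Build (but do not run) the launch line. "./my_mpi_program" is a
# placeholder; -machinefile is an MPICH-style option, and flags vary
# between MPI implementations.
mpi_launch_line() {
    echo "mpirun -np $(slot_count "$1") -machinefile $1 ./my_mpi_program"
}

# Outside a real PBS job, fake a 4-slot node file so the sketch runs.
if [ -z "$PBS_NODEFILE" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode01\nnode02\nnode02\n' > "$PBS_NODEFILE"
fi

# PBS starts jobs in the home directory; move to where qsub was run.
cd "${PBS_O_WORKDIR:-.}"
mpi_launch_line "$PBS_NODEFILE"
```

Run outside PBS, the script prints a line beginning `mpirun -np 4`. Under a real nodes=2:ppn=2 allocation, the node file would likewise hold four entries.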

  8. PBS Example
      litherum@gras:~$ cat test.sh
      #!/bin/sh
      #testpbs
      echo This is a test
      echo today is `date`
      echo This is `hostname`
      echo The current working directory is `pwd`
      ls -alF /home
      uptime

  9. PBS Example
      litherum@gras:~$ qsub test.sh
      6.gras.carrion.rit.edu
      litherum@gras:~$ qstat
      Job id                    Name             User            Time Use S Queue
      ------------------------- ---------------- --------------- -------- - -----
      6.gras                    test.sh          litherum        00:00:00 C batch
      litherum@gras:~$ cat test.sh.o6
      This is a test
      today is Sat Jan 17 18:20:20 EST 2009
      This is carrion02
      The current working directory is /home/litherum
      total 20
      drwxr-xr-x 31 litherum litherum 4096 Jan 17 18:19 litherum/
       18:20:20 up 131 days, 21:20,  0 users,  load average: 0.00, 0.00, 0.00
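The S column in the qstat listing above is the job state, and C means the job has completed. The sketch below pulls that column out of default-format qstat output and gives it a name. It assumes the TORQUE state letters (Q queued, R running, H held, E exiting, C completed, among others) and the six-column layout shown above.

```sh
#!/bin/sh
# Map a TORQUE/PBS qstat state letter (the "S" column) to a name.
state_name() {
    case "$1" in
        Q) echo "queued" ;;
        R) echo "running" ;;
        H) echo "held" ;;
        E) echo "exiting" ;;
        C) echo "completed" ;;
        W) echo "waiting" ;;
        S) echo "suspended" ;;
        T) echo "being moved" ;;
        *) echo "unknown" ;;
    esac
}

# Read default-format qstat output on stdin and print the named state
# of the job whose id is $1. Fields: id, name, user, time, state, queue.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }' | while read -r s; do
        state_name "$s"
    done
}

# Sample rows copied from the session above; prints "completed".
job_state 6.gras <<'EOF'
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6.gras                    test.sh          litherum        00:00:00 C batch
EOF
```

On a live cluster the same function would read from qstat directly, e.g. `qstat | job_state 6.gras`.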

  10. Torque • Derived from OpenPBS, the open-source release of PBS • Usually paired with a scheduler such as Maui or Moab (which provides the showq command below), adding: • Reservations, where you can reserve specific resources for specific times • Partitions, where you can divide a cluster into smaller sub-clusters

  11. Torque
      litherum@gras:~$ showq
      ACTIVE JOBS--------------------
      JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
           0 Active Jobs     0 of    4 Processors Active (0.00%)
                             0 of    2 Nodes Active      (0.00%)
      IDLE JOBS----------------------
      JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
      0 Idle Jobs
      BLOCKED JOBS----------------
      JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
      Total Jobs: 0   Active Jobs: 0   Idle Jobs: 0   Blocked Jobs: 0

  12. Xgrid • Developed by Apple • Similar in purpose to Condor • GUI! =) • Client/server model • Screenshot: Xgrid Admin tool, http://upload.wikimedia.org/wikipedia/en/6/62/XgridAdminTool.jpg

  13. Sun Grid Engine • Open source, like everything new Sun puts out • Supports: • Reservations • Job dependencies • Checkpointing • Multiple scheduling algorithms • Web interface • Professional!
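Job dependencies let one job wait for another. In Sun Grid Engine, `qsub -hold_jid <job id>` holds a job until the referenced job finishes, and `qsub -terse` prints only the new job's ID so that it can be captured. TORQUE expresses a similar rule as `-W depend=afterok:<job id>`, which also requires the earlier job to exit without error.

The dry-run sketch below prints the dependent submission for each scheduler. The job ID 42 and the step2.sh script are hypothetical.

```sh
#!/bin/sh
# Print (not run) the command that submits job script $3 so that it
# waits for job $2 under scheduler $1.
dependent_submit() {
    case "$1" in
        sge)    printf 'qsub -hold_jid %s %s\n' "$2" "$3" ;;
        torque) printf 'qsub -W depend=afterok:%s %s\n' "$2" "$3" ;;
        *)      echo "unsupported scheduler: $1" >&2; return 1 ;;
    esac
}

# On a real SGE cluster the first job's ID could be captured with:
#   first=$(qsub -terse step1.sh)
first=42
dependent_submit sge "$first" step2.sh
dependent_submit torque "$first" step2.sh
```

The two spellings of the same idea are exactly the kind of per-system difference that slide 16 complains about.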

  14. Load Sharing Facility • One of the schedulers GRAM can submit jobs to; we’ll talk about GRAM later

  15. Condor • More about this later, but it implements its own scheduler

  16. Challenging! • These queuing systems are hard to use • There may be many systems employed in a given grid • Wouldn’t it be nice if all this were unified in a single implementation?

  17. Condor • <Multiple slides on Condor> - Andrew

  18. GRAM • Pluggable! • Can’t make up their mind how to describe jobs (RSL in pre-WS GRAM, XML job descriptions in WS GRAM) • Will submit jobs to: • Condor • LSF • PBS/Torque • ??? • Unified interface: a factory URL and factory type identify which cluster and scheduler to use
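The unified interface means the same globusrun-ws command line is used whatever the local scheduler, and only the -factory-type value selects the back end.

The sketch below models that idea as a small wrapper. It assumes the GT4 factory-type names Fork, PBS, LSF, and Condor. The factory host and port are hypothetical, and the commands are printed, not run.

```sh
#!/bin/sh
# Hypothetical factory endpoint; the service path matches the example
# on the next slide, but the host and port are made up.
FACTORY_URL="https://gram.example.org:8443/wsrf/services/ManagedJobFactoryService"

# Map a local scheduler name to a WS GRAM factory type (assumed names).
factory_type() {
    case "$1" in
        pbs|torque) echo PBS ;;
        lsf)        echo LSF ;;
        condor)     echo Condor ;;
        fork)       echo Fork ;;
        *)          return 1 ;;
    esac
}

# Print (not run) the submission for scheduler $1 and command $2.
# Everything except -factory-type is identical across back ends.
gram_submit_cmd() {
    ftype=$(factory_type "$1") || return 1
    echo "globusrun-ws -submit -factory $FACTORY_URL -factory-type $ftype -streaming -job-command $2"
}

for s in pbs lsf condor; do
    gram_submit_cmd "$s" /bin/hostname
done
```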

  19. GRAM Example
      maxfield@tg-login1:~> globusrun-ws -submit -factory https://tg-login.ornl.teragrid.org:8444/wsrf/services/ManagedJobFactoryService -factory-type PBS -streaming -job-command /bin/hostname
      Delegating user credentials...Done.
      Submitting job...Done.
      Job ID: uuid:89538014-e4f2-11dd-81df-0010180bb4e6
      Termination time: 01/18/2009 23:57 GMT
      Current job state: Pending
      Current job state: Active
      tg-c15
      Current job state: CleanUp-Hold
      Current job state: CleanUp
      Current job state: Done
      Destroying job...Done.
      Cleaning up any delegated credentials...Done.
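With -streaming, the job's output is relayed back to the terminal, interleaved with the job's state changes. In the session above, tg-c15 is the output of /bin/hostname.

The small sketch below pulls out just the state changes. It assumes only that each state line starts with "Current job state: " as shown above.

```sh
#!/bin/sh
# Join every reported state, in order, into one arrow-separated line.
state_trail() {
    awk -F': ' '/^Current job state: / { printf "%s%s", sep, $2; sep = " -> " }
                END { print "" }'
}

# Print only the last reported state (the job's final state).
final_state() {
    sed -n 's/^Current job state: //p' | tail -n 1
}

# Abbreviated output from the session above.
log='Submitting job...Done.
Current job state: Pending
Current job state: Active
tg-c15
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done'

printf '%s\n' "$log" | state_trail   # Pending -> Active -> CleanUp-Hold -> CleanUp -> Done
printf '%s\n' "$log" | final_state   # Done
```

On a live system the same functions could read from `globusrun-ws -submit ... 2>&1`. Stderr is included because this sketch does not assume which stream carries the status messages.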

  20. UNICORE • <couple slides on UNICORE>

  21. Condor-G • Something about Condor-G? • Transition into upperware?

  22. Upperware • Talk about motivation for upperware applications

  23. GridShell
