
Running jobs on SDSC Resources


Presentation Transcript


  1. Running jobs on SDSC Resources Krishna Muriki, Oct 31, 2006, kmuriki@sdsc.edu, SDSC User Services

  2. Agenda !!! • Using DataStar • Using IA64 cluster • Using HPSS resource.

  3. DataStar Overview • P655 :: (8-way, 16GB) 176 nodes • P655+ :: (8-way, 32GB) 96 nodes • P690 :: (32-way, 64GB) 2 nodes • P690 :: (32-way, 128GB) 4 nodes • P690 :: (32-way, 256GB) 2 nodes • Total: 280 nodes, 2,432 processors.

  4. Batch/Interactive computing • Batch job queues: • Job queue manager – LoadLeveler (tool from IBM) • Job queue scheduler – Catalina (SDSC internal tool) • Job queue monitoring – various tools (commands) • Job accounting – Job filter (SDSC internal Perl scripts)

  5. DataStar Access • Three login nodes, each with an intended usage mode: • dslogin.sdsc.edu :: Production runs (P690, 32-way, 64GB) • dspoe.sdsc.edu :: Test/debug runs (P655, 8-way, 16GB) • dsdirect.sdsc.edu :: Special needs (P690, 32-way, 256GB) • Note: the division into usage modes above is not strict.
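  A minimal login sketch (the user name 'username' is a placeholder, not from the slides):

      ssh username@dslogin.sdsc.edu    # production work: editing, compiling, submitting batch jobs
      ssh username@dspoe.sdsc.edu      # short test/debug runs
      ssh username@dsdirect.sdsc.edu   # interactive, visualization, and large-memory work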

  6. Test/debug runs (usage from dspoe) [dspoe.sdsc.edu :: P655, 8-way, 16GB] • Access to two queues: • P655 nodes [shared] • P655 nodes [not shared] • These job queues have only the Job filter + LoadLeveler (so turnaround is very fast) • Jobs are submitted with a special command line (along with a job script).

  7. Production runs (usage from dslogin) [dslogin.sdsc.edu :: P690, 32-way, 64GB] • Data transfer / source editing / compilation etc. • Two queues: • Onto P655/P655+ nodes [not shared] • Onto P690 nodes [shared] • These job queues have Job filter + LoadLeveler + Catalina (slow updates)

  8. All special needs (usage from dsdirect) [dsdirect.sdsc.edu :: P690, 32-way, 256GB] • All visualization needs • All post-run data analysis needs • Shared node (with 256 GB of memory) • Process accounting in place • Fully interactive usage (run your a.out directly) • No Job filter, no LoadLeveler, no Catalina

  9. Suggested usage model • Start with dspoe (test/debug queues) • Do production runs from dslogin (normal & normal32 queues) • Use the express queues from dspoe when you need a job to run right away • Use dsdirect for special needs.

  10. Accounting • reslist -u user_name • reslist -a account_name
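  For example (the user and account names here are hypothetical):

      reslist -u jdoe      # allocation summary for user 'jdoe'
      reslist -a abc123    # usage and remaining balance for account 'abc123'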

  11. Now let's do it! • Example files are located here: • /gpfs/projects/workshop/running_jobs • Copy the whole directory (tcsh) • Use the Makefile to compile the source code • Edit the parameters in the job submission scripts • Communicate with the job manager using its language (see the sketch below).
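  A minimal sketch of those first steps (the destination directory name is an arbitrary choice):

      cp -r /gpfs/projects/workshop/running_jobs ~/running_jobs   # copy the example files
      cd ~/running_jobs
      make                                                        # build the examples with the provided Makefile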

  12. Job Manager language • Ask it to show the queue: llq • Ask it to submit your job to the queue: llsubmit • Ask it to cancel your job in the queue: llcancel • Additional, more useful commands come from SDSC's in-house tool, Catalina (please bear with me, its updates are slow): • 'showq' to look at the status of the queue • 'show_bf' to look at the backfill window opportunities • A sample LoadLeveler job script is sketched below.
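  As an illustration only (the class name, node count, and executable are assumptions, not taken from the slides), a LoadLeveler submission script might look like this:

      #!/bin/csh
      #@ job_type         = parallel
      #@ class            = normal            # assumed queue/class name
      #@ node             = 2                 # number of nodes (assumption)
      #@ tasks_per_node   = 8                 # the P655 nodes are 8-way
      #@ wall_clock_limit = 00:30:00
      #@ node_usage       = not_shared
      #@ output           = job.$(jobid).out
      #@ error            = job.$(jobid).err
      #@ queue
      poe ./a.out                             # launch the MPI executable with IBM POE

  Submit it with 'llsubmit myjob.cmd', watch it with 'llq', and remove it with 'llcancel <job id>'.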

  13. Access to HPSS - 1 • What is HPSS? The centralized, long-term data storage system at SDSC is the High Performance Storage System (HPSS) • It currently stores more than 3 PB of data (as of June 2006) • Total system capacity is 7.2 PB • Data is added at an average rate of 100 TB per month (between Aug '05 and Feb '06).

  14. Access to HPSS - 2 • First, set up your authentication: • run the 'get_hpss_keytab' script • Then learn the HPSS commands used to talk to it (sketched below): • hsi • htar
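  A minimal usage sketch (the file and directory names are placeholders):

      get_hpss_keytab                 # one-time: generate your HPSS authentication keytab
      hsi "put results.dat"           # copy a local file into your HPSS home directory
      hsi "ls -l"                     # list what is stored under your HPSS home
      htar -cvf run01.tar run01/      # bundle a local directory into a tar archive stored in HPSS
      htar -xvf run01.tar             # later: retrieve and unpack the archive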

  15. SDSC IA64 cluster

  16. IA64 cluster overview • Around 265 nodes. • 2-way nodes • 4GB memory per node. • Batch job environment • Job Manager – PBS (Open source tool) • Job Scheduler – Catalina (SDSC internal tool) • Job Monitoring – Various commands & ‘Clumon’

  17. IA64 Access • IA64 login nodes: • tg-login1.sdsc.edu (alias to tg-login.sdsc.edu) • tg-login2.sdsc.edu • tg-c127.sdsc.edu, tg-c128.sdsc.edu, tg-c129.sdsc.edu & tg-c130.sdsc.edu

  18. Queues & Nodes • Total of around 260 nodes • With 2 processors each • All in a single batch queue – 'dque' • That's all you need, so now let's do it! • Example files are in • /gpfs/projects/workshop/running_jobs • PBS commands – qstat, qsub, qdel (a sample batch script follows below)
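  As an illustrative sketch only (the node count, wall time, and executable name are assumptions carried over from the interactive example on the next slide):

      #!/bin/sh
      #PBS -q dque                     # the single batch queue on the IA64 cluster
      #PBS -l nodes=4:ppn=2            # 4 nodes, 2 processors per node
      #PBS -l walltime=00:30:00
      #PBS -o job.out
      #PBS -e job.err
      cd $PBS_O_WORKDIR                # run from the directory the job was submitted from
      mpirun -np 8 -machinefile $PBS_NODEFILE ./parallel-test

  Submit with 'qsub myjob.pbs', check status with 'qstat -u $USER', and delete with 'qdel <job id>'.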

  19. Running Interactive • Interactive use is via PBS: qsub -I -V -l walltime=00:30:00 -l nodes=4:ppn=2 • This requests 4 nodes for interactive use (using 2 CPUs per node) for a maximum wall-clock time of 30 minutes. Once the scheduler can honor the request, PBS responds with "ready" and gives the node names. • Once the nodes are assigned, the user can run any interactive command. For example, to run an MPI program, parallel-test, on the 4 nodes (8 CPUs): mpirun -np 8 -machinefile $PBS_NODEFILE parallel-test

  20. References • See all web links at • http://www.sdsc.edu/user_services • Reach us at consult@sdsc.edu
