
What we do: High performance computing, university owned and operated center


Presentation Transcript


  1. What we do: • High performance computing, university owned and operated center • DoD funded through HPCMP • Provide HPC resources and support • Conduct research locally, globally • Who we are: Committed to helping scientists seek understanding of our past, present and future by applying computational technology to advance discovery, analysis and prediction

  2. University Research Center • ARSC computational, visualization and network resources enable broad opportunities for students and faculty • HPC training, outreach activities, internships and science application workshops • HPC facility providing open accessibility for research and education • Developing, evaluating and using acceleration technologies to drive leading-edge HPC • Employs undergraduates, graduate students, post-doctoral fellows, faculty and staff

  3. New and Current HPC Users in Training • Introduce new and current users to HPC and visualization resources • Provide programming skills for successful use of ARSC resources • In-depth instruction, hands-on assistance developing codes • Collaborative discussions with other users and parallel computing experts • Early adoption and assessment of software tools

  4. Outline • The computers • How we access them • Storage environment and policies • Environment on midnight • Submitting Jobs • Hurricane Katrina Example

  5. Arctic-powered HPC • Pingo • Cray XT5 dedicated March 2009 • 3,456 processors • 31.8 teraflops of peak computing power (30 trillion arithmetic calculations a second) • 13.5 terabytes of memory • Midnight • Sun Opteron cluster • 12.02 teraflops of peak computing power (12 trillion arithmetic calculations a second) • 9.2 terabytes of memory

  6. Accessing midnight • [Diagram: Password/SecurID is sent to the Kerberos server to authenticate; the server returns a Kerberos ticket; the ticket is presented to midnight at ARSC via ssh -X -Y midnight.arsc.edu] • Obtain a Kerberos ticket, which gets stored on your local machine • Use Kerberized software to connect: this software will send the valid Kerberos ticket and allow you to connect
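The two steps on this slide can be sketched as a small dry-run shell helper. It assumes a standard `kinit` client and a hypothetical username `myuser`; the slides do not give ARSC's Kerberos kit or realm, so the function only prints the commands rather than running them.

```shell
# Dry-run sketch: print the two login steps instead of executing them.
# "myuser" is a hypothetical username; the host name comes from the slide.
login_steps() {
    user="$1"
    host="${2:-midnight.arsc.edu}"
    # Step 1: obtain a Kerberos ticket (prompts for credentials when actually run).
    printf 'kinit %s\n' "$user"
    # Step 2: connect with Kerberized ssh, using the -X -Y options shown on the slide.
    printf 'ssh -X -Y %s@%s\n' "$user" "$host"
}

login_steps myuser
```

Printing the commands keeps the sketch safe to try anywhere; on a real system the two printed lines would be typed in that order.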

  7. Accessing midnight

  8. ARSC storage • ARSC provides storage in three primary locations. Environment variables are defined for each location. • $HOME • $WRKDIR or $WORKDIR (and/or $SCRATCH) • $ARCHIVE or $ARCHIVE_HOME • Available via special request: • $DATADIR

  9. $HOME • Purpose: location to store configuration files and commonly used executables. • Quota: 100 MB by default on IBMs; 512 MB on midnight. • Backed Up: yes • Purged: no • Notes: Available from computational nodes and login nodes.

  10. $WRKDIR • Purpose: place to run jobs and store temporary files. • Quota: varies by system; 100 GB on midnight. • Backed Up: no • Purged: yes • Notes: Available from computational nodes and login nodes. The quota can be increased on most systems if you need more space.

  11. StorageTek Silo & Sun Fire 6800s

  12. $ARCHIVE • Purpose: place to store files long term. • Quota: no quota • Backed Up: yes • Purged: no • Notes: May not be available from all computational nodes. Available from login nodes. Files can be offline. Network Filesystem (NFS) hosted by two Sun Fire 6800 systems: nanook and seawolf.
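Because $WRKDIR is purged and not backed up while $ARCHIVE is kept long term, a typical pattern is to run in $WRKDIR and copy results to $ARCHIVE afterwards. The following is a minimal sketch of that pattern, not an ARSC-provided script. The directory name `demo_run` and the file `output.dat` are hypothetical, and the variables fall back to temporary directories so the example can be tried off-site.

```shell
# Fall back to temporary directories when the ARSC variables are not set,
# so the sketch can be tried on any machine.
WRKDIR="${WRKDIR:-$(mktemp -d)}"
ARCHIVE="${ARCHIVE:-$(mktemp -d)}"

# Run in $WRKDIR: large quota, but purged and not backed up.
run_dir="$WRKDIR/demo_run"          # hypothetical run directory name
mkdir -p "$run_dir"
echo "model output" > "$run_dir/output.dat"

# Bundle the results into one tar file and write it to long-term storage.
tar -cf "$ARCHIVE/demo_run.tar" -C "$WRKDIR" demo_run
tar -tf "$ARCHIVE/demo_run.tar"     # list the archived files
```

Bundling many small files into a single tar file is a common habit for archives whose files can go offline, since one large file is usually cheaper to retrieve than many small ones. Whether that applies to a given ARSC system is an assumption here.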

  13. Midnight • Manufacturer: Sun Microsystems • Operating System: Linux (SLES 9) • Interconnect: Voltaire Infiniband • Processors: 2.6 GHz AMD Opteron (dual core) • Total compute cores: 2,312

  14. Midnight Hardware • Sun Fire X4600: login and compute nodes • Sun Fire X2200: compute nodes • Sun Fire X4500: temporary filesystem

  15. X4600 Login Nodes • 2 X4600 login nodes, called midnight.arsc.edu (or midnight1.arsc.edu) and midnight2.arsc.edu • Each node has: • 4 AMD Opteron 2.6 GHz dual core processors • 32 GB of shared memory • One 4X Infiniband network card • QFS access to long term storage (i.e. $ARCHIVE) on seawolf • Linux Operating System (SLES 9)

  16. X4600 Compute Nodes • 55 X4600 nodes • Each node has: • 8 AMD Opteron 2.6 GHz dual core processors • 64 GB of shared memory • One 4X Infiniband network card • Linux Operating System (SLES 9)

  17. X2200 Compute Nodes • 358 X2200 nodes • Each node has: • 2 AMD Opteron 2.6 GHz dual core processors • 16 GB of shared memory • One 4X Infiniband network card • Linux Operating System (SLES 9)
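The totals quoted elsewhere in the talk for midnight (2312 compute cores, 9.2 terabytes of memory, 12.02 teraflops peak) can be cross-checked from the compute-node counts above. The peak figure below assumes 2 floating-point operations per core per clock cycle. That assumption is not stated in the slides, but it reproduces the quoted number. The two login nodes are left out of the count.

```shell
# Node counts and per-node figures from the compute-node slides.
x4600_nodes=55;  x4600_cores=$((8 * 2));  x4600_mem_gb=64
x2200_nodes=358; x2200_cores=$((2 * 2));  x2200_mem_gb=16

total_cores=$((x4600_nodes * x4600_cores + x2200_nodes * x2200_cores))
total_mem_gb=$((x4600_nodes * x4600_mem_gb + x2200_nodes * x2200_mem_gb))

# Assumption: 2 floating-point operations per core per clock cycle.
# 2.6 GHz is written as 26/10 to stay in integer arithmetic.
peak_gflops=$((total_cores * 26 * 2 / 10))

echo "compute cores: $total_cores"     # 55*16 + 358*4 = 2312
echo "memory (GB):   $total_mem_gb"    # 55*64 + 358*16 = 9248
echo "peak (GFLOPS): $peak_gflops"     # 2312 * 2.6 * 2, truncated to 12022
```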

  18. Modules Environment • Midnight has the modules package installed (not to be confused with Fortran 90 modules). • This package allows you to quickly switch between different versions of a software package (e.g. compilers).

  19. Modules Environment • ARSC also uses modules for packages that require one or more environment variables to be set to function properly. This hopefully makes such packages easier to use.

  20. Modules • When you log on to midnight the PrgEnv module is loaded by default. This module loads the Pathscale compilers into the PATH along with the corresponding MPI libraries. • The “module load” is done in either the .profile or the .login file.

  21. Sample Module Commands

  22. Available Modules

  23. Sample Module Use
      mg56 % which pathcc
      /usr/local/pkg/pathscale/pathscale-2.5/bin/pathcc
      mg56 % module list
      Currently Loaded Modulefiles:
        1) voltairempi-S-1.pathcc   3) PrgEnv
        2) pathscale-2.5
      mg56 % module switch PrgEnv PrgEnv.path-3.0
      mg56 % which pathcc
      /usr/local/pkg/pathscale/pathscale-3.0/bin/pathcc
      mg56 % module list
      Currently Loaded Modulefiles:
        1) PrgEnv.path-3.0          3) voltairempi-S-1.pathcc
        2) pathscale-3.0
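The change in what `which pathcc` reports comes down to a different directory appearing in PATH. The sketch below imitates only that one effect on a demonstration variable. `swap_entry` and `demo_path` are hypothetical names, and the real modules package does more than edit PATH (it can set other environment variables as well).

```shell
# Illustration only: the real modules package does more than edit PATH.
# swap_entry prints the path list $1 with the first occurrence of $2 replaced by $3.
# $2 is used as a sed pattern, which is adequate for this demonstration.
swap_entry() {
    printf '%s\n' "$1" | sed "s|$2|$3|"
}

# Work on a demonstration variable so the real PATH is left untouched.
demo_path="/usr/local/pkg/pathscale/pathscale-2.5/bin:/usr/bin:/bin"
demo_path=$(swap_entry "$demo_path" \
    "/usr/local/pkg/pathscale/pathscale-2.5/bin" \
    "/usr/local/pkg/pathscale/pathscale-3.0/bin")
echo "$demo_path"
```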

  24. Module usage for WRF: my recommendation
      module purge
      module load PrgEnv.pgi
      module load ncl

  25. More information on modules • Midnight how to page http://www.arsc.edu/support/howtos/usingsun.html#modules • HPC Users’ Newsletter http://www.arsc.edu/support/news/HPCnews/HPCnews342.shtml • Modules Documentation http://modules.sourceforge.net/ http://modules.sourceforge.net/man/module1.html

  26. Queuing System • Jobs on midnight use the PBS queuing system for queuing and scheduling. • PBS allows you to submit jobs, remove jobs, etc.

  27. Common PBS commands • qsub job.pbs: submit the script “job.pbs” to be run by PBS. • qstat: list jobs which haven’t yet completed. • qdel jobid: delete a job from the queue. • qmap: show a graphical list of the current work on nodes.
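A typical sequence with these commands is to submit a script, keep the job identifier that qsub prints, and then query that job. The helper below is a minimal sketch of that sequence. `submit_and_check` is a hypothetical name, and the function only prints a message on machines where PBS is not installed.

```shell
# Submit a job script and show its status.
# Falls back to a message where the PBS commands are not installed.
submit_and_check() {
    script="$1"
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found: skipping submission of $script"
        return 0
    fi
    jobid=$(qsub "$script") || return 1   # qsub prints the new job's identifier
    echo "submitted $jobid"
    qstat "$jobid"                        # show that job's current state
    # To remove the job from the queue later: qdel "$jobid"
}

submit_and_check job.pbs
```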

  28. qsub • tells the queuing system: • how many processors your job will need • what kind of nodes • what queue to use • how much walltime the job will need • what to do with stdout and stderr • and more...

  29. Common Queues • debug: for quick-turnaround debugging work. • standard: regular work. This queue requires that you have an allocation of CPU time. • background: lowest priority queue, but doesn’t require that you have an allocation. • data: queue which allows data to be transferred to long term storage (i.e. $ARCHIVE_HOME).

  30. PBS script: MPI job using X2200 (4way) nodes
      #!/bin/bash
      #PBS -q standard
      #PBS -l select=8:ncpus=4:node_type=4way
      #PBS -l walltime=8:00:00
      #PBS -j oe
      cd $PBS_O_WORKDIR
      mpirun -np 32 ./myprog
      # This script requests 8 chunks with 4 cpus
      # each on 4way nodes (a.k.a. X2200).

  31. PBS script: MPI job using X4600 (16way) nodes
      #!/bin/bash
      #PBS -q standard
      #PBS -l select=2:ncpus=16:node_type=16way
      #PBS -l walltime=8:00:00
      #PBS -j oe
      cd $PBS_O_WORKDIR
      mpirun -np 32 ./myprog
      # This script requests 2 chunks with 16 cpus
      # each on 16way nodes (a.k.a. X4600).
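In both scripts the value passed to `mpirun -np` equals the number of chunks times the CPUs per chunk (8 × 4 = 32 and 2 × 16 = 32). The hypothetical helper below, `make_pbs_script`, prints a script of the same shape and computes that product, as a sketch of how the resource request and the process count stay in step. It is not an ARSC tool, and the queue is fixed to "standard" for brevity.

```shell
# Hypothetical helper: print a PBS script for an MPI job.
# Arguments: chunks, cpus per chunk, node type, walltime, program.
make_pbs_script() {
    chunks="$1"; ncpus="$2"; node_type="$3"; walltime="$4"; prog="$5"
    nprocs=$((chunks * ncpus))   # total MPI processes = chunks * cpus per chunk
    cat <<EOF
#!/bin/bash
#PBS -q standard
#PBS -l select=${chunks}:ncpus=${ncpus}:node_type=${node_type}
#PBS -l walltime=${walltime}
#PBS -j oe
cd \$PBS_O_WORKDIR
mpirun -np ${nprocs} ${prog}
EOF
}

# Reproduce the X2200 (4way) example; redirect to a file to save it.
make_pbs_script 8 4 4way 8:00:00 ./myprog
```

Calling it as `make_pbs_script 2 16 16way 8:00:00 ./myprog` gives the X4600 variant with the same 32 processes.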

  32. Additional PBS Resources • Midnight How To Guide: http://www.arsc.edu/support/howtos/usingsun.html#batch • ARSC HPC Users’ Newsletter- Job Chaining Articles: http://www.arsc.edu/support/news/HPCnews/HPCnews322.shtml#article2 http://www.arsc.edu/support/news/HPCnews/HPCnews320.shtml#article3 http://www.arsc.edu/support/news/HPCnews/HPCnews319.shtml#article1

  33. Additional Resources • Midnight How-to Page: http://www.arsc.edu/support/howtos/usingsun.html • ARSC HPC Users’ Newsletter: http://www.arsc.edu/support/news/ • Pathscale Documentation: http://pathscale.com/docs.html • ARSC Help Desk: Phone: (907) 450-8602, Email: consult@arsc.edu, Web: http://www.arsc.edu/support • Some Exercises based on this talk: http://people.arsc.edu/~bahls/classes/midnight_intro.tar.gz
