What we do: • High performance computing at a university-owned and -operated center • DoD funded through the HPCMP • Provide HPC resources and support • Conduct research locally and globally Who we are: Committed to helping scientists seek understanding of our past, present and future by applying computational technology to advance discovery, analysis and prediction
University Research Center • ARSC computational, visualization and network resources enable broad opportunities for students and faculty • HPC training, outreach activities, internships and science application workshops • HPC facility providing open accessibility for research and education • Developing, evaluating and using acceleration technologies to drive leading-edge HPC • Employs undergraduates, graduate students, post-doctoral fellows, faculty and staff
New and Current HPC Users in Training • Introduce new and current users to HPC and visualization resources • Provide programming skills for successful use of ARSC resources • In-depth instruction and hands-on assistance developing codes • Collaborative discussions with other users and parallel computing experts • Early adoption and assessment of software tools
Outline • The computers • How we access them • Storage environment and policies • Environment on midnight • Submitting Jobs • Hurricane Katrina Example
Arctic-powered HPC • Pingo • Cray XT5 dedicated March 2009 • 3,456 processors • 31.8 teraflops of peak computing power (nearly 32 trillion arithmetic calculations a second) • 13.5 terabytes of memory • Midnight • Sun Opteron cluster • 12.02 teraflops of peak computing power (12 trillion arithmetic calculations a second) • 9.2 terabytes of memory
Accessing midnight • [Diagram: authenticate with your password/SecurID to the ARSC Kerberos server, receive a Kerberos ticket, then connect with ssh -X -Y midnight.arsc.edu, which presents the ticket to midnight] • Obtain a Kerberos ticket, which gets stored on your local machine • Use Kerberized software to connect – this software will send the valid Kerberos ticket and allow you to connect
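A minimal sketch of those two steps from a local terminal, assuming standard Kerberos client tools are installed; the username and realm (shown as ARSC.EDU) are placeholders:

kinit username@ARSC.EDU                 # obtain a Kerberos ticket, authenticating with your password/SecurID
klist                                   # confirm the ticket is cached on your local machine
ssh -X -Y username@midnight.arsc.edu    # Kerberized ssh presents the ticket; -X/-Y forward X11 for visualization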
ARSC storage • ARSC provides storage in three primary locations. Environment variables are defined for each location. • $HOME • $WRKDIR or $WORKDIR (and/or $SCRATCH) • $ARCHIVE or $ARCHIVE_HOME • Available via special request: • $DATADIR
$HOME • Purpose: location to store configuration files and commonly used executables. • Quota: 100 MB by default on IBMs. 512 MB on midnight. • Backed Up: yes • Purged: no • Notes: Available from computational nodes and login nodes.
$WRKDIR • Purpose: place to run jobs and store temporary files. • Quota: quota varies by system. 100 GB on midnight. • Backed Up: no • Purged: yes • Notes: Available from computational nodes and login nodes. Can be increased on most systems if you need more space.
$ARCHIVE • Purpose: place to store files long term. • Quota: no quota • Backed Up: yes • Purged: no • Notes: May not be available from all computational nodes. Available from login nodes. Files can be offline. Network Filesystem (NFS) hosted by two Sun Fire 6800 systems: nanook and seawolf.
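A hedged sketch of how these locations typically fit together in a run; directory and file names are illustrative only:

cd $WRKDIR                        # run jobs and keep temporary files here (purged, not backed up)
mkdir -p myrun && cd myrun
cp $HOME/bin/myprog .             # small executables and configuration files live in $HOME
./myprog > run.out                # in practice this step is launched through a PBS job
tar cf results.tar run.out
cp results.tar $ARCHIVE/          # copy results to long-term storage

Because $ARCHIVE may not be mounted on the compute nodes, the final copy is normally done from a login node or through the data queue described later in this talk.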
Midnight • Manufacturer: Sun Microsystems • Operating System: Linux (SLES 9) • Interconnect: Voltaire InfiniBand • Processors: 2.6 GHz AMD Opteron (dual core) • 2,312 total compute cores
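As a rough check on the 12.02-teraflop peak figure quoted earlier, assuming each Opteron core can retire 2 floating-point operations per clock (typical for this processor generation): 2,312 cores × 2.6 GHz × 2 flops/cycle ≈ 12,022 gigaflops ≈ 12.02 teraflops.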
Midnight Hardware • Sun Fire X4600- login and compute nodes • Sun Fire X2200- compute nodes • Sun Fire X4500- temporary filesystem
X4600 Login Nodes • 2 X4600 login nodes, called midnight.arsc.edu (or midnight1.arsc.edu) and midnight2.arsc.edu • Each node has: • 4 AMD Opteron 2.6 GHz dual-core processors • 32 GB of shared memory • 1 4X InfiniBand network card • QFS access to long-term storage (i.e. $ARCHIVE) on seawolf • Linux Operating System (SLES 9)
X4600 Compute Nodes • 55 X4600 nodes • Each node has: • 8 AMD Opteron 2.6 GHz dual-core processors • 64 GB of shared memory • 1 4X InfiniBand network card • Linux Operating System (SLES 9)
X2200 Compute Nodes • 358 X2200 nodes • Each node has: • 2 AMD Opteron 2.6 GHz dual-core processors • 16 GB of shared memory • 1 4X InfiniBand network card • Linux Operating System (SLES 9)
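As a consistency check against the system totals above: 55 × 16 + 358 × 4 = 880 + 1,432 = 2,312 compute cores, and 55 × 64 GB + 358 × 16 GB = 9,248 GB, in line with the roughly 9.2 terabytes of memory quoted earlier.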
Modules Environment • Midnight has the modules package installed (not to be confused with Fortran 90 modules). • This package allows you to quickly switch between different versions of a software package (e.g. compilers).
Modules Environment • ARSC also uses modules for packages that require one or more environment variables to be set to function properly. This hopefully makes such packages easier to use.
Modules • When you log on to midnight the PrgEnv module is loaded by default. This module loads the PathScale compilers into your PATH along with the corresponding MPI libraries. • The “module load” is done in either the .profile or the .login.
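A minimal sketch of what that default setup might look like in a bash .profile; the path to the modules initialization script is an assumption and may differ on midnight:

# set up the module command for bash (path below is illustrative only)
. /usr/share/modules/init/bash
# load the default programming environment: PathScale compilers plus matching MPI
module load PrgEnv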
Sample Module Use

mg56 % which pathcc
/usr/local/pkg/pathscale/pathscale-2.5/bin/pathcc
mg56 % module list
Currently Loaded Modulefiles:
  1) voltairempi-S-1.pathcc   2) pathscale-2.5   3) PrgEnv
mg56 % module switch PrgEnv PrgEnv.path-3.0
mg56 % which pathcc
/usr/local/pkg/pathscale/pathscale-3.0/bin/pathcc
mg56 % module list
Currently Loaded Modulefiles:
  1) PrgEnv.path-3.0   2) pathscale-3.0   3) voltairempi-S-1.pathcc
Module usage for WRF – my recommendation

module purge
module load PrgEnv.pgi
module load ncl
More information on modules • Midnight how to page http://www.arsc.edu/support/howtos/usingsun.html#modules • HPC Users’ Newsletter http://www.arsc.edu/support/news/HPCnews/HPCnews342.shtml • Modules Documentation http://modules.sourceforge.net/ http://modules.sourceforge.net/man/module1.html
Queuing System • Jobs on midnight use the PBS queuing system for queuing and scheduling. • PBS allows you to submit jobs, remove jobs, etc.
Common PBS commands • qsub job.pbs- submit the script “job.pbs” to be run by PBS. • qstat- list jobs which haven’t yet completed • qdel jobid- delete a job from the queue. • qmap- show a graphical list of the current work on nodes.
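A hedged sketch of a typical session; qsub echoes the new job ID, and the PBS server name after the dot is shown as a placeholder:

mg56 % qsub job.pbs
12345.<pbs-server>
mg56 % qstat -u $USER      # list your jobs which haven't yet completed
mg56 % qdel 12345          # remove the job from the queue if needed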
qsub • tells the queuing system: • how many processors your job will need • what kind of nodes • what queue to use • how much walltime the job will need • what to do with stdout and stderr • and more...
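Each of these maps to a #PBS directive in the job script; a hedged fragment follows (complete scripts appear on the next slides):

# what queue to use
#PBS -q standard
# how many processors your job will need and what kind of nodes
#PBS -l select=8:ncpus=4:node_type=4way
# how much walltime the job will need
#PBS -l walltime=8:00:00
# what to do with stdout and stderr (here: merge them into one file)
#PBS -j oe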
Common Queues • debug- for quick-turnaround debugging work. • standard- regular work. This queue requires that you have an allocation of CPU time. • background- lowest-priority queue, but doesn’t require that you have an allocation. • data- queue which allows data to be transferred to long-term storage (i.e. $ARCHIVE_HOME).
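To see which queues are defined on midnight and their limits, the standard PBS qstat flags can be used (output format varies by PBS version):

mg56 % qstat -Q       # one-line status summary of each queue
mg56 % qstat -q       # queue limits such as walltime and job counts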
PBS script- MPI job using X2200 (4way) nodes

#!/bin/bash
#PBS -q standard
#PBS -l select=8:ncpus=4:node_type=4way
#PBS -l walltime=8:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun -np 32 ./myprog

# This script requests 8 chunks with 4 cpus
# each on 4way nodes (a.k.a. X2200).
PBS script- MPI job using X4600 (16way) nodes

#!/bin/bash
#PBS -q standard
#PBS -l select=2:ncpus=16:node_type=16way
#PBS -l walltime=8:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun -np 32 ./myprog

# This script requests 2 chunks with 16 cpus
# each on 16way nodes (a.k.a. X4600).
Additional PBS Resources • Midnight How To Guide: http://www.arsc.edu/support/howtos/usingsun.html#batch • ARSC HPC Users’ Newsletter- Job Chaining Articles: http://www.arsc.edu/support/news/HPCnews/HPCnews322.shtml#article2 http://www.arsc.edu/support/news/HPCnews/HPCnews320.shtml#article3 http://www.arsc.edu/support/news/HPCnews/HPCnews319.shtml#article1
Additional Resources • Midnight How-to Page: http://www.arsc.edu/support/howtos/usingsun.html • ARSC HPC Users’ Newsletter: http://www.arsc.edu/support/news/ • PathScale Documentation: http://pathscale.com/docs.html • ARSC Help Desk Phone: (907) 450-8602 Email: consult@arsc.edu Web: http://www.arsc.edu/support • Some Exercises based on this talk: http://people.arsc.edu/~bahls/classes/midnight_intro.tar.gz