ISTeC CSU Cray High-Performance Computer
Richard Casey, PhD
RMRCE
CSU Center for Bioinformatics

Accounts
• To get an ISTeC Cray account
  • Get the ISTeC Cray account request form at: http://istec.colostate.edu/istec_cray/
  • Or send an email request to richard.casey@colostate.edu
  • Submit the account request form to Richard Casey
• Accounts are available for faculty, graduate students, postdocs, classes, and others
• Accounts are typically set up in two business days

Access
• To access the ISTeC Cray
  • SSH (secure shell); SFTP (secure FTP); SCP (secure copy)
  • PuTTY, FileZilla, others
• Windows
  • Secure Shell Client, Secure File Transfer Client
• Mac, Linux
  • Terminal window sessions
• Remotely
  • VPN (Virtual Private Network) (http://www.acns.colostate.edu/Connect/VPN-Download)
  • Check the ACNS website for client software (http://www.acns.colostate.edu/)

Access
• Cray DNS name: cray2.colostate.edu
• Cray IP address: 129.82.103.183
• SSH login:
    ssh -l accountname cray2.colostate.edu
• SFTP file transfers:
    sftp accountname@cray2.colostate.edu
• PuTTY is available at:
  • http://www.chiark.greenend.org.uk/~sgtatham/putty/
• FileZilla is available at:
  • http://filezilla-project.org/

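The login and transfer commands on this slide can be collected in a small helper. This is only a sketch: `cray_cmd` is a hypothetical function name, `accountname` and `results.tar.gz` are placeholders, and the `lustrefs/` destination assumes that directory sits under your home directory. The helper prints the commands instead of running them, so it can be tried without a connection.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the access commands for one account.
CRAY_HOST="cray2.colostate.edu"   # DNS name given on the slide

cray_cmd() {
    # $1 = action (login|sftp|put), $2 = account name, $3 = local file (put only)
    case "$1" in
        login) echo "ssh -l $2 $CRAY_HOST" ;;
        sftp)  echo "sftp $2@$CRAY_HOST" ;;
        # Assumption: lustrefs is reachable as a directory under the home directory.
        put)   echo "scp $3 $2@$CRAY_HOST:lustrefs/" ;;
        *)     echo "usage: cray_cmd login|sftp|put account [file]" >&2
               return 1 ;;
    esac
}

cray_cmd login accountname
cray_cmd sftp accountname
cray_cmd put accountname results.tar.gz
```

Copy a printed line into a terminal to run it; from off campus, connect to the VPN first.
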
Misc Items
• ISTeC Cray website
  • istec.colostate.edu/istec_cray
• Cray documentation website
  • docs.cray.com
  • Choose “Platforms->Cray XT”
  • Cray User’s Guide v.2.0
• Change password
  • Issue the “passwd2” command
• Backups
  • /home directory -> nightly incremental backups
  • “lustrefs” directory -> NOT backed up; be sure to copy key files to the /home directory or sftp them off the Cray
• The /home directory is very small
  • Do not store large files here
  • Store large files in the “lustrefs” directory

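Because “lustrefs” is not backed up, it can help to copy a handful of key files into /home after each run. The sketch below is one way to do that; `save_key_files` is a hypothetical helper, and the demonstration uses throwaway directories created with `mktemp` rather than the real /home or lustrefs paths.

```shell
#!/bin/sh
# Hypothetical helper: copy selected small files from a work directory
# (e.g. under lustrefs) into a backed-up directory (e.g. under /home).
save_key_files() {
    # $1 = source directory, $2 = destination directory, rest = file names
    skf_src=$1
    skf_dst=$2
    shift 2
    mkdir -p "$skf_dst" || return 1
    for skf_f in "$@"; do
        # -p keeps timestamps and permissions of the original file
        cp -p "$skf_src/$skf_f" "$skf_dst/" || return 1
    done
}

# Demonstration in temporary directories so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/lustrefs/run1"
echo "result" > "$demo/lustrefs/run1/summary.txt"
save_key_files "$demo/lustrefs/run1" "$demo/home/keep" summary.txt
ls "$demo/home/keep"
rm -rf "$demo"
```

Keep the copied set small, since /home has little space; large outputs are better transferred off the Cray with sftp.
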
Cray System Architecture
[Figure: system views labeled “Front” and “Back”. Labeled components: batch compute blades (batch compute nodes); SeaStar 2+ interconnect; interactive compute blades (interactive compute nodes); login node; Lustre file system node.]

XT6m Compute Node Architecture
• Each compute node contains 2 processors (2 sockets)
• 64-bit AMD Opteron “Magny-Cours” 1.9 GHz processors
• 1 NUMA processor = 6 cores
• 4 NUMA processors per compute node
• 24 cores per compute node
• 4 compute nodes per compute blade
• 32 GB RAM (shared) per compute node = 1.664 TB total RAM (ECC DDR3 SDRAM)
• 1.33 GB RAM per core
[Diagram: four 6-core dies (24 “Greyhound” cores in total), each with a 6MB L3 cache and two DDR3 channels, connected to one another by HT3 links, with an HT link to the interconnect.]

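The per-node and system-wide figures on this slide follow from a few multiplications, sketched below. The node count is not stated on the slide; 52 is an inferred value (1.664 TB total divided by 32 GB per node) and should be treated as an assumption.

```shell
#!/bin/sh
# Figures taken from the slide
cores_per_numa=6
numa_per_node=4
ram_per_node_gb=32

# Assumption: 52 compute nodes, inferred from 1.664 TB / 32 GB per node.
nodes=52

cores_per_node=$((cores_per_numa * numa_per_node))
total_cores=$((cores_per_node * nodes))
total_ram_gb=$((ram_per_node_gb * nodes))
# Shell arithmetic is integer-only, so use awk for the fractional result.
ram_per_core_gb=$(awk -v r="$ram_per_node_gb" -v c="$cores_per_node" \
    'BEGIN { printf "%.2f", r / c }')

echo "cores per node:    $cores_per_node"
echo "total cores:       $total_cores"
echo "total RAM (GB):    $total_ram_gb"
echo "RAM per core (GB): $ram_per_core_gb"
```

The 1.33 GB per core figure is worth keeping in mind when sizing jobs: a task that needs more memory than that leaves some cores on its node unusable.
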
Modules
• Modules Environment Management Package
  • Automatically configures the shell environment via modulefiles
  • Each modulefile contains all the information needed to configure the shell for a particular application
  • Modules set PATH, MANPATH, shell environment variables, libraries, etc. so you don’t have to
  • Use modules to easily manage the shell environment and applications

Modules
• After logging in enter:
    module list

    rcasey@cray2:~> module list
    Currently Loaded Modulefiles:
      1) modules
      2) portals/2.2.0-1.0301.24560.5.2.ss
      3) nodestat/2.2-1.0301.24557.5.1.ss
      4) sdb/1.0-1.0301.24568.5.4.ss
      5) MySQL/5.0.64-1.0301.2899.20.1.ss
      6) lustre-cray_ss_s/1.8.2_2.6.27.48_0.12.1_1.0301.5636.4.1-1.0301.24584.3.6
      7) Base-opts/1.0.2-1.0301.24518.5.1.ss
      8) xtpe-network-seastar
      9) cce/7.2.8
     10) acml/4.4.0
     11) xt-libsci/10.4.9
     12) xt-mpt/5.1.2
     13) pmi/1.0-1.0000
     14) xt-asyncpe/4.5
     15) PrgEnv-cray/3.1.49

Modules
• To see the contents of a module enter:
    module show modulefile

    rcasey@cray2:~> module show cce
    -------------------------------------------------------------------
    /opt/modulefiles/cce/7.2.8:

    setenv        CRAYLMD_LICENSE_FILE /opt/cray/cce/cce.lic
    setenv        CRAY_BINUTILS_ROOT /opt/cray/cce/7.2.8/cray-binutils
    setenv        CRAY_BINUTILS_VERSION /opt/cray/cce/7.2.8
    setenv        CRAY_BINUTILS_BIN /opt/cray/cce/7.2.8/cray-binutils/x86_64-unknown-linux-gnu/bin
    setenv        LINKER_X86_64 /opt/cray/cce/7.2.8/cray-binutils/x86_64-unknown-linux-gnu/bin/ld
    setenv        ASSEMBLER_X86_64 /opt/cray/cce/7.2.8/cray-binutils/x86_64-unknown-linux-gnu/bin/as
    setenv        GCC_X86_64 /opt/gcc/4.1.2/snos
    setenv        CRAYLIBS_X86_64 /opt/cray/cce/7.2.8/craylibs/x86-64
    prepend-path  FORTRAN_SYSTEM_MODULE_NAMES ftn_lib_definitions
    prepend-path  MANPATH /opt/cray/cce/7.2.8/man:/opt/cray/cce/7.2.8/craylibs/man:/opt/cray/cce/7.2.8/CC/
    prepend-path  NLSPATH /opt/cray/cce/7.2.8/CC/x86-64/nls/En/%N.cat:/opt/cray/cce/7.2.8/craylibs/x86-6
    prepend-path  INCLUDE_PATH_X86_64 /opt/cray/cce/7.2.8/craylibs/x86-64/include
    prepend-path  PATH /opt/cray/cce/7.2.8/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/7.2.
    append-path   MANPATH /usr/share/man
    -------------------------------------------------------------------

Modules
• To see all available modulefiles enter:
    module avail
• To load a module enter:
    module load modulefile
• To unload a module enter:
    module unload modulefile
• To swap one module for another enter:
    module swap modulefile1 modulefile2
    module swap PrgEnv-cray PrgEnv-gnu

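Switching programming environments means finding the PrgEnv module that is currently loaded and swapping it for another. A minimal sketch of that step follows; `current_prgenv` and `swap_prgenv_cmd` are hypothetical helper names, and the sample input line is taken from the “module list” output shown earlier. The helper only prints the `module swap` command, so it can be tried on a machine without the Modules package.

```shell
#!/bin/sh
current_prgenv() {
    # Reads "module list" style text on stdin; prints the first PrgEnv name found.
    sed -n 's/.*\(PrgEnv-[a-z]*\).*/\1/p' | head -n 1
}

swap_prgenv_cmd() {
    # $1 = currently loaded PrgEnv module, $2 = target suite (cray|gnu|pgi|pathscale)
    case "$2" in
        cray|gnu|pgi|pathscale) ;;
        *) echo "unknown compiler suite: $2" >&2
           return 1 ;;
    esac
    echo "module swap $1 PrgEnv-$2"
}

# Sample text copied from the "module list" output on the earlier slide.
loaded=$(echo "14) xt-asyncpe/4.5 15) PrgEnv-cray/3.1.49" | current_prgenv)
swap_prgenv_cmd "$loaded" gnu
```

On the Cray itself, the printed line can be entered at the prompt, and `module list` afterwards should show the new PrgEnv module in place of the old one.
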
Cray Operating System
• Login node
  • Cray Linux Environment (CLE) on the login node
  • Based on SUSE Linux Enterprise Server (SLES) v.11
• Compute nodes
  • Compute Node Linux (CNL) on the compute nodes
  • Lightweight Linux kernel
  • Maximize performance; maximize stability; minimize OS overhead; minimize OS jitter
• Somewhat different from a cluster environment
  • Read-only shared root file system
  • Shared root file system is mounted from the boot node over the SeaStar interconnect
  • All compute nodes have the same directory structure

Compilers
• Cray (Cray Compiler Environment): C, C++, Fortran
• GNU: gcc, g++, gfortran
• PGI (Portland Group): C, C++, Fortran
• PathScale: C, C++, Fortran
• Python
  • /usr/bin/python (serial version)
  • No module driver yet
  • We’re checking into parallel Python
• Use PrgEnv modules to select the compiler
  • “module load PrgEnv-cray”
  • “module load PrgEnv-gnu”
  • “module load PrgEnv-pgi”
  • “module load PrgEnv-pathscale”
• The drivers “cc” (C), “CC” (C++) and “ftn” (Fortran) are used for all compilers
• The drivers automatically include the appropriate libraries for the selected compiler
  • e.g. -lmpich, -lsci, -lacml, -lpapi, etc.

A Note on Compilers
• There are a limited number of compiler licenses, i.e. a limit on the number of simultaneous users
  • 5 licenses for the Cray environment
  • 2 licenses for the PathScale environment
  • 2 licenses for the Portland Group environment
• Keep trying if you cannot get a license, or use the unrestricted open source compilers, i.e. the GNU compilers

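“Keep trying” can be automated with a small retry loop. The sketch below assumes that the compiler driver exits with a non-zero status when no license is available; that behavior is not confirmed by the slides. `retry` is a hypothetical helper name, and `true` stands in for the real compile command in the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until it succeeds or attempts run out.
retry() {
    # $1 = maximum attempts, $2 = seconds to wait between attempts,
    # remaining arguments = the command to run
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while [ "$retry_n" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $retry_n of $retry_max failed" >&2
        if [ "$retry_n" -lt "$retry_max" ]; then
            sleep "$retry_delay"
        fi
        retry_n=$((retry_n + 1))
    done
    return 1
}

# Demonstration with a stand-in command; on the Cray this might be
#   retry 5 60 cc -o hello hello.c
if retry 3 0 true; then
    echo "command succeeded"
fi
```

A wait of a minute or so between attempts is a reasonable starting point, since a license is only released when another user's compile finishes.
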
Compute Node Status
• Check the state of the interactive and batch compute nodes and whether they are already allocated to other users’ jobs:
    xtnodestat

    Current Allocation Status at Mon Feb 21 12:36:45 2011

         C0-0
      n3 -----dfj
      n2 -----c-i
      n1 -----b-h
    c1n0 -----aeg
      n3 SSS;;;--
      n2    ;;;--
      n1    ;;;--
    c0n0 SSS;;;--
        s01234567

    Legend:
       nonexistent node                  S  service node
    ;  free interactive compute node     -  free batch compute node
    A  allocated, but idle compute node  ?  suspect compute node
    X  down compute node                 Y  down or admindown service node
    Z  admindown compute node

    Available compute nodes: 12 interactive, 30 batch

• Reading the grid
  • “C0-0” is the cabinet ID
  • Row labels such as “c1n0” give the cage and node (Cage X: Node X)
  • The bottom row “s01234567” numbers the slots
  • Lowercase letters mark nodes allocated to jobs; “-” and “;” mark free nodes; “S” marks service nodes
• Currently
  • 960 batch compute cores
  • 288 interactive compute cores

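The “Available compute nodes” line can be reproduced by counting status characters in the grid. The sketch below works on a copy of the sample output from this slide, held in a here-document, so it runs without the `xtnodestat` command; `sample_xtnodestat` and `count_nodes` are hypothetical helper names.

```shell
#!/bin/sh
# Sample grid copied from the slide (legend omitted).
sample_xtnodestat() {
cat <<'EOF'
Current Allocation Status at Mon Feb 21 12:36:45 2011

     C0-0
  n3 -----dfj
  n2 -----c-i
  n1 -----b-h
c1n0 -----aeg
  n3 SSS;;;--
  n2    ;;;--
  n1    ;;;--
c0n0 SSS;;;--
    s01234567
EOF
}

count_nodes() {
    # $1 = status character to count; reads xtnodestat-style text on stdin.
    # Only grid rows (labels like "n3" or "c1n0") are examined.
    awk -v ch="$1" '
        $1 ~ /^(c[0-9]+)?n[0-9]+$/ {
            row = $0
            sub(/^[ \t]*[^ \t]+/, "", row)      # drop the row label
            for (i = 1; i <= length(row); i++)
                if (substr(row, i, 1) == ch) n++
        }
        END { print n + 0 }'
}

free_interactive=$(sample_xtnodestat | count_nodes ';')
free_batch=$(sample_xtnodestat | count_nodes '-')
echo "free nodes: $free_interactive interactive, $free_batch batch"
```

On the Cray, piping the live output (`xtnodestat | count_nodes '-'`) should give the current number of free batch nodes, assuming the output layout matches the sample.
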
ALPS (Application Level Placement Scheduler)
• ALPS
  • Launches interactive jobs
  • Specifies application resource requirements
  • Specifies application placement on compute nodes
  • Initiates application launch
  • Must be used from within the “lustrefs” file system
• aprun
  • Launch parallel jobs
• apstat -X
  • Show running jobs
• apkill APID
  • Delete running jobs
  • The APID is shown by apstat

ALPS: Interactive Jobs
• Key aprun parameters
  • -n: number of MPI tasks allocated for the job
  • -N: number of MPI tasks placed per node
  • -d: number of OpenMP threads per MPI task (equal to threads per node when one task is placed per node); must be <= 24
• MPI code
    aprun -n6 exe
      run job with 6 MPI tasks, all tasks on one node
    aprun -n6 -N1 exe
      run job with 6 MPI tasks, one task per node
• OpenMP code
    export OMP_NUM_THREADS=24
    aprun -d24 exe
      run job with 24 OpenMP threads, all threads on one node
• Hybrid MPI-OpenMP code
    export OMP_NUM_THREADS=24
    aprun -n6 -N1 -d24 exe
      run job with 6 MPI tasks, one task per node, with 24 OpenMP threads per node

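The node count implied by a given combination of -n, -N and -d can be worked out before launching. The sketch below makes two assumptions that go beyond the slide: it treats -d as threads per MPI task (which equals threads per node in the one-task-per-node examples), and it assumes that when -N is omitted, tasks are packed onto as few nodes as possible. `nodes_needed` is a hypothetical helper name.

```shell
#!/bin/sh
CORES_PER_NODE=24   # cores per compute node on this system

nodes_needed() {
    # $1 = -n (MPI tasks), $2 = -N (tasks per node, 0 if not given),
    # $3 = -d (threads per task)
    nn_n=$1
    nn_N=$2
    nn_d=$3
    if [ "$nn_d" -lt 1 ] || [ "$nn_d" -gt "$CORES_PER_NODE" ]; then
        echo "error: -d must be between 1 and $CORES_PER_NODE" >&2
        return 1
    fi
    nn_fit=$((CORES_PER_NODE / nn_d))   # tasks that fit on one node
    if [ "$nn_N" -eq 0 ]; then
        # Assumption: without -N, nodes are filled as far as possible.
        nn_N=$nn_fit
        if [ "$nn_N" -gt "$nn_n" ]; then
            nn_N=$nn_n
        fi
    elif [ "$nn_N" -gt "$nn_fit" ]; then
        echo "error: -N $nn_N with -d $nn_d needs more than $CORES_PER_NODE cores" >&2
        return 1
    fi
    echo $(( (nn_n + nn_N - 1) / nn_N ))   # ceiling of n / N
}

echo "aprun -n6 exe          -> $(nodes_needed 6 0 1) node(s)"
echo "aprun -n6 -N1 exe      -> $(nodes_needed 6 1 1) node(s)"
echo "aprun -d24 exe         -> $(nodes_needed 1 0 24) node(s)"
echo "aprun -n6 -N1 -d24 exe -> $(nodes_needed 6 1 24) node(s)"
```

Comparing the result with the free node count reported by `xtnodestat` gives a quick check on whether an interactive launch is likely to start right away.
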
ALPS: Batch Jobs
• Torque/Moab/PBS batch queue management system
  • For submission and management of jobs in the batch queues
  • Use for jobs with large resource requirements (long-running, # of cores, memory, etc.)
• List all available queues (brief):
    qstat -Q

    rcasey@cray2:~> qstat -Q
    Queue              Max   Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T
    ----------------   ---   ---   ---   ---   ---   ---   ---   ---   ---   --- -
    batch                0     0   yes   yes     0     0     0     0     0     0 E

• Show the status of jobs in all queues:
    qstat
  (Note: if there are no jobs in any of the batch queues, this command shows nothing and just returns to the Linux prompt.)

    rcasey@cray2:~/lustrefs/mpi_c> qstat
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    1753.sdb                  mpic.job         rcasey                 0 R batch

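The state of a single job can be picked out of the `qstat` listing by matching on the job ID. The sketch below runs against a copy of the sample output from this slide, so it does not need Torque installed; `sample_qstat` and `job_state` are hypothetical helper names, and the parsing assumes the “Time Use” column never contains spaces.

```shell
#!/bin/sh
# Sample listing copied from the slide.
sample_qstat() {
cat <<'EOF'
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1753.sdb                  mpic.job         rcasey                 0 R batch
EOF
}

job_state() {
    # $1 = job id as shown in the first column; reads qstat output on stdin.
    # Prints the one-letter state (column "S") and fails if the job is absent.
    awk -v id="$1" '
        $1 == id { print $5; found = 1 }
        END      { exit !found }'
}

state=$(sample_qstat | job_state 1753.sdb)
echo "job 1753.sdb is in state $state"
```

In Torque's listing, R means the job is running and Q means it is still queued; on the Cray the same function can be fed live output with `qstat | job_state 1753.sdb`.
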
ALPS: Batch Jobs
• Submit a job to the default batch queue:
    qsub filename
  • “filename” is the name of a file that contains batch queue commands
• Delete a job from the batch queues:
    qdel jobid
  • “jobid” is the job ID number as displayed by the “qstat” command. You must be the owner of the job in order to delete it.

Sample Batch Job Script

    #!/bin/bash
    #PBS -N jobname
    #PBS -j oe
    #PBS -l mppwidth=6
    #PBS -l mppdepth=24
    #PBS -l walltime=1:00:00

    cd $PBS_O_WORKDIR
    export OMP_NUM_THREADS=24
    date
    aprun -n6 -d24 executable

• Batch script directives:
  • -N: name of the job
  • -j oe: combine standard output and standard error in a single file
  • -l mppwidth: number of MPI tasks allocated for the job
  • -l mppdepth: number of OpenMP threads allocated per MPI task
  • -l walltime: maximum amount of wall clock time for the job to run (hh:mm:ss)

Sample Batch Job Script
• The PBS_O_WORKDIR environment variable is generated by Torque/PBS. It contains the absolute path to the directory from which you submitted your job. It is required for Torque/PBS to find your executable files.
• Linux commands can be included in the batch job script
• The value of the aprun “-n” parameter should match the value of the “mppwidth” directive
• The value of the aprun “-d” parameter should match the value of the “mppdepth” directive

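One way to keep the aprun parameters and the PBS directives in step is to generate the script from a single set of values. The sketch below does that; `make_job_script` is a hypothetical helper, and the job name and `./executable` are placeholders. It writes the script to standard output and submits nothing.

```shell
#!/bin/sh
make_job_script() {
    # $1 = job name, $2 = MPI tasks, $3 = OpenMP threads per task,
    # $4 = walltime (hh:mm:ss), $5 = executable
    # The same values feed mppwidth/-n and mppdepth/-d, so they cannot drift apart.
    cat <<EOF
#!/bin/bash
#PBS -N $1
#PBS -j oe
#PBS -l mppwidth=$2
#PBS -l mppdepth=$3
#PBS -l walltime=$4

cd \$PBS_O_WORKDIR
export OMP_NUM_THREADS=$3
date
aprun -n$2 -d$3 $5
EOF
}

make_job_script jobname 6 24 1:00:00 ./executable
```

Redirect the output to a file (for example `make_job_script jobname 6 24 1:00:00 ./executable > job.pbs`) and submit that file with `qsub`.
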
Contact Info
Richard Casey, PhD
ISTeC Cray System Administrator
Phone: 970-492-4127
Cell: 970-980-5975
Email: richard.casey@colostate.edu