Introduction to the NERSC HPCF • NERSC User Services • Hardware, Software, & Usage • Mass Storage • Access & Connectivity
Hardware, part 1 • Cray Parallel Vector Processor (PVP) Systems • 96 CPUs, shared-memory parallelism (Cray tasking, OpenMP) • J90SE clock is 100 MHz; peak performance is 200 Mflops/CPU (~125 actual) • SV1 clock is 300 MHz; peak performance is 1200 Mflops/CPU (~300 actual) • J90SE and SV1 are not binary compatible • Cray T3E MPP System • mcurie • 692 PEs: 644 application, 33 command, 15 OS; 256 MB/PE • PE clock is 450 MHz; peak performance is 900 Mflops/PE (~100 actual) Introduction to NERSC - User Services Group
Hardware, part 2 • IBM SP MPP System • gseaborg, Phase 1 • 304 nodes (608 CPUs): 256 (512) compute, 8 (16) login, 16 (32) GPFS, 8 (16) network, 16 (32) service; 1 GB/node • Node clock is 200 MHz; peak performance is 800 Mflops per CPU (~200 actual) • Phase 2 will be bigger and faster • Visualization Server • escher; SGI Onyx 2 • 8 CPUs, 5 GB RAM, 2 graphics pipes • CPU clock is 195 MHz; 2 simultaneous video streams • Math Server • newton; Sun UltraSPARC-II • 1 CPU, 512 MB RAM • CPU clock is 248 MHz Introduction to NERSC - User Services Group
Hardware, part 3 • Parallel Distributed Systems Facility (PDSF) • High Energy Physics facility for detector simulation and data analysis • Multiple clustered systems; Intel Linux PCs, Sun Solaris workstations • Energy Sciences Network (ESNet) • Major component of the Internet; ATM Backbone • Specializing in information retrieval, infrastructure, and group collaboration • High Performance Storage System (HPSS) • Multiple libraries, hierarchical disk and tape archive systems • High speed transfers to NERSC systems • Accessible from outside NERSC • Multiple user interface utilities • Directories for individual users and project groups Introduction to NERSC - User Services Group
PVP: File Systems, part 1 • $HOME • “permanent” (but not archival) • 5 GB quota, regular backups, file migration • local to killeen, NFS-mounted on seymour and the batch systems • poor performance for batch jobs • Example home directory paths: /u/repo/u10101, /Un/u10101, /u/ccc/u10101, /U0/u10101 Introduction to NERSC - User Services Group
PVP: File Systems, part 2 • $TMPDIR • temporary (created/destroyed each session) • no quota (but NQS limits 10 GB - 40 GB) • no backups, no migration • local to each machine • high-performance RAID arrays • system manages this for you • A.K.A. $BIG • /tmp • location of $TMPDIR • 14-day lifetime • A.K.A. /big • you manage this for yourself Introduction to NERSC - User Services Group
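A minimal sketch of the $TMPDIR pattern described above for a PVP batch job (the file and directory names are hypothetical): stage input in, run, and copy results back before the session ends, because $TMPDIR is destroyed afterward.

  # do the heavy I/O in $TMPDIR, not $HOME
  cd $TMPDIR
  cp $HOME/project/input.dat .        # stage input (path is illustrative)
  ./a.out < input.dat > output.dat
  cp output.dat $HOME/project/        # save results before $TMPDIR disappears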
PVP: Environment, part 1 • Unicos • Shells • Supported • sh • csh • ksh (same as sh) • Unsupported • tcsh (get it by “module load tcsh”) • bash (get it by “module load tools”) Introduction to NERSC - User Services Group
PVP: Environment, part 2 • Modules • Found on many Unix systems • Sets all or any of environment variables, aliases, executable search paths, man search paths, header file include paths, library load paths • Exercise care modifying startup files! • Cray’s PrgEnv is modules-driven • Provided startup files are critical! • Add to .ext files, don’t clobber originals • Append to paths, don’t set them, and this only if necessary • If you mess up, no compilers, etc. • Useful commands • module list • module avail • module load modfile • module display modfile • module help modfile Introduction to NERSC - User Services Group
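A minimal sketch of everyday module usage and a safe startup-file customization, following the guidance above (the module name and path are illustrative):

  module list                  # what is currently loaded
  module avail                 # what is available
  module load tcsh             # pick up an unsupported shell, as noted above

  # in ~/.cshrc.ext (not .cshrc itself) -- append to paths, don't clobber them:
  setenv PATH ${PATH}:${HOME}/bin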
PVP: Environment, part 3 • Programming • Fortran 90 - f90 • C/C++ - cc, CC • Assembler - as • Use the compiler (f90, cc, CC) for linking also • f90 file naming conventions • filename.f - fixed-form Fortran 77 code • filename.F - fixed-form Fortran 77 code, run preprocessor first • filename.f90 - free-form Fortran 90 code • filename.F90 - free-form Fortran 90 code, run preprocessor first • Multiprocessing (aka multitasking, multithreading…) • setenv NCPUS 4 (csh) • export NCPUS=4 (ksh) • If you see "a.out: Command not found.", run the executable as ./a.out (the current directory is not in your search path) • Note: no parallelism is specified on the command line; the CPU count comes from NCPUS (see the sketch below) Introduction to NERSC - User Services Group
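A minimal sketch of compiling and running a multitasked code on the PVP systems (source, executable, and data file names are hypothetical); note that the CPU count comes from the environment, not from the run command:

  f90 -o myprog myprog.f90     # free-form Fortran 90 source
  setenv NCPUS 4               # csh; under ksh use: export NCPUS=4
  ./myprog < in.dat > out.dat  # run from the current directory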
PVP: Environment, part 4a • Execution modes • Interactive serial • 10 hours on killeen and seymour • 80 MW max memory • Interactive parallel • No guarantee of real-time concurrency • Batch queues (queue table not reproduced; * = killeen, seymour, franklin, bhaskara; ** = franklin, bhaskara) • To see them: qstat -b • Queues shuffled at night, and sometimes during the day • Subject to change Introduction to NERSC - User Services Group
PVP: Environment, part 4b • Batch • User creates a shell script (e.g., “myscript”) • Submits to NQE with “cqsub myscript” (see the sketch below) • Returns an NQE task id (e.g., “t1234”) • NQE selects a machine and forwards the job to NQS • Job remains pending (“NPend”) until resources are available • NQS runs the job • Assigns an NQS request id (e.g., “5678.bhaskara”) • Runs the job in the appropriate batch queue • Job log returned upon completion Introduction to NERSC - User Services Group
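A minimal sketch of that workflow; the embedded #QSUB directives, queue name, and time limit are assumptions shown only for illustration, not actual NERSC settings:

  # myscript (illustrative job script)
  #QSUB -q regular             # queue name is an assumption
  #QSUB -lT 3600               # per-request CPU time limit in seconds (assumed option)
  cd $TMPDIR
  cp $HOME/input.dat .
  ./a.out < input.dat > output.dat
  cp output.dat $HOME

Submit with “cqsub myscript” (NQE returns a task id such as t1234); once the job is forwarded to NQS, “qstat -b” shows the batch queues.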
PVP: Environment, part 5 • Libraries • Mathematics • nag, imsl, slatec, lsode, harwell, etc. • Graphics • ncar, gnuplot, etc. • I/O • HDF, netCDF, etc. • Applications • Amber, Ansys, Basis, Gamess, Gaussian, Nastran, etc. Introduction to NERSC - User Services Group
PVP: Environment, part 6 • Tools • ja - job accounting • hpm - Hardware Performance Monitor • prof - Execution time profiler & viewer • flowtrace/flowview - Execution time profiler & viewer • atexpert - Autotasking performance predictor • f90 - Compiler feedback • totalview - Debugger (visual and line-oriented) Introduction to NERSC - User Services Group
T3E: File Systems, part 1 • $HOME • “permanent” (but not archival) • 2 GB quota, regular backups, file migration • poor performance for batch jobs • Example home directory paths: /u/repo/u10101, /Un/u10101, /u/ccc/u10101, /U0/u10101 Introduction to NERSC - User Services Group
T3E: File Systems, part 2 • $TMPDIR • temporary (created/destroyed each session) • 75 GB quota (but NQS limits 4 GB - 32 GB) • no backups, no migration • high-performance RAID arrays • system manages this for you • Can be used for parallel files • /tmp • location of $TMPDIR • 14-day lifetime • A.K.A. /big • you manage this for yourself Introduction to NERSC - User Services Group
T3E: Environment, part 1 • UNICOS/mk • Shells: sh/ksh, csh, tcsh • Supported: • sh • csh • ksh (same as sh) • Unsupported: • tcsh (get it by “module load tcsh”) • bash (get it by “module load tools”) Introduction to NERSC - User Services Group
T3E: Environment, part 2 • Modules - manages user environment • Paths, Environment variables, Aliases, same as on PVP systems • Cray’s PrgEnv is modules-driven • Provided startup files are critical! • Add to .ext files, don’t clobber originals • Append to paths, don’t set them, and this only if necessary • If you mess up, no compilers, etc. • Useful commands • module list • module avail • module load modfile • module display modfile • module help modfile Introduction to NERSC - User Services Group
T3E: Environment, part 3a • Programming • Fortran 90: f90 • C/C++: cc, CC • Assembler: cam • Use the compiler (f90, cc, CC) for linking also • Same naming conventions as on the PVP systems • PGHPF - Portland Group HPF • KCC - Kuck and Assoc. C++ • Get it via “module load KCC” • Multiprocessing • Execution in Single-Program, Multiple-Data (SPMD) mode • In Fortran 90, C, and C++, all processors execute the same program Introduction to NERSC - User Services Group
T3E: Environment, part 3b • Executables - Malleable or Fixed • Specified at compilation and/or execution • f90 -Xnpes ... (e.g., -X64) creates a “fixed” executable • Always runs on the same number of (application) processors • Type ./a.out to run • f90 -Xm ... or omitting the -X option creates a “malleable” executable • ./a.out will run on a command PE • mpprun -n npes ./a.out runs on npes application PEs (see the sketch below) • Executing code can ask for: • Process id (from zero up) • MPI_COMM_RANK(...) • Total number of PEs • MPI_COMM_SIZE(...) • The PE or process/task ID is used to establish “master/slave” identities, controlling execution Introduction to NERSC - User Services Group
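A minimal sketch contrasting the two cases above (source names and PE counts are illustrative):

  f90 -X64 -o prog_fixed prog.f90   # fixed: always runs on 64 application PEs
  ./prog_fixed                      # no mpprun needed; the PE count is baked in

  f90 -o prog_mall prog.f90         # malleable: no -X at compile time
  mpprun -n 32 ./prog_mall          # choose the application PE count at run time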
T3E: Environment, part 4a • Execution modes • Interactive serial • < 60 minutes on one command PE, 20 MW max memory • Interactive parallel • < 30 minutes on < 64 processors, 29 MW memory per PE • Batch queues • To see them: qstat -b • Queues shuffled at night • Subject to change Introduction to NERSC - User Services Group
T3E: Environment, part 4b • (Old, obsolete) Example of T3E management and queue scheduling Introduction to NERSC - User Services Group
T3E: Environment, part 5 • Math & graphics libraries, and application codes are similar to those on the PVP systems • Libraries are needed for communication: • MPI (Message-Passing Interface) • PVM (Parallel Virtual Machine) • SHMEM (SHared MEMory; non-portable) • BLACS (Basic Linear Algebra Communication Subprograms) • ScaLAPACK (SCAlable [parts of] LAPACK) • LIBSCI (including parallel FFTs), NAG, IMSL • I/O libraries • Cray’s FFIO • NetCDF (NETwork Common Data Format) • HDF (Hierarchical Data Format) Introduction to NERSC - User Services Group
T3E: Environment, part 6 • Tools • Apprentice - finds performance problems and inefficiencies • PAT - Performance analysis tool • TAU - ACTS tuning and analysis utility • Vampir - commercial trace generation and viewing utility • Totalview - multiprocessing-aware debugger • F90 - compiler feedback Introduction to NERSC - User Services Group
SP: File Systems, part 1 • AIX is a virtual memory operating system • Each node has its own disks, with an OS image, swap and paging spaces, and scratch partitions • Two types of user-accessible file systems: • A large, globally accessible parallel file system, called GPFS • Smaller node-local partitions Introduction to NERSC - User Services Group
SP: File Systems, part 2 • Environment variables identify directories • $HOME - your personal home directory • Located in GPFS, so globally available to all jobs • Home directories are not currently backed up! • Quotas: 4 GB, and 5000 inodes • $SCRATCH - one of your temporary spaces • Located in GPFS • Very large - 3.5 TB • Transient - purged after session or job termination • $TMPDIR - another of your temporary spaces • Local to a node • Small - only 1 GB • Not particularly fast • Transient - purged on termination of creating session or batch job Introduction to NERSC - User Services Group
SP: File Systems, part 3 • Directly-specified directory paths can also be used • /scratch - temporary space • Located in GPFS • Very large • Not purged at job termination • Subject to immediate purge • Quotas: 100 GB and 6000 inodes • Your $SCRATCH directory is set up in /scratch/tmpdirs/{nodename}/tmpdir.{number} where {number} is system-generated • /scratch/{username} - user-created temporary space • Located in GPFS • Large, fast, encouraged usage • Not purged at job termination • Subject to purge after 7 days, or as needed • Quotas: 100 GB and 6000 inodes Introduction to NERSC - User Services Group
SP: File Systems, part 4 • /scr - temporary space • Local to a node • Small - only 1 GB • Your session-local $TMPDIR is set up in /scr/tmpdir.{number} where {number} is system-generated • Not user-accessible, except for $TMPDIR • /tmp - System-owned temporary space • Local to a node • Very small - 65 MB • Intended for use by utilities, such as vi for temporary files • Dangerous - DO NOT USE! • If filled up, it can cause the node to crash! Introduction to NERSC - User Services Group
SP: Environment, part 1 • IBM's AIX - a true virtual memory kernel • Not a single system image, as on the T3E • Local implementation of the module system • No modules are loaded by default • Default shell is csh • Shell startup files (e.g., .login, .cshrc, etc.) are links; DON’T delete them! • Customize the extension files (e.g., .cshrc.ext), not the startup files Introduction to NERSC - User Services Group
SP: Environment, part 2 • SP Idiosyncrasies • All nodes have unique identities; different logins may put you on different nodes • Must change password, shell, etc. on the gsadmin node • No incoming FTP allowed • xterms should not originate on the SP • Different sessions may be connected to different nodes • High-speed I/O is done differently from the T3E • Processors are faster, but communication is slower, than on the T3E • PFTP is faster than native FTP • SSH access methods differ slightly Introduction to NERSC - User Services Group
SP: Environment, part 3a • Programming in Fortran • Fortran - Fortran 77, Fortran 90, and Fortran 95 • Multiple "versions" of the XLF compiler • xlf, xlf90 for ordinary serial code • xlf_r, xlf90_r for multithreaded code (shared-memory parallelism) • mpxlf90, mpxlf90_r for MPI-based parallel code • Currently, you must specify a separate temporary directory for Fortran 90 “modules”: xlf90 -qmoddir=$TMPDIR -I$TMPDIR modulesource.F source.F • IBM's HPF (xlhpf) is also available Introduction to NERSC - User Services Group
SP: Environment, part 3b • Programming in C and C++ • C & C++ languages supported by IBM • Multiple "versions" of the XLC compiler • cc, xlc for ordinary serial C code • xlC for ordinary serial C++ code • cc_r, xlc_r for multithreaded C code (shared memory parallelism) • xlC_r for multithreaded C++ code (shared memory parallelism) • mpcc for MPI-based parallel C code • mpCC for MPI-based parallel C++ code • Kuck & Assoc. KCC also available in its own module Introduction to NERSC - User Services Group
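A minimal sketch of choosing the right compiler wrapper on the SP (source and executable names are hypothetical):

  xlf90   -o serial_f myprog.f      # serial Fortran 90
  mpxlf90 -o mpi_f    mympi.f       # MPI-based parallel Fortran
  xlc     -o serial_c myprog.c      # serial C
  mpcc    -o mpi_c    mympi.c       # MPI-based parallel C
  mpCC    -o mpi_cxx  mympi.C       # MPI-based parallel C++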
SP: Environment, part 4a • Execution • Many ways to run codes: • serial, parallel • shared-memory parallel, message-based parallel, hybrid • interactive, batch • Serial execution is easy: ./a.out <input_file >output_file • Parallel execution - SPMD Mode, as with T3E • Uses POE, a supra-OS resource manager • Uses Loadleveler to schedule execution • There is some overlap in options specifiable to POE and LoadLeveler • You can use one or both processors on each node • environment variables and batch options control this Introduction to NERSC - User Services Group
SP: Environment, part 4b • Shared memory parallel execution • Within a node only • OpenMP, POSIX threads, IBM SMP directives • Message-based parallel execution • Across nodes and within a node • MPI, PVM, LAPI, SHMEM (planned) • Hybrid parallel execution • Threading and message passing • Most likely to succeed: OpenMP and MPI • Currently, MPI understands inter- vs. intra-node communication, and sends intra-node messages efficiently Introduction to NERSC - User Services Group
SP: Environment, part 4c • Interactive execution • Interactive jobs run on login nodes or compute nodes • Currently, there are 8 login nodes • Serial execution is easy: ./a.out <input_file >output_file • Parallel execution involves POE: poe ./a.out -procs 4 <input_file >output_file • Interactive parallel jobs may be rejected due to resource scarcity; there is no queueing • By default, parallel interactive jobs use both processors on each node • Batch execution • Batch jobs run on the compute nodes • By default, parallel batch jobs use both processors on each node; you will be charged for both, even if you override this • Use the LoadLeveler utilities to submit, monitor, cancel, etc. • Requires a script specifying resource usage details, execution parameters, etc. (see the sketch below) • Several job classes for charging and resource limits: premium, regular, low • Two job types: serial and parallel Introduction to NERSC - User Services Group
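A minimal sketch of a LoadLeveler job script and the associated utilities; the class name, node counts, limits, and file names are illustrative assumptions, and site-specific keywords (e.g., network settings) may also be required:

  #@ job_type         = parallel
  #@ class            = regular        # premium / regular / low
  #@ node             = 2
  #@ tasks_per_node   = 2              # use both CPUs on each node
  #@ wall_clock_limit = 00:30:00
  #@ output           = myjob.out
  #@ error            = myjob.err
  #@ queue

  ./a.out < input_file > output_file   # the mp* compilers link in POE, which picks up
                                       # the node/task geometry from LoadLeveler

Submit with “llsubmit myscript”, monitor with “llq”, and cancel with “llcancel <jobid>”.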
SP: Environment, part 4d • SP Batch Queues and Resource Limits (queue table not reproduced) • Limits: • 3 jobs running • 10 jobs considered for scheduling (idle) • 30 jobs submitted Introduction to NERSC - User Services Group
SP: Environment, part 5 • Libraries and Other Software • Java, Assembler • Aztec, PETSc, ScaLAPACK • Emacs • Gaussian 98, NWChem • GNU Utilities • HDF, netCDF • IMSL, NAG, LAPACK • MASS, ESSL, PESSL • NCAR Graphics • TCL/TK Introduction to NERSC - User Services Group
SP: Environment, part 6 • Tools • VT - visualization tool for trace visualization and performance monitoring • Xprofiler - graphical code structure and execution time monitoring • Totalview - multiprocessing-aware debugger • Other Debugging Tools • Totalview - available in its own module • adb - general-purpose debugger • dbx - symbolic debugger for C, C++, Pascal, and FORTRAN programs • pdbx - based on dbx, with functionality for parallel programming • TAU - ACTS tuning and analysis utility - planned! • Vampir - commercial trace generation and viewing utility - future! • KAP Suite - future? • PAPI - future? Introduction to NERSC - User Services Group
HPSS Mass Storage • HPSS • Hierarchical, flexible, powerful, performance-oriented • Multiple user interfaces allow easy, flexible storage management • Two distinct physical library systems • May be logically merged in future software release • Accessible from any system from inside or outside NERSC • hpss.nersc.gov, archive.nersc.gov (from outside NERSC) • hpss, archive (from inside NERSC) • Accessible via several utilities • HSI, PFTP, FTP • Can be accessed interactively or from batch jobs • Compatible with system maintenance utilities (“sleepers”) Introduction to NERSC - User Services Group
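A minimal sketch of storing and retrieving a file with one of the interfaces listed above (the directory and file names are hypothetical):

  pftp hpss                    # from inside NERSC; use hpss.nersc.gov from outside
    cd myproject               # hypothetical HPSS directory
    put results.tar            # store a file
    get results.tar            # retrieve it later
    quit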
HPSS Mass Storage • HPSS • Allocated and accounted, just like CPU resources • Storage Resource Units (SRUs) • Open-ended - you get charged, but not cut off, if you exceed your allocation • “Project” spaces are available, for easy group collaboration • Used for system backups and user archives • hpss is used for both purposes • archive is for user use only • Has modern access control • DCE allows automatic authentication • Special DCE accounts needed • Not uniformly accessible from all NERSC systems • Problems with PFTP on the SP system • Modern secure access methods are problematic • ftp tunneling doesn’t work (yet…) Introduction to NERSC - User Services Group
Accessing NERSC • NERSC recognizes two connection contexts: • Interaction (working on a computer) • File transfer • Use of SSH is required for interaction (telnet and rlogin are prohibited) • SSH is (mostly) standardized and widely available • Most Unix & Linux systems come with it • Commercial (and some freeware) versions are available for Windows and Macs • SSH allows telnet-like terminal sessions, but protects the account name and password with encryption • Simple and transparent to set up and use • Can look and act like rlogin • SSH can forward xterm connections • Sets up a special “DISPLAY” environment variable • Encrypts the entire session, in both directions Introduction to NERSC - User Services Group
Accessing NERSC • SSH is encouraged for file transfers • SSH contains “scp”, which acts like “rcp” • scp encrypts login info and all transferred data • SSH also allows secure control connections through “tunneling” or “forwarding” • Here’s how tunneling is done (see the sketch below): • Set up a terminal connection to a remote host with port forwarding enabled • This specifies a port on your workstation that ssh will forward to another host • FTP to the forwarded port - it looks like you are ftp’ing to your own workstation • The control connection (login process) is forwarded encrypted • Data connections proceed as any ftp transfer would, unencrypted • Ongoing SSH issues being investigated by NERSC staff • Not all firewalls allow ftp tunneling without “passive” mode • HPSS won’t accept tunneled ftp connections • Workstation platform affects tunneling method • Methods differ slightly on the SP • New options, must use xterm forwarding, no ftp tunneling... • Different platforms accept different ciphers Introduction to NERSC - User Services Group
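A minimal sketch of that tunneling recipe, using OpenSSH-style syntax (the host name and local port are placeholders):

  # step 1: on your workstation, forward a local port to the remote FTP port
  ssh -L 2021:somehost.nersc.gov:21 username@somehost.nersc.gov

  # step 2: in another window, FTP to the forwarded port on your own machine
  ftp localhost 2021           # the login travels through the encrypted tunnel;
                               # data connections remain ordinary, unencrypted FTP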
Information Sources - NERSC Web Pages Introduction to NERSC - User Services Group
Information Sources - On-Line Lecture Materials Introduction to NERSC - User Services Group