Getting Started on Emerald: ITS Research Computing Group
Course Objectives • Word for the Day: Heterogeneous • Emerald: the Swiss army knife of computing, something for everyone :) • Something you can use today • A reference for something you can use tomorrow
Course Objectives Cont. • Educate users on the broader aspects of research computing • Practical knowledge to allow you to efficiently perform your research • Pointers towards more advanced topics
Course Outline • Course Objectives • What are compute clusters and Emerald in particular? • Accessing Emerald • login • file systems • Running jobs on Emerald – Job Management • job schedulers • batch commands • submitting jobs • specialty scripts • Available Software • software • package space • Compiling Code
Help Documentation • Getting Started on Emerald • http://help.unc.edu/6020 • General overview of Emerald for range of users • Short Course – Getting Started on Emerald • http://help.unc.edu/6479 • Detailed notes for beginning Emerald users
What is Emerald? • General Purpose Linux Cluster • Maintained by Research Computing Group • Appropriate for all users regardless of expertise level • Other Servers: • Cedar/Cypress (128-processor SGI/Altix) • a large shared memory system • Topsail (4160-processor Dell Linux Cluster) • homogeneous capability cluster with fast interconnect • Mass Storage • Account access Retiring
What is a compute cluster? Some Typical Components • Compute Nodes • Interconnect • Shared File System • Software • Operating System (OS) • Job Scheduler/Manager • Mass Storage
Emerald's components • Compute Nodes: Xeon blades, IBM Power4 and Power5 • Interconnect: Gigabit Ethernet (aka gigE or GbE) • Shared File Systems: AFS, NFS, and GPFS • Mass Storage: ~/ms • Software: many licensed and public-domain packages in package space • Operating Systems (OS): RH5 (64-bit), RH4 (32-bit), and AIX (64-bit) • Job Scheduler/Manager: all handled by LSF • Emerald is a heterogeneous cluster
Advantages of Using Emerald • High performance • Large capacity • Parallel processing • Many available software packages • Variety of compiling options • Shared file systems • Mass storage
Emerald Compute Nodes • Mostly IBM BladeCenter Xeon blades • all are dual-socket Intel Xeons • 1, 2, or 4 cores per socket (i.e. 2, 4, or 8 processors per node) • 2.0, 2.8, 3.0, or 3.2 GHz processors • varying memory, mostly 2 or 4 GB per core • IBM Power4 and Power5 • large memory, varying processor speeds • The cluster is constantly evolving
Emerald Blades • [Figure: an IBM BladeCenter chassis holding 14 blades]
Emerald Summary • Over 200 blade host nodes, Intel Xeon • Over 800 blade cores • typically 2-4 GB memory per core • 4 IBM AIX p575s, Power5 • 64 cores, large memory • 2 large-memory Intel “Nehalem” X5570 nodes • 8 cores, 96 GB memory, 2.93 GHz CPUs • Gigabit Ethernet switching fabric • Running 32- and 64-bit Linux and 64-bit AIX
Emerald Details • Run the lshosts command to see the resources of each node (host); note the HOST_NAME, model, ncpus, maxmem, and RESOURCES columns. • % lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
bc12-n01   X86_64  Xeon_3_2  12.0  2      3954M   996M    Yes     (X64bit blade blade12 L26 lammpi mem3 mem4 mpich2 mpichp4 RH5 tmp25G xeon32)
bc10-n10   X86_64  Xeon_2_8  11.7  2      3954M   996M    Yes     (X64bit blade blade10 L26 lammpi mem3 mem4 mpich2 mpichp4 RH5 tmp25G xeon28)
bc09-n01   X86_64  Xeon_2_8  11.7  2      3954M   996M    Yes     (X64bit blade blade9 L26 lammpi mem3 mem4 mpich2 mpichp4 RH5 tmp25G xeon28)
bc01-n01   X86_64  Xeon_3_0  11.9  8      32190M  29313M  Yes     (X64bit blade blade1 L26 lammpi mem32 mpich2 mpichp4 RH5 tmp100G xeon30)
Logging Into Emerald • UNIX/Linux/OS X • ssh my_onyen@emerald.unc.edu • ssh -l my_onyen emerald.unc.edu • Windows: SSH Secure Shell • X Window software -> shareware.unc.edu • Setting up a profile for Emerald • Forwarding X11 packets (e.g. ssh -X my_onyen@emerald.unc.edu)
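On Linux or OS X with OpenSSH, the login options above can be saved as an SSH client profile so that a short `ssh emerald` logs in with X11 forwarding; the `ForwardX11 yes` setting is the configuration-file equivalent of `ssh -X`. This is a minimal sketch assuming an OpenSSH client: the host alias `emerald` and the `ONYEN` environment variable are hypothetical choices, and the script only prints the stanza rather than editing any of your files.

```shell
#!/bin/sh
# Print an OpenSSH client stanza for Emerald.
# ONYEN is a hypothetical variable; it falls back to the placeholder my_onyen.
onyen=${ONYEN:-my_onyen}

stanza="Host emerald
    HostName emerald.unc.edu
    User $onyen
    ForwardX11 yes"

# Review the output before appending it to ~/.ssh/config yourself.
printf '%s\n' "$stanza"
```

After checking the output, you could append it with `ONYEN=your_onyen sh emerald_ssh.sh >> ~/.ssh/config` (the script name is hypothetical); `ssh emerald` then behaves like `ssh -X your_onyen@emerald.unc.edu`.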
Head Nodes • Emerald has multiple head nodes (login nodes) for • login and basic file manipulation • compiling • testing short (under about 1 minute), small-memory jobs • Login nodes run the Linux operating system • if you are unfamiliar with Linux, take the Introduction to Linux class or see one of the many online tutorials
Home Directory on Emerald • Home directory: /afs/isis/home/m/y/my_onyen/ • 250 MB quota • ~/private/ • Files backed up daily [~/OldFiles] • Check space quota/usage in your home directory with fslq
Work Directories on Emerald • No space limit, but periodically cleaned • Not backed up! • Work directories: /netscr/my_onyen, /nas/my_onyen, /nas2/my_onyen (26.2 TB total) • /largefs: optimized for large-file operations (> 1 MB), 23 TB • /smallfs: optimized for small-file operations (< 1 MB), 16 TB
File Permissions • Your home directory is in AFS space. AFS is a distributed networked file system. • Permissions are determined by ACLs (access control lists) • see Introduction to AFS (http://help.unc.edu/215) • The other file systems (/largefs, /netscr, etc.) are controlled by the usual Linux file permissions • to make everything under /netscr/my_onyen readable by all users (with directories searchable): chmod -R a+rX /netscr/my_onyen
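To see what `chmod -R a+rX` actually changes, here is a small sketch that applies it to a throwaway directory rather than a real scratch area (the `data` directory and `out.txt` file are hypothetical). The capital `X` grants execute (search) permission to directories, but to a regular file only if some user could already execute it, so data files become world-readable without becoming executable.

```shell
#!/bin/sh
# Demonstrate "chmod -R a+rX" on a temporary directory instead of
# /netscr/my_onyen, so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/data"
printf 'results\n' > "$demo/data/out.txt"
chmod 700 "$demo/data"            # directory: owner only
chmod 600 "$demo/data/out.txt"    # file: owner read/write only

# a+r -> everyone may read
# a+X -> everyone may search directories; regular files gain execute
#        permission only if some user could already execute them
chmod -R a+rX "$demo/data"

ls -ld "$demo/data" "$demo/data/out.txt"
# Clean up afterwards with: rm -rf "$demo"
```

After the command the directory shows `drwxr-xr-x` and the file `-rw-r--r--`.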
Mass Storage • access via ~/ms • looks like ordinary disk file system – data is actually stored on tape • “limitless” capacity • data is backed up • For storage only, not a work directory (i.e. don’t run jobs from here) • if you have many small files, use tar or zip to create a single file for better performance • Sign up for this service on onyen.unc.edu “To infinity … and beyond” - Buzz Lightyear
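The advice to bundle many small files before moving them to mass storage can be sketched with tar. In this example a temporary directory with 100 small files stands in for your real results (`run42` is a hypothetical name), and the final copy to `~/ms` is left commented out.

```shell
#!/bin/sh
# Bundle a directory of many small files into one compressed archive
# before copying it to mass storage (~/ms).
work=$(mktemp -d)
mkdir "$work/run42"                  # hypothetical results directory
i=1
while [ "$i" -le 100 ]; do
  printf 'step %d\n' "$i" > "$work/run42/step_$i.txt"
  i=$((i + 1))
done

# One archive file instead of 100 separate files
tar -czf "$work/run42.tar.gz" -C "$work" run42

# Count archive entries: the run42/ directory plus its files
tar -tzf "$work/run42.tar.gz" | wc -l

# On Emerald you would then copy the single archive:
# cp "$work/run42.tar.gz" ~/ms/
```

Retrieving the data later is the reverse: copy the archive back to a work directory and unpack it with `tar -xzf run42.tar.gz`.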
What do a job scheduler and batch system do? • Manage resources • allocate user tasks to resources • monitor tasks • process control • manage input and output • report status, availability, etc. • enforce usage policies
LSF • All Research Computing clusters use LSF to do job scheduling and management • LSF (Load Sharing Facility) is a (licensed) product from Platform Computing • Fairly distribute compute nodes among users • enforce usage policies for established queues • most common queues: int, now, week, month • RC uses Fair Share scheduling, not first come, first served (FCFS) • LSF commands typically start with the letter b (as in batch), e.g. bsub, bqueues, bjobs, bhosts, … • see man pages for much more info!
Simplified view of LSF • A user logged in to a login node submits a job: bsub -R X64bit -q week myjob • The job is routed to a queue, where it waits with other queued jobs (e.g. job_J, job_F, myjob, job_7) • The job is dispatched to run on an available host that satisfies the job's requirements
Common batch commands • bsub: submit jobs • bqueues: view info on defined queues • bqueues -l week • bkill: stop/cancel a submitted job • bjobs: view submitted jobs • bjobs -u all • bhist: job history • bhist -l <jobID> • bhosts: status and resources of hosts (nodes)
Common batch commands (cont.) • bpeek: display output of a running job • Use man pages to get much more info! • man bjobs • bfree: query LSF to find job slots currently available that fit your resource requirement • this is an RC command extension • bfree -help (or -h) • jobmon: monitor changes in job status • this is an RC command, typically run in a separate window
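Most of the commands above take a job ID, so scripts usually capture it at submission time. A minimal sketch, assuming bsub prints its usual confirmation line of the form `Job <ID> is submitted to queue <name>.`; the function name `job_id_from_bsub` and the ID 4711 are hypothetical and only illustrate the parsing.

```shell
#!/bin/sh
# Extract the numeric job ID from bsub's confirmation message so it can
# be reused with bjobs, bhist -l, bpeek, or bkill.
job_id_from_bsub() {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Offline check with a sample message (no LSF needed):
echo 'Job <4711> is submitted to queue <week>.' | job_id_from_bsub

# On Emerald you might write, for example:
#   id=$(bsub -q week -R X64bit -o out.%J myexe | job_id_from_bsub)
#   bjobs "$id"; bpeek "$id"; bhist -l "$id"; bkill "$id"
```

The sample line prints `4711`; any line that does not match the pattern produces no output.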
Submitting Jobs: bsub Command • All files must be in scratch space, e.g. /netscr, /largefs, /smallfs • Your home directory is not mounted on the compute nodes • bsub [-bsub_opts] executable [-exec_opts]
bsub continued • Common bsub options: • -o <filename> • -o out.%J (%J is replaced by the job ID) • -q <queue name> • -q now • -R "resource specification" • -R xeon30 • -n <number of processes> • used for parallel (MPI) jobs • -a <application-specific esub> • -a mpichp4 (used for MPI jobs)
Two methods to submit jobs: • bsub example: submit the executable myexe to the week queue to run on a 64-bit Linux OS, and redirect output to the file out.<jobID> (the default is to mail output) • Method 1: Command Line • bsub -q week -R X64bit -o out.%J myexe • Method 2: Create a file (details to follow) called, for example, myexe.bsub, and then submit that file. Note the redirect symbol, < • bsub < myexe.bsub
Method 2 cont. • The file you submit contains all the bsub options you want, so for this example myexe.bsub looks like this: • #BSUB -q week • #BSUB -o out.%J • #BSUB -R X64bit • myexe • This is actually a shell script, so the top line could be the usual #!/bin/csh, etc., and you can run any commands you like. • if this doesn't mean anything to you, never mind :)
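Method 2 can itself be scripted. This sketch writes myexe.bsub with a here-document and submits it only if bsub is on the PATH. The queue, output, and resource options are the ones from the example; `myexe` is still the placeholder executable, and the `#!/bin/sh` first line is an assumption (csh works too). The `#BSUB` lines come before the first command, since LSF reads the options from that leading block.

```shell
#!/bin/sh
# Write the Method 2 batch file, then submit it with input redirection.
# Quoted 'EOF': the shell copies the text verbatim into the file.
cat > myexe.bsub <<'EOF'
#!/bin/sh
#BSUB -q week
#BSUB -o out.%J
#BSUB -R X64bit
myexe
EOF

if command -v bsub >/dev/null 2>&1; then
  bsub < myexe.bsub      # note the "<": bsub reads the script on stdin
else
  echo "bsub not found; run this on an Emerald login node" >&2
fi
```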
Parallel Job Example • Batch command line method • bsub -q week -o out.%J -n 30 -a mpichp4 mpirun.lsf myParallelExe • Batch file method • bsub < myexe.bsub • where myexe.bsub looks like this: • #BSUB -q week • #BSUB -o out.%J • #BSUB -a mpichp4 • #BSUB -n 30 • mpirun.lsf myParallelExe
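When the same parallel job is run at several sizes, the process count can be kept in a shell variable and written into the batch file. A sketch under assumptions: `NPROCS` (default 30, as in the example) and the file name `myparallel.bsub` are hypothetical, and `myParallelExe` remains the placeholder program. The here-document delimiter is unquoted so that `${nprocs}` is expanded; `%J` is not special to the shell, so it still reaches LSF unchanged.

```shell
#!/bin/sh
# Generate the parallel batch file with the MPI process count as a variable.
nprocs=${NPROCS:-30}

# Unquoted EOF: the shell substitutes ${nprocs} while writing the file.
cat > myparallel.bsub <<EOF
#BSUB -q week
#BSUB -o out.%J
#BSUB -a mpichp4
#BSUB -n ${nprocs}
mpirun.lsf myParallelExe
EOF

if command -v bsub >/dev/null 2>&1; then
  bsub < myparallel.bsub
else
  echo "bsub not found; run this on an Emerald login node" >&2
fi
```

For example, `NPROCS=8 sh make_parallel_job.sh` (script name hypothetical) would request 8 processes instead of 30.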
Submitting Jobs: Specialty Scripts • Running a SAS job through batch (2 ways) • bsub -q week -R blade sas program.sas • bsas test.sas • Running a Matlab job through batch (2 ways) • bsub -q week -R blade matlab -nodisplay -nojvm -nosplash program.m -logfile program.log • bmatlab test.m
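With several SAS programs in one scratch directory, the bsas specialty script can be called in a loop, giving one batch job per program. This is a minimal sketch: the function name `submit_all_sas` and the `DRY_RUN` switch are hypothetical, and with `DRY_RUN=1` (or when bsas is not installed) the commands are only printed.

```shell
#!/bin/sh
# Submit every SAS program in the current directory through bsas,
# one batch job per program.
submit_all_sas() {
  for prog in *.sas; do
    [ -e "$prog" ] || continue   # glob matched nothing: no .sas files here
    if [ "${DRY_RUN:-0}" = 1 ] || ! command -v bsas >/dev/null 2>&1; then
      echo "bsas $prog"          # preview only
    else
      bsas "$prog"
    fi
  done
}
```

Run `DRY_RUN=1 submit_all_sas` from the directory holding your programs to preview the submissions, then `submit_all_sas` to send them. The same loop works for Matlab by matching `*.m` and calling bmatlab.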
Interactive Jobs: Setup X Windows • Linux/OS X: X11 client • Windows: X-Win32, offered on the UNC Software Acquisition site (https://shareware.unc.edu) • Port forwarding on SSH Secure Shell • Setting up a session on X-Win32
Interactive Jobs: Submission • use -Ip (interactive, with a pseudo-terminal) or -Is (interactive, with a pseudo-terminal in shell mode) • bsub -q int -R blade -Ip sas • bsub -q int -R blade -Ip gv • bsub -q int -R blade -Ip matlab • bsub -q int -Is tcsh • Specialty scripts • xsas • xstata
Licensed Software • Over 20 licensed software applications (some site licensed, others restricted) • Matlab, Maple, Mathematica, Gaussian, Accelrys Materials Studio and Discovery Studio modules, Sybyl, Schrodinger, SAS, Stata, ArcGIS, NAG, IMSL, Totalview, and more • Compilers (licensed and otherwise): Intel, PGI, Absoft, GNU, IBM • Numerous other packages provided for research and technical computing, including BLAST, PyMol, SOAP, PLINK, NWChem, R, Cambridge Structural Database, Amber, Gromacs, Petsc, Scalapack, Netcdf, Babel, Qt, Ferret, Gnuplot, Grace, iRODS, XCrySDen, and more
Available Software • Most of the software is installed under AFS and made available through package space. • AFS (Andrew File System) is a distributed networked file system; your home directory and the software packages are mounted in AFS space. • A new AFS token is issued at login and expires after 24 hours; use klog to renew it. • Changes you make to your package space persist across login sessions.
Package Space • Use ipm (Isis Package Manager) to manage your packages. • ipm commands • ipm add (ipm a) • ipm remove (ipm r) • ipm query (ipm q) • Available packages • http://help.unc.edu/1689 • man ipm
Compiling on Emerald • Compilers • FORTRAN 77/90/95 • C/C++ • Parallel computing • MPI (MPICH, LAM/MPI, MPICH-GM) • OpenMP
Compiling MPI programs • Use the MPI wrappers to compile your program • mpicc, mpiCC, mpif90, mpif77 • the wrappers will find the appropriate include files and libraries and then invoke the actual compiler • for example, mpicc will invoke either gcc, icc, or pgcc depending upon which package you have loaded
Compiling Details on Emerald • Add a compiler to your working environment • ipm add package_name • Compile a code • <compiler> code.c -o executable (e.g. gcc, icc, or pgcc in place of <compiler>) • Run the executable on a compute node using the bsub command • bsub -q week -R blade executable
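Putting the MPI wrapper and bsub steps together, this sketch writes a small MPI "hello world" in C, compiles it with mpicc if an MPI package has been added, and submits it only when bsub is available. The file names `hello_mpi.c`/`hello_mpi` and the 4-process request are hypothetical choices; the `-a mpichp4` and `mpirun.lsf` usage follows the parallel job example.

```shell
#!/bin/sh
# Sketch: compile an MPI "hello world" with the mpicc wrapper and submit it.
# Add an MPI package with "ipm add" first; the package name depends on
# your compiler choice.
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

if command -v mpicc >/dev/null 2>&1; then
  mpicc hello_mpi.c -o hello_mpi   # wrapper supplies MPI include/library paths
else
  echo "mpicc not found; add an MPI package with ipm first" >&2
fi

# Submit only on a system with LSF and a successfully built executable.
if command -v bsub >/dev/null 2>&1 && [ -x hello_mpi ]; then
  bsub -q week -o out.%J -n 4 -a mpichp4 mpirun.lsf ./hello_mpi
fi
```

Each MPI process prints one line, so out.<jobID> should contain one "Hello from rank ..." line per requested process, in no guaranteed order.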
Contacting Research Computing • Questions? • For assistance with Emerald, please contact the Research Computing Group: • Email: research@unc.edu • Phone: 919-962-HELP • Submit help ticket at http://help.unc.edu