This guide provides an introduction to Iceberg, a high-performance computing cluster at the University of Sheffield. Learn how to connect, use Linux, run applications and distribute work across the cluster.
Getting Started with HPC On Iceberg Michael Griffiths and Deniz Savas Corporate Information and Computing Services The University of Sheffield www.sheffield.ac.uk/wrgrid
Outline • Introduction to Iceberg • Getting connected • Basic Linux • Using the Linux Operating System • Running Applications and Jobs on Iceberg • How to distribute work across the cluster • The N8 Facility • Getting Help
Types of Grids • Cluster Grid • Computing clusters ( e.g. iceberg ) • Enterprise Grid, Campus Grid, Intra-Grid • Departmental clusters, servers and PC network • Cloud, Utility Grid • Access resources over internet on demand • Global Grid, Inter-grid • White Rose Grid, National Grid Service, Particle Physics Data Grid
Iceberg Cluster • There are two head-nodes for the iceberg cluster: HEAD NODE1, Iceberg(1), and HEAD NODE2, Iceberg(2) • Users log in to a head-node and reach the worker nodes from it with qsh, qsub or qrsh • There are 232 worker machines in the cluster • All workers share the same user filestore
iceberg cluster specifications
AMD-based nodes:
• 96 nodes each with 4 cores and 16 GB of memory
• 31 nodes each with 8 cores and 32 GB of memory
• TOTAL AMD CORES = 632, TOTAL MEMORY = 2528 GB
• The 8-core nodes are connected to each other via 16 Gbits/sec InfiniBand for MPI jobs
• The 4-core nodes are connected via the much slower 1 Gbits/sec Ethernet connections for MPI jobs
• Scratch space on each node is 400 GBytes
Intel Westmere-based nodes, all InfiniBand connected:
• 103 nodes each with 12 cores and 24 GB of memory (i.e. 2 * 6-core Intel X5650)
• 4 nodes each with 12 cores and 48 GB of memory
• 8 Nvidia Tesla Fermi M2070s GPU units for GPU programming
• TOTAL INTEL CPU CORES = 1152, TOTAL MEMORY = 2400 GB
• Scratch space on each node is 400 GB
• Total GPU memory = 48 GB
• Each GPU unit is capable of about 1 TFlop of single precision floating point performance, or 0.5 TFlops at double precision, yielding a maximum GPU processing power of up to 8 TFlops in total.
Introduction to Linux on Iceberg • What are UNIX and Linux? • The Shell • Getting Connected • Basic UNIX/Linux Commands • Working with Directories • Editing files • Help! • Running Programs • Exercise
What is UNIX/Linux ? • Multi-Tasking O/S • Multi-User O/S • Available on a range of Computers
History of UNIX/Linux • The Unix operating system was developed around 1969 at Bell Labs • Originally written in assembly language, it was rewritten in C in the early 1970s • Around 1991 Linus Torvalds of Helsinki University started off a freely available academic version of Unix • Linux is the antidote to a Microsoft-dominated future
Which UNIX? • SunOS Sun Microsystems – now OpenIndiana, OpenSolaris from Oracle • IRIX Silicon Graphics • HP-UX Hewlett Packard • Linux For IBM PC compatibles There are a number of certification bodies with published standards and test suites to ensure the quality of products, such as: • POSIX: Portable Operating System Interface (IEEE standard).
Linux and GNU • Linux is an implementation of Unix • Linux/Unix operating system is written in ‘C’ • Linux is not part of the GNU project but uses the same licensing agreements • Many of the linux utilities and tools are taken from the GNU project. • There are many flavours of linux distributions. The mix of the kernel (linux) with the utilities (GNU and other) and the installation procedure determine the flavour. Some of these are; • Fedora • SuSE • Redhat • Debian • Mandrake • Knoppix
UNIX Internals (Simplified) • Kernel • System Internals • Shell • Command Interpreter • Programming language • File System • Process Management
UNIX Shells • sh Bourne Shell (Original Shell) (Stephen Bourne of AT&T) • bash Bourne Again Shell (GNU Improved Bourne Shell) • csh C-Shell (C-like Syntax) (Bill Joy of Univ. of California) • ksh Korn-Shell (Bourne + some C-shell) (David Korn of AT&T) • tcsh TENEX C-Shell (More User Friendly C-Shell) You can switch from one shell to another by just typing the name of the shell. Typing exit returns you to the previous shell.
Getting an account All staff and research students are entitled to use iceberg For Registration See: http://www.shef.ac.uk/wrgrid/register Staff can have an account by simply emailing ucards-reg@sheffield.ac.uk
Passwords • In a normal Linux environment the passwd command can be used to change user passwords. However, because we manage passwords centrally, this command will not work on iceberg. • If you wish to change your iceberg password you will have to do this via the web interface at the following URL: http://www.shef.ac.uk/cics/password
Access iceberg: Remote logging in • Terminal access is described at: http://www.shef.ac.uk/wrgrid/using/access • Recommended access is via any browser at: www.shef.ac.uk/wrgrid This uses Sun Global Desktop (all platforms, graphics-capable) • Also possible: using an X-Windows client (MS Windows, graphics-capable): Exceed 3D, Cygwin • Various ssh clients (MS Windows, terminal-only): putty, SSH • Note: ssh clients can also be used in combination with Exceed or Cygwin to enable graphics capability. The above web page describes how this can be achieved.
Access iceberg from Mac or Linux platforms The web browser method of access (as for Windows platforms) also works on these platforms. The more customary and efficient method of access is to use the ssh command from a command shell. Example: ssh -X iceberg_username@iceberg.shef.ac.uk Note 1: The -X parameter is needed to make sure that you can use the graphics or GUI capabilities of the software on iceberg. Note 2: Depending on the configuration of your workstation you may also have to issue the command xhost + before the ssh command.
Basic X Concepts • X Server runs on local machine • PC Exceed, Cygwin, Xming • UNIX Workstation Included in OS • Apple Mac Exodus • X Client runs on remote machine • Graphical Application • xterm • xcalc • Modelling and visualisation packages etc.
Multiple ssh or xterm shells There are no limits to the number of ssh or xterm windows one can start simultaneously by methods described in the previous slides. You may also start extra xterm windows from the host by simply typing xterm & On iceberg we also have a local command named Xterm that starts up an xterm window with nicer to use parameters. On iceberg we strongly recommend that you use Qsh rather than xterm or Xterm command so as to make use of a free worker node. Qsh will act like Xterm but will make use of a worker node. Typing exit will terminate an xterm or ssh session neatly. This will also close the xterm window but not the ssh window. ssh windows can be closed via the file exit menu.
Web browser method of access: Sun Global Desktop. The desktop offers the following options: • Start session on headnode • Start an interactive session on a worker • Start Application on a Worker • Help
Operating system and utilities • Linux Operating System version: Scientific Linux, which is based on RedHat Enterprise 5 • Default shell: bash • Graphical editors: gedit (not available on headnode), emacs, nedit • Text editors: vi, pico (not available on headnode), nano
Login Environment Default shell is bash ( you can request to change your default shell ). On login into iceberg many environment variables are setup automatically to provide a consistent interface and access to software. Each user has a .bashrc file in their directory to help setup the user environment. Type set to get a list of all the environment variables. Change all variables that are in CAPITALS with extreme care. Modify/enhance your environment by adding commands to the end of your .bashrc file , such as alias commands. (Again do this with care! )
Some basic rules • Unix is case sensitive. • Commands are in lower case. • Backspace and/or Del Keys correct typing errors. If the terminal parameters are not correctly set; try Ctrl+H • Ctrl+C Aborts a program or command. • You can use the arrow keys to recall previous commands, optionally edit and execute them.
Format of Unix commands • command [option ...] [filename ...] e.g.:
ls
ls -l tutorial
more tutorial
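List Directory • ls list directory • ls -l list directory in long format • ls -a list all (inc. hidden) files
Example of one line of ls -l output:
-rw------- 1 course01 57 Oct 18 11:05 hello.c
The fields include the access permissions (-rw-------), the number of bytes in the file (57), the date and time last modified (Oct 18 11:05) and the filename (hello.c).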
Directory Structure • / (root) contains the directories home and usr • home contains the user directories cs4un1 and cs4un2 • Home directory of user cs4un1: /home/cs4un1 When you log in you are positioned in your home directory. The environment variable $HOME is also set to contain this directory name.
Working with Directories • pwd print working directory • cd change directory: cd move to home directory; cd .. move up one level; cd mydir move into a subdirectory; cd /var/adm move to an absolute directory • mkdir directory_name create a new directory • rmdir directory_name delete an empty directory
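The directory commands above can be tried safely in a throwaway location. The following is a minimal sketch, not part of the original slides; demo_root and mydir are hypothetical names, and mktemp -d is assumed to be available to create a temporary directory.

```shell
# Work inside a temporary directory so nothing in your real home is touched.
# "demo_root" and "mydir" are hypothetical names used only for this sketch.
demo_root=$(mktemp -d)
cd "$demo_root"

mkdir mydir          # create a new directory
cd mydir             # move into the subdirectory
inside=$(pwd)        # pwd reports the absolute path of the current directory
echo "now in: $inside"

cd ..                # move up one level, back to demo_root
rmdir mydir          # succeeds because mydir is empty

cd /                 # move to an absolute directory
rmdir "$demo_root"   # tidy up the temporary directory
```

Note that rmdir refuses to delete a directory that still contains files, which makes it a safe default for cleaning up.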
Filenames • Filenames can consist of: a-z, A-Z alphabetic characters; 0-9 digits; . - _ + special characters. Example: mon+tue_01.06-03-96 • Wildcards when referencing files: * any character or sequence of characters; ? any single character
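To see the difference between the two wildcards, the sketch below creates a few empty files in a temporary directory and lets the shell expand each pattern. The filenames are invented for illustration.

```shell
# Hypothetical sample files in a temporary directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch run1.dat run2.dat run10.dat notes.txt

all_dat=$(echo *.dat)        # * matches any sequence of characters
one_char=$(echo run?.dat)    # ? matches exactly one character

echo "*.dat    expands to: $all_dat"
echo "run?.dat expands to: $one_char"
```

Here run10.dat matches *.dat but not run?.dat, because two characters sit between "run" and ".dat".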
Displaying contents of a text_file • more filename This command will start listing the contents of filename on screen and pause after a screenful of data. While pausing, use the following keys to control the output: Spacebar: next screenful; n Spacebar: next n lines; Enter: next line; b: back one screen; n b: back n screenfuls; q: quit; ? or h: list commands. Here n is a whole number.
Displaying contents of a text_file… continued • cat [options] filename [filename … ] This command will output the contents of filename[s] to standard output (normally the screen) without pausing. The following options are useful: -v display non-printing characters; -n display with lines numbered on the left • tail [-n] filename This command lists the last 10 lines of a text file. If a number is specified (e.g. -20), it lists the last n (i.e. 20) lines.
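A short sketch of cat -n and tail on a generated 25-line file follows. The file is created with mktemp purely for illustration, and tail -n 3 is used as the standard spelling of the shorter -3 form.

```shell
# Build a hypothetical 25-line test file: "line 1" ... "line 25".
demo_file=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$demo_file"

numbered=$(cat -n "$demo_file" | head -n 2)   # cat -n numbers each line
default_tail=$(tail "$demo_file")             # no number given: last 10 lines
last_three=$(tail -n 3 "$demo_file")          # last 3 lines only

echo "$numbered"
echo "$last_three"
```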
Copying files • Copy files (optionally directories): cp fromfile tofile Some of the useful options are: -R or -r: recursive copy. Use this when fromfile is a directory, so that the entire directory and its contents are copied, e.g. cp -r mydir newdir; -p: preserve. Preserves the attributes of the file, such as access rights and modification time. • Copy and concatenate files by using cat: the cat command concatenates the contents of a list of files and directs the output to standard output (normally the screen). When used with the redirection symbol '>' it can be used to join files together, e.g. cat file1 file2 file3 > new_big_file
Renaming and deleting files • mv: This command will move a file or directory to a new location. It can thus be used to rename files/directories as well as change their locations in the global directory structure. Syntax: mv source destination Examples: mv myfile mynewfile; mv myfile subdirectory/myfile; mv mysubdir mynewsubdir • rm: This command will delete a file (optionally a directory if used with the -r option). Syntax: rm object_to_delete Examples: rm myfile; rm -r mydirectory
Working with files • To copy a file: cp my_file my_new_file • To move (or rename) a file: mv my_file my_new_file • To delete a file: rm my_file • To list the contents of a file: less file_name • To make a new directory (i.e. folder): mkdir new_directory • To copy a file into another directory: cp my_file other_directory • To move a file into another directory: mv my_file other_directory • To copy a directory to another location: cp -R directory other_directory • To remove a directory and all its contents!!!: rm -R directory (use with care) • Wildcards: * matches any sequence of characters. For example: cp *.dat my_directory
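The file operations listed above can be chained into one short session. This is a sketch run inside a temporary directory; every file and directory name in it is made up for the example.

```shell
# All names below are hypothetical; the work happens in a temporary directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo "some results" > my_file

cp my_file my_copy                    # copy a file
mv my_copy my_renamed_file            # rename the copy
mkdir other_directory                 # make a new directory
cp my_file other_directory            # copy a file into another directory
mv my_renamed_file other_directory    # move a file into another directory

cp -R other_directory backup_directory   # copy a directory and its contents
rm -R other_directory                    # remove a directory tree (use with care)

ls -R                                 # show what is left
```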
Problem Session • Attempt problem 1-5 on the handout
Transferring files to/from iceberg • A summary of file transfer methods, as well as links to downloadable tools for file transfers to iceberg, is published at: http://www.sheffield.ac.uk/wrgrid/using/access • Command line tools such as scp, sftp and gftp are available on most platforms. • ftp (non-secure) cannot be used to connect to iceberg. • Graphical tools that transfer files by dragging and dropping files between windows are available: winscp, coreftp, filezilla, cyberduck
Pitfalls when transferring files • ftp is not allowed by iceberg. Only sftp is accepted. • Do not use spaces in filenames. Linux does not like them. • Secure file transfer programs (sftp) classify all files to be transferred as either ASCII_TEXT or BINARY. All SFTP clients attempt to detect the type of a file automatically before a transfer starts, but also provide advanced options to declare the type of the file manually. Wrong classification can cause problems, particularly when transfers take place between different operating systems such as Linux and Windows. • If you are transferring ASCII_TEXT files between Windows and Linux, check that the transfer worked correctly by typing, while on iceberg: cat -v journal_file If you see a ^M at the end of each line you are in trouble! CURE: run dos2unix wrong_file on iceberg
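The ^M symptom can be reproduced without a Windows machine. The sketch below writes a small file with Windows-style line endings and then removes the carriage returns with tr, which is used here only as a stand-in that is available on nearly every system; on iceberg the slides recommend dos2unix. The file names are hypothetical.

```shell
demo_file=$(mktemp)
fixed_file=$(mktemp)

# Create a two-line file with Windows (CR+LF) line endings.
printf 'first line\r\nsecond line\r\n' > "$demo_file"

cat -v "$demo_file"      # each line is displayed ending in ^M

# Strip the carriage returns (a stand-in for dos2unix).
tr -d '\r' < "$demo_file" > "$fixed_file"

cat -v "$fixed_file"     # the ^M markers are gone
```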
Your filestore
Three areas of filestore are always available on iceberg. These areas are available from all headnodes and worker nodes.
• home directory: /home/username. 5 GBytes allocation. Permanent, secure, backed-up area (deleted files can be recovered)
• data directory: /data/username. 50 GBytes allocation. Not backed up but mirrored on another server
• /fastdata area: /fastdata. Much faster access from MPI jobs. No storage limits but no backup or mirroring. Files older than 90 days are deleted automatically. Always make a directory under /fastdata and work there.
Scratch area (only available during a job) • Located at /scratch • Used as temporary data storage on the current compute node alone • File I/O to /scratch is faster than to the NFS-mounted /home and /data areas • File I/O to small files is faster on /scratch than on /fastdata, but for large files /fastdata is faster than /scratch • Data is not visible to other worker nodes and is expected to exist only for the duration of the job
HOW TO USE SCRATCH: Create a directory named after your username under /scratch on a worker and work from that directory. Example:
mkdir /scratch/$USER
cp mydata /scratch/$USER
cd /scratch/$USER
(then run your program)
Storage allocations Storage allocations for each area are as follows: • On /home: 5 GBytes • On /data: 50 GBytes • No limits on /fastdata Check your usage and allocation often, by typing quota, to avoid exceeding your quota. If you exceed your quota your account is frozen, and the only way out is to reduce your filestore usage by deleting unwanted files via the RM command (note this is in CAPITALS). Requesting more storage: email iceberg-admins@sheffield.ac.uk. Except for the /scratch areas on worker nodes, the view of the filestore is the same on every worker.
Running tasks on iceberg • The two iceberg headnodes are gateways to the cluster of worker nodes. • The headnodes' main purpose is to allow access to the worker nodes, NOT to run cpu intensive programs. • All cpu intensive computations must be performed on the worker nodes. This is achieved by the qsh command for interactive jobs and the qsub command for batch jobs. • Once you log into iceberg, taking advantage of the power of a worker node for interactive work is done simply by typing qsh and working in the new shell window that is opened. • The next set of slides assume that you are already working on one of the worker nodes (qsh session).
Where on the cluster ? Most of the application packages, compilers and software libraries are only available on the worker_nodes. Iceberg headnodes are suitable for only light-weight jobs such as editing files. If you are on one of the headnodes, you will need to type qsh to get an interactive session to the worker nodes. How do you know where you are ? The command prompt will contain your userid@hostname Example: ch1abc@node-056 $ If you are on one of the headnodes, hostname will be iceberg1 or iceberg2 If you are on an amd-based worker-node, it will be amd-nodenn where nn is the number of the amd-node. If you are on an intel-based worker-node it will be node-nnn where nnn is the number of the intel node. You can always type echo $HOSTNAME to find out the name of the machine you are currently using.
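The naming rules above can be turned into a small check. The function below is a sketch only: where_am_i is a hypothetical helper name, not an iceberg command, and it assumes the hostname patterns are exactly as described on this slide.

```shell
# Classify a machine name using the hostname patterns described above.
# "where_am_i" is a hypothetical helper, not a command provided on iceberg.
where_am_i() {
    short_name=${1%%.*}          # drop any domain part, e.g. ".shef.ac.uk"
    case "$short_name" in
        iceberg1|iceberg2) echo "headnode" ;;
        amd-node*)         echo "AMD worker" ;;
        node-*)            echo "Intel worker" ;;
        *)                 echo "unknown" ;;
    esac
}

# Use $HOSTNAME when the shell sets it, otherwise ask the system with uname -n.
where_am_i "${HOSTNAME:-$(uname -n)}"
```

If the function reports "headnode", typing qsh first is the step to take before starting any heavy work.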
Running programs on iceberg • Iceberg is the gateway to the cluster of worker nodes and the only one where direct logging in is allowed. • Iceberg's main purpose is to allow access to the worker nodes, NOT to run cpu intensive programs. • All cpu intensive computations must be performed on the worker nodes. This is achieved by the qsh command for interactive jobs and the qsub command for batch jobs. • Once you log into iceberg, taking advantage of the power of a worker node for interactive work is done simply by typing qsh and working in the new shell window that is opened. This seemingly trivial step will in fact have queried all the worker nodes for you and started a session on the least loaded worker in the cluster. • The next set of slides assume that you are already working on one of the worker nodes (qsh session).
Running programs • Two modes of operation: foreground and background • Foreground: interact with the program via keyboard/screen • Background: no connection with keyboard/screen. Submit to the background by appending '&', e.g.: myprog >& myfile & The symbols '>&' redirect output and any errors to the file myfile. Although running jobs in the background this way is feasible, we recommend that you submit your background jobs to the batch queue via the qsub command.
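The background example can be tried with a stand-in for myprog. In this sketch myprog is a hypothetical shell function, and the redirection is written as "> file 2>&1", which is the portable spelling of the ">& file" form and works in shells that do not accept ">&" with a filename.

```shell
out_file=$(mktemp)

# A stand-in for "myprog": one line to standard output, one to standard error.
myprog() {
    echo "normal output"
    echo "an error message" >&2
}

# Run in the background, sending output and errors to the same file.
myprog > "$out_file" 2>&1 &

wait                 # block until the background job has finished
cat "$out_file"      # both lines have been captured in the file
```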
Redirection Most unix commands are not aware of the source of their input or the destination of their output. They simply read/write from/to stdin/stdout. The shell takes care of these issues. • Standard Input (default=>keyboard) • Standard Output (default=>screen) • Redirection symbols <,>,>> can be used to specify files as the source/destination of the read/write operations to override the above defaults.
Redirection continued … • To redirect the output to a file use the '>' symbol. Example: ls -l > dirlist • The '>' symbol should be used with care as it may overwrite an existing file. The '>>' symbol can be used instead if the output should be appended to the end of an existing file rather than overwriting it. Example: ls -l >> logfile • If nothing is directed to a file then a zero-size file is created, or if the file already existed then its contents are removed. Example: > afile • The file /dev/null is a special file that acts as a 'black hole': anything written to it is discarded. Example: ls -l > /dev/null
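The difference between overwriting and appending is easiest to see by running the symbols in sequence. The following sketch uses invented file names in a temporary directory.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

echo "first"  >  logfile     # > creates logfile
echo "second" >> logfile     # >> appends: logfile now has two lines
echo "third"  >  logfile     # > again: the old contents are overwritten
echo "fourth" >> logfile     # logfile now holds "third" and "fourth"

> emptyfile                  # redirecting nothing creates a zero-size file
ls -l > /dev/null            # output sent to /dev/null is discarded

cat logfile
```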
Redirection continued … If a program expects any of its input from the standard input channel, i.e. the keyboard, it can also read the same information from a file by redirection. • To read input from a file use the '<' symbol. Example: write cs1xyz < message.fil Here any text that the write program expects from the keyboard will simply be read from a file named message.fil. Each end-of-line will be treated as an <ENTER> on the keyboard.
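Because write needs a second logged-in user, the sketch below illustrates the same '<' mechanism with tr and wc instead, which read only from standard input when given no filename. The message file is created on the spot and its contents are invented.

```shell
msg_file=$(mktemp)
printf 'hello from iceberg\nsecond line\n' > "$msg_file"

# tr never opens files itself; "<" connects its standard input to the file.
shouted=$(tr 'a-z' 'A-Z' < "$msg_file")

# wc -l counts the lines it receives on standard input.
line_count=$(wc -l < "$msg_file" | tr -d ' ')

echo "$shouted"
echo "lines: $line_count"
```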
Piping • Feeding the output of one command into the input of another command • The symbol '|' is called a pipe: command | command • e.g.: ls -al | more and ls -la | grep Nov
Examples of re-direction and piping • ls -l | grep 'Jun' • ls -l | grep 'Jun' > june_files • ls -l | grep 'Jun' | cut -c 57-80 > june_files • cut -c 1-10 < test_files • aspell -l < message.txt > report.txt • grep fluent < news.dat • grep fluent < news.dat | cut -c 1-72 • (grep fluent < news.dat ) | cut -c 1-72 • (grep fluent < news.dat ) | cut -c 1-72 > fluent.news
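A runnable variant of the grep and cut examples is sketched below. The news file and its three lines are invented sample data, and a narrower cut range (1-10) is used so that the trimming is visible on short lines.

```shell
# Hypothetical sample data standing in for news.dat.
news_file=$(mktemp)
cat > "$news_file" <<'EOF'
fluent job finished
matlab licence renewed
new fluent tutorial
EOF

# Filter the matching lines, then keep only the first 10 characters of each.
trimmed=$(grep fluent < "$news_file" | cut -c 1-10)

# Count the matching lines by piping into wc -l.
match_count=$(grep fluent < "$news_file" | wc -l | tr -d ' ')

echo "$trimmed"
echo "matches: $match_count"
```

Patterns passed to grep are best wrapped in single quotes when they contain spaces or special characters; backquotes would instead ask the shell to run the pattern as a command.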
Foreground Program Control • Kill a program Ctrl C • Stop a program Ctrl Z Note a stopped program still exists in the system and hence can be re-started.
Program control within current shell • jobs Lists jobs (programs) • bg %job_id Place a job in the background • fg %job_id Return a job to the foreground • stop %job_id Stop a job • kill %job_id Kill a job A process id can be used in place of %job_id for a more definitive way of identification. Example session:
jobs
[1] + Running time.sh > out
stop %1
[1] + Stopped (signal) time.sh > out
bg %1
[1] + time.sh > out &
kill %1
Terminated
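Inside a script the %job_id form is less convenient, so the sketch below uses the process id instead, as the slide permits. The sleep command is only a placeholder for a long-running program.

```shell
sleep 30 &            # start a long-running placeholder job in the background
job_pid=$!            # $! holds the process id of the most recent background job

kill "$job_pid"       # send the default TERM signal, as "kill %1" would
wait "$job_pid"       # collect the job; the status reflects how it ended
status=$?

# A status above 128 means the job was terminated by a signal.
echo "job $job_pid ended with status $status"
```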