
Usage Seminar for the 64-node P4-Xeon Cluster in Science Faculty

Presentation Transcript


  1. Usage Seminar for the 64-node P4-Xeon Cluster in Science Faculty, March 24, 2004

  2. Aims and Target audience • Aims: • Usage review • Introducing news and events • Sharing among existing users • Target audience • Existing and potential cluster users

  3. Today’s Outline • Introduction to the upgraded software and its usage • Review of serial and parallel job submission • Software demo • Briefing on the upcoming recurring parallel computing course • Sharing and opinions from existing users

  4. System upgrade • February 27, 2004 • Upgraded to ROCKS 3.1.0 • SAN adapter installed • All compute nodes reinstalled • Many application software packages upgraded

  5. Hardware Configuration • 1 master node + 64 compute nodes + Gigabit Interconnection • Master node • Dell PE2650, P4-Xeon 2.8GHz x 2 • 4GB RAM, 36GB x 2 U160 SCSI (mirror) • Gigabit ethernet ports x 2 • SCSI attached storage • Dell PV220S • 73GB x 10 (RAID5)

  6. Hardware Configuration (cont) • Compute nodes • Dell PE2650, P4-Xeon 2.8GHz x 2 • 2GB RAM, 36GB U160 SCSI HD • Gigabit ethernet ports x 2 • Gigabit Interconnect • Extreme BlackDiamond 6816 Gigabit ethernet switch • 256 Gbps backplane • 72 Gigabit ports (8-port cards x 9)

  7. Software installed • Cluster operating system • ROCKS 3.1.0 from www.rocksclusters.org • MPI and PVM libraries • LAM/MPI 7.0.4, MPICH 1.2.5.2, PVM 3.4.3-6beolin • Compilers • GCC 3.2.3 • PGI C/C++/f77/f90/HPF version 5.1 • Math libraries • ATLAS 3.6.0, ScaLAPACK, SPRNG 2.0a • Application software • MATLAB 6.1 with MPITB • R 1.8.1 • Gromacs 3.2, NAMD 2.5, GAMESS • Gaussian 03, Q-Chem 2.1 • Editors • vi, pico, emacs, joe • Queuing system • Torque/PBS with Maui scheduler
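A quick way to verify the upgraded tools after login (a minimal sketch; pgcc -V assumes the PGI 5.1 compilers are already on your PATH):
  gcc --version        # should report GCC 3.2.3
  pgcc -V              # reports the PGI compiler release (5.1)
  which mpicc mpirun   # shows which MPI build's wrappers are found first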

  8. Cluster O.S. – ROCKS 3.1.0 • Developed by NPACI and SDSC • Based on Red Hat Enterprise Linux 3.0 • Useful commands for users to monitor jobs on all nodes, e.g. • cluster-fork date • cluster-ps morris • cluster-kill morris • Web-based management and monitoring • http://tdgrocks.sci.hkbu.edu.hk

  9. Ganglia

  10. Hostnames • Master node • tdgrocks.sci.hkbu.edu.hk • Compute nodes • comp-pvfs-0-1, …, comp-pvfs-0-64 • Short names: cp0-1, cp0-2, …, cp0-64
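For example, a quick way to check that a compute node is reachable from the frontend by its short name (a minimal sketch; it relies on the ssh key pair described in a later slide):
  ssh cp0-1 hostname   # runs hostname on the node and prints its full name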

  11. Network diagram • Master node: tdgrocks.sci.hkbu.edu.hk (192.168.8.1) • Gigabit ethernet switch connecting the master node and compute nodes • Compute nodes: comp-pvfs-0-1 (192.168.8.254), comp-pvfs-0-2 (192.168.8.253), …, comp-pvfs-0-64 (192.168.8.192)

  12. Login to the master node • Remote login is allowed from all HKBU networked PCs via ssh or vncviewer • SSH login (terminal login) • Use your favourite ssh client software, e.g. PuTTY or SSH Secure Shell on Windows, OpenSSH on Linux/UNIX • E.g. on all SCI workstations (spc01 – spc30), type ssh tdgrocks.sci.hkbu.edu.hk

  13. Login to the master node • VNC login (graphical login) • Use vncviewer, downloadable from http://www.uk.research.att.com/vnc/ • E.g. on spc01 – spc30.sci.hkbu.edu.hk, run vncviewer vnc.sci.hkbu.edu.hk:51 • E.g. on Windows, run vncviewer and, when asked for the server address, type vnc.sci.hkbu.edu.hk:51

  14. Username and password • Unified password authentication has been implemented • The password is the same as that of your NetWare account • Password authentication uses NDS-AS • The setup is similar to net1 and net4 in ITSC

  15. ssh key generation • To make use of multiple nodes in the PC cluster, users are restricted to ssh for node-to-node access • Key generation is done once, automatically, during the first login • You may input a passphrase to protect the key pair • The key pair is stored in your $HOME/.ssh/
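If the key pair ever needs to be regenerated by hand, the standard OpenSSH tools can be used (a minimal sketch of what the automatic first-login setup does; filenames assume the default RSA key):
  ssh-keygen -t rsa                                        # writes id_rsa and id_rsa.pub under $HOME/.ssh/
  cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  chmod 600 $HOME/.ssh/authorized_keys                     # ssh ignores keys with loose permissions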

  16. User Policy • Users are allowed to log in remotely from other networked PCs in HKBU. • All users must use their own user account to log in. • The master node (frontend) is used only for login, simple editing of program source code, preparing job dispatching scripts and dispatching jobs to the compute nodes. No foreground or background jobs may be run on it. • Dispatching of jobs must be done via the PBS system.

  17. Torque/PBS system • Provides a fair and efficient job dispatching and queuing system for the cluster • A PBS script must be written to run a job • Both sequential and parallel jobs can be handled by PBS • Job error and output are stored in separate files named according to the job ID.
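The day-to-day user commands are the standard Torque/PBS ones (a minimal sketch; the script name myjob.pbs and the job ID 123 are hypothetical):
  qsub myjob.pbs   # submit a job script; prints the assigned job ID
  qstat            # list queued and running jobs
  qstat -n 123     # show the compute nodes allocated to job 123
  qdel 123         # cancel job 123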

  18. PBS script example (sequential)
#!/bin/bash
#PBS -l nodes=1
#PBS -N prime
#PBS -m ae
#PBS -q default
# the above are the PBS directives used in the batch queue
echo Running on host `hostname`
/u1/local/share/example/pbs/prime 216091
• PBS scripts are shell scripts with directives preceded by #PBS
• The above example requests only 1 node and delivers the job named 'prime' to the default queue.
• The PBS system will mail a message after the job has executed (the -m ae directive).

  19. Delivering PBS job
• Prepare and compile the executable:
cp /u1/local/share/example/pbs/prime.c .
cc -o prime prime.c -lm
• Prepare and edit the PBS script as on the previous slide:
cp /u1/local/share/example/pbs/prime.bat .
• Submit the job:
qsub prime.bat

  20. PBS script example (parallel)
#!/bin/sh
#PBS -N cpi
#PBS -r n
#PBS -e cpi.err
#PBS -o cpi.log
#PBS -m ae
#PBS -l nodes=5:ppn=2
#PBS -l walltime=01:00:00
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
# Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS processors
# Run the parallel MPI executable "cpi"
/u1/local/mpich-1.2.5/bin/mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS /u1/local/share/example/pbs/cpi

  21. Delivering parallel jobs
• Copy the PBS script example:
cp /u1/local/share/example/pbs/runcpi .
• Submit the PBS job:
qsub runcpi
• Note the error and output files named cpi.e??? and cpi.o???, where ??? is the numeric job ID
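Once the job has finished, the results can be inspected from the submission directory, e.g. (the job ID 1234 is hypothetical):
  qstat            # the job disappears from the list when it has finished
  cat cpi.o1234    # standard output of the job
  cat cpi.e1234    # standard error (empty if the run was clean)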

  22. End of Part 1 Thank you!

  23. MPICH and LAM’s PATH • MPICH 1.2.5.2 with gcc • /u1/local/mpich-1.2.5/bin • MPICH 1.2.5.2 with pgi • /u1/local/mpich-pgi/bin • LAM 6.5.9 • /usr/bin • LAM 7.0.4 • /u1/local/lam-7.0.4/bin
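To pick one MPI build for a session, it is usually enough to put its bin directory first on the PATH (a minimal sketch using the MPICH/gcc path listed above):
  export PATH=/u1/local/mpich-1.2.5/bin:$PATH
  which mpicc mpirun   # confirm the intended build is found first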
