KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009
Agenda • Introduction to KAUST SL and team • Computational resources currently available • Computational resources available in the near future • Getting an account on KAUST SL machines • Q & A • Machine-room viewing
KAUST Supercomputer Lab Our Mission • To offer resources that are world-class in both capacity and diversity • HPC systems (BG/P, GPUs, SMP, Linux) • Data systems (on-demand filesystems, archive) • To assist KAUST researchers in fully exploiting these resources • Via a talented, skilled and experienced staff • Joint research collaborations between the SL team and researchers • The SL team will conduct its own HPC exploitation research Thank you for your continued patience and understanding!
The KAUST SL team • Management: Jim Sexton, Richard Orme • Systems Administration: Jonathon Anderson, Iain Georgeson • Research & Enablement: Aron Ahmadia, Samar Aseeri, Dodi Heryadi, Mark Cheeseman, Ying Qian **Possibility of getting IBM expertise as part of the CDCR collaboration
Currently available resources • Capability machines (Blue Gene/P) • WATSONshaheen • Shaheen (early user access only) • Capacity machines (Linux clusters) • WATSONlinux • Shaheen (early user access only) • Texas A&M University Linux clusters • Data stores • Storage available at WATSON (not backed up) • 0.5 PB shared on Shaheen
Available Blue Gene/P Systems KAUST SL Orientation Session October 13, 2009
Blue Gene/P – compute design • CPU: 4 'cores' @ 850 MHz, 13.6 GF/s • Compute Card: 1 CPU, 4 GB DDR2 • Node Card: 32 compute cards, 0-2 IO cards, 128 GB DDR2, 435 GF/s • Rack: 32 node cards, 16-32 IO cards, 4 TB DDR2, 13.9 TF/s • Shaheen System: 4 or 8 racks, 16 or 32 TB DDR2, 55.6 or 111.2 TF/s
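These peak figures follow directly from the building blocks; as a quick sanity check: 32 compute cards × 13.6 GF/s ≈ 435 GF/s per node card, 32 node cards × 435 GF/s ≈ 13.9 TF/s per rack, and 4 or 8 racks × 13.9 TF/s ≈ 55.6 or 111.2 TF/s for the Shaheen system.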
Blue Gene/P – communication networks • 3D Torus • Point-to-point communication • Twelve 425 MB/s links per compute node (5.1 GB/s total) • 41 or 167 TB/s for the system • Collective • Optimized collective operations (broadcast, reduction, …) • Three 850 MB/s links per compute/IO node (5.1 GB/s total) • Serves as the connection between compute and IO nodes • Low-latency barriers and interrupts • External • 10 GbE connection for external communication (file IO)
Accessing the Shaheen platforms • KAUST researchers have access to two BG/P systems • WATSONshaheen (4 racks) • Shaheen (8 racks) • Need to ssh/sftp into a front-end machine • kstfen1.watson.ibm.com • kstfen2.watson.ibm.com • shaheen.hpc.kaust.edu.sa • NOTE: the front-end machines are of a different architecture: Power6 (32-way), 64 GB memory, with access to the shared GPFS filesystems
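For example, logging in to the Shaheen front end might look like the following (the username is a placeholder for your own KAUST SL account):
ssh username@shaheen.hpc.kaust.edu.sa
sftp username@shaheen.hpc.kaust.edu.sa    # for transferring files to/from the front end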
Filesystem layout • WATSONshaheen • 5 PB scratch GPFS (not backed up) • No archive access for KAUST users • Users are responsible for backing up important data • Shaheen • Currently only 0.5 PB available • Archive is not available • Three GPFS filesystems shared between the BG/P and the Xeon cluster • Home • Project (shared between users of the same project) • Scratch
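Because the scratch filesystems are not backed up, important results should be copied off the machines by the user; a minimal sketch, assuming scp is available alongside ssh/sftp (the username and paths are placeholders):
scp -r username@kstfen1.watson.ibm.com:/path/to/results ./results_backup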
Shaheen – programming environment • Full IBM and GNU compiler suites available on both BG/P systems • MPI and OpenMP supported • Supported software: • Located under /soft • Controlled via MODULES • Please allow some time for the Shaheen supported software stack to be built
Shaheen - compiling • Do not use the normal compiler calls (gcc, gfortran, xlc, …) • They create binaries that run on the login nodes, NOT the compute nodes • The login and compute nodes have different architectures • Use the IBM-provided wrapper compiler commands • They create binaries for the compute nodes • Native MPI support is included
GNU wrappers:
mpicc -o test.exe test.c
mpicxx -o test.exe test.cpp
mpif77 -o test.exe test.f
mpif90 -o test.exe test.f90
IBM XL wrappers:
mpixlc -o test.exe test.c
mpixlcxx -o test.exe test.cpp
mpixlf90 -o test.exe test.f90
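As an illustration, an optimized Fortran build on the Shaheen front end might look like the following; the source file name and the XL optimization/architecture flags are examples only, not a prescribed recipe:
mpixlf90 -O3 -qarch=450 -qtune=450 -o test.exe test.f90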
Shaheen – running a job • WATSONshaheen • No job management or queuing system is present • All jobs are run interactively via the mpirun command
mpirun -np 16 -partition r001n00-c32i2 -VN -cwd `pwd` -exe test.exe
where -np indicates the number of MPI tasks, -partition indicates the BG/P partition to use, -VN indicates the run mode, -cwd gives the runtime directory, and -exe gives the name of the executable to be run • In the above example, test.exe is run on 4 quad-core CPUs (16 MPI tasks in VN mode) in the current directory • How do I find an appropriate BG/P partition? (example below)
/soft/bgp_partition_finder <#_of_quad-core_cpus>
NOTE: this is only a simple script and may occasionally fail
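For instance, to locate a partition for the 16-task VN-mode run above (4 quad-core CPUs), the helper script could be called as follows; the partition name it reports is then passed to mpirun via -partition:
/soft/bgp_partition_finder 4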
Shaheen – running a job continued • WATSONshaheen continued… • Run modes • SMP: 1 MPI task per CPU, 4 GB available to the task • DUAL: 2 MPI tasks per CPU, 2 GB available to each task • VN: 4 MPI tasks per CPU, 1 GB available to each task • Shaheen • The LoadLeveler job management system is to be used (see the sketch below) • Two queues (12-hour production and 30-minute development) • Users do not need to specify a partition ID • Pre/post-processing work is to be run on the Linux cluster • Shared filesystems allow easy data transfer
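A minimal LoadLeveler job file for the BG/P might look like the sketch below. This is only an illustration under assumed settings: the class name, bg_size value, task count and executable are placeholders, and the exact keywords should be checked against the Shaheen documentation once the queues are open.
#! /bin/csh -f
# Sketch of a Blue Gene LoadLeveler job file (all values are placeholders)
#@ output = out
#@ error = err
#@ job_type = bluegene
#@ bg_size = 64
#@ class = production
#@ queue
# LoadLeveler allocates the BG/P partition, so no -partition flag is given here
mpirun -np 256 -VN -exe ./test.exe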
Available IBM Linux Clusters KAUST SL Orientation Session October 13, 2009
IBM Linux clusters - overview • KAUST researchers have access to two clusters • WATSONlinux (32-node system @ NY, USA) • Shaheen (96-node system @ KAUST) • NOTE: these systems are primarily intended as auxiliary computational resources for pre/post-processing and for initial x86 tests of codes prior to their enablement on Shaheen
Accessing the Linux clusters • Need to ssh/sftp into a front-end machine • kstxfen1.watson.ibm.com • kstxfen2.watson.ibm.com • shaheenx.hpc.kaust.edu.sa
IBM Linux clusters - Modules • A simple mechanism to update a user's environment, such as PATH, MANPATH, NLSPATH, LD_LIBRARY_PATH, etc. • module list -> show currently loaded modules • module avail -> show available modules • module whatis <name> -> describe the <name> module • module load <name> -> load the <name> module
[xxxxxxxx@n1 ~]$ module avail
--------------------------------- /opt/modules ---------------------------------
Loadleveler    hdf5         postprocessing/nccmp    compilers/GNU
netcdf         postprocessing/ncl                   compilers/INTEL
netcdf4        totalview    fftw2                   postprocessing/ferret
wien2k         fftw3        postprocessing/grads
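A typical session is sketched below; the exact output format will depend on the Modules version installed:
[xxxxx@n1 ~]$ module load compilers/INTEL
[xxxxx@n1 ~]$ module list
Currently Loaded Modulefiles:
  1) compilers/INTEL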
IBM Linux clusters – Programming Environment • Compilers • GNU and Intel compilers (C, C++ and Fortran) available • PGI compilers have been ordered • MPI support • MPICH2 is the default • MPICH1 and OpenMPI are available as well • It is strongly encouraged that Modules be used for compiling and linking
IBM Linux clusters – Compiling serial codes • Intel compilers: • module load compilers/INTEL • ifort -> calls the Intel Fortran compiler • icc -> calls the Intel C compiler • icpc -> calls the Intel C++ compiler • GNU compilers: • module load compilers/GNU • gfortran -> calls the GNU Fortran compiler • gcc -> calls the GNU C compiler • g++ -> calls the GNU C++ compiler
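For example, a serial Fortran code could be built as follows (the source file name and the -O2 flag are illustrative):
module load compilers/INTEL
ifort -O2 -o hello hello.f90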
IBM Linux clusters – Compiling MPI codes • Intel compilers: • module load compilers/INTEL • mpicc -> calls the Intel C compiler with MPI support enabled • mpic++ -> calls the Intel C++ compiler with MPI support enabled • mpif77 -> calls the Intel F77 compiler with MPI support enabled • mpif90 -> calls the Intel F90 compiler with MPI support enabled • GNU compilers: • module load compilers/GNU • mpicc -> calls the GNU C compiler with MPI support enabled • mpic++ -> calls the GNU C++ compiler with MPI support enabled • mpif77 -> calls the GNU F77 compiler with MPI support enabled • mpif90 -> calls the GNU F90 compiler with MPI support enabled
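For example, an MPI Fortran code could be built as follows (file names are placeholders):
module load compilers/INTEL
mpif90 -O2 -o hello_mpi hello_mpi.f90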
IBM Linux clusters – INTEL MKL • The following Intel Math Kernel Library (MKL) components are available: • BLAS • LAPACK • BLACS • ScaLAPACK
IBM Linux clusters – INTEL MKL • Linking codes with Intel MKL BLAS and LAPACK • Static, sequential, 64-bit integer:
$MKLPATH/libmkl_solver_ilp64_sequential.a -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
• Dynamic, multi-threaded, 64-bit integer:
-L$MKLPATH $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread
IBM Linux clusters – INTEL MKL • Linking codes with Intel MKL ScaLAPACK and BLACS • ScaLAPACK: static, sequential, 64-bit integer, MPICH2:
$MKLPATH/libmkl_scalapack_ilp64.a $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -openmp -lpthread
• BLACS: dynamic, multi-threaded, 64-bit integer, MPICH2:
-L$MKLPATH $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -Wl,--end-group -openmp -lpthread
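Putting the pieces together, a LAPACK-based code might be compiled and linked roughly as follows, using the dynamic, multi-threaded line from the previous slide. This is a sketch only; it assumes MKLPATH points at the installed MKL library directory, and the ILP64 interface expects 64-bit default integers (e.g. Fortran built with -i8):
module load compilers/INTEL
ifort -O2 -i8 -o solver solver.f90 -L$MKLPATH -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread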
IBM Linux clusters – running a job • LoadLeveler job management and queuing system present • Useful LoadLeveler commands • llsubmit <job file> -> submit a job to LoadLeveler • llq -> show queued and running jobs • llcancel <job_id> -> delete a queued or running job • llstatus -> display system information
[xxxxx@n1 ~]$ module load compilers/INTEL
[xxxxx@n1 ~]$ module load Loadleveler
[xxxxx@n1 ~]$ llsubmit jobscript
llsubmit: The job "n1.linux32.watson.ibm.com.96" has been submitted.
[xxxxx@n1 ~]$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
n1.96.0                  xxxxx      10/13 03:42 R  50  No_Class     n1
1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
IBM Linux clusters – constructing a jobfile • EXAMPLE: parallel job with only MPI tasks
#! /bin/csh -f
#@ output = out
#@ error = err
#@ job_type = parallel
#@ node = 1
#@ notification = never
#@ environment = COPY_ALL
#@ queue
cd $LOADL_STEP_INITDIR
mpdboot -n 1 -f ${LOADL_HOSTFILE}
mpiexec -n 8 ./hello_intel
mpdallexit
**Here 8 MPI tasks are spawned on a single (2 quad-core Xeon) compute node
IBM Linux clusters – constructing a jobfile • EXAMPLE: parallel job with only OpenMP threads
#! /bin/csh -f
#@ output = out
#@ error = err
#@ job_type = parallel
#@ node = 1
#@ notification = never
#@ environment = COPY_ALL
#@ queue
setenv OMP_NUM_THREADS 8
cd $LOADL_STEP_INITDIR
./hello_omp_gnu
**Here 8 OpenMP threads are spawned on a single (2 quad-core Xeon) compute node
IBM Linux clusters – constructing a jobfile • EXAMPLE: parallel job with 2 MPI tasks that each spawn 8 OpenMP threads
#! /bin/csh -f
#@ output = out
#@ error = err
#@ job_type = parallel
#@ node = 2
#@ notification = never
#@ environment = COPY_ALL
#@ queue
setenv OMP_NUM_THREADS 8
cd $LOADL_STEP_INITDIR
mpdboot -n 2 -f ${LOADL_HOSTFILE}
mpiexec -np 2 ./hello_mpi_omp_intel
mpdallexit
**Here 2 MPI tasks are spawned, one on each of the two (2 quad-core Xeon) compute nodes, and each task spawns 8 OpenMP threads
IBM Linux clusters – 3rd party software • Installation/support of 3rd party software is based on mutual agreement between the requesting PI and KAUST SL • Supported software: • Located under /opt • Controlled via MODULES • Please allow some time for the supported software stack to be built
Available Texas A&M Linux Clusters KAUST SL Orientation Session October 13, 2009
Resources Available in the "Near Future" KAUST SL Orientation Session October 13, 2009
More resources are on the way… • Shaheen installation continues • Expansion from 8 to 16 racks • Full 1.9 PB shared disk space • Archive not built yet • Other HPC systems are being shipped • 256-node x86 Linux cluster • 4 SMP nodes • 16 TESLA GPGPU nodes • 1 PB shared disk
Project & Account Creation Procedures KAUST SL Orientation Session October 13, 2009
Accessing Shaheen • Organization application • Terms and conditions acknowledgement • Funding authorization • Project proposal • Scientific description • Authorized researchers • Individual application • Personal information
Accessing Shaheen (restrictions) • Nationals of "group E" countries • Cuba • Iran • North Korea • Sudan • Syria
Accessing Shaheen (restrictions) • Unauthorized research • Weapons • Rockets • Unmanned aerial vehicles • Nuclear fuel facilities (except by treaty) • Heavy water production facilities (except by treaty)
Contacting Us • Our internal wiki/website is available to KAUST users • http://www.hpc.kaust.edu.sa • For HPC support queries • shaheen-help@kaust.edu.sa • Or drop by and see us in person • Level 0, Building 1 (across from the cafeteria) • Offices 0121-0126
Thank you for your attention Questions?