Introduction to HPC at UNBC The Enhanced High Performance Computing Center Dr. You Qin (Jean) Wang February 13, 2008
Summary of the presentation: • Who needs HPC? • What kind of software do we have? • What kind of hardware do we have? • How to access the HPC systems? • Parallel programming basics
Who needs HPC? HPC Domains of Applications at UNBC: • Atmospheric Science • Environmental Science • Geophysics • Chemistry • Computer Science • Forest • Physics • Engineering
Who needs HPC? • We use HPC to solve problems that can't be solved in a reasonable amount of time using a single desktop computer. • Problems solved using HPC: • Need a large quantity of RAM • Require a large number of CPUs
HPC Users Summary On February 6, 2008: • Total Users: 73 • Professors: 16 • Post-doctoral: 7 • Ph.D. students: 5 • Master Students and Others: 45
What kind of software do we have? • IDL + ENVI • MATLAB + Toolboxes • Tecplot • STATA • NAG Fortran Library • FLUENT • PGI Compilers • Intel Compilers
What kind of software do we have? • IDL – the ideal software for data analysis, visualization, and cross-platform application development • ENVI - the premier software solution to quickly, easily, and accurately extract information from geospatial imagery
What kind of software do we have? • MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. • MATLAB Toolboxes: • Curve Fitting • Distributed Computing • Image Processing • Mapping • Neural Network • Statistics
What kind of software do we have? • Two images plotted using Tecplot by Dr. Jean Wang [Figure: Pressure Contour around a Prolate Spheroid]
What kind of software do we have? • Why use STATA? • STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have? • The NAG Fortran Library - the largest commercially available collection of numerical algorithms for Fortran today • Calling the NAG Library: • Set the environment variable before you run your job, then compile and link:
    LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
    export LM_LICENSE_FILE
    /opt/intel/fc/9.0/bin/ifort -r8 test.for -L /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have? • FLUENT – Flow Modeling Software
What kind of hardware do we have? • SGI Altix 3000 – 64 processors • Linux Cluster – 128 processors (Opteron) • File Server • Windows Terminal Server • 10 Workstations in HPC Lab • GeoWall systems for visualization
SGI Altix 3000 – columbia.unbc.ca • 64 Processors • Intel Itanium2 (1.5 GHz) • 4 MB Cache • 64 GB RAM • 1 GB/processor • NUMAlink interconnect • 6.4 GB/s • Fat Tree • 10GbE network connection • SUSE Linux Enterprise Server 9
Linux Cluster – andrei.unbc.ca • 64 Nodes (128 processors) + Head Node • AMD Opteron (2.1 GHz) (2/node) • 144 GB RAM (2 GB/node + 16 GB for head) • GigE interconnect • Two Nortel switches • Network access via head node. • Operating System • SUSE 9.3 • Storage • 1.7 TB of local storage on head node for software and local copies.
File Server • SGI Altix 350 • 4p, 8 GB RAM • SGI TP9100 • 6 TB Storage • RAID 5, with hot spare. • 10GbE network connection • Maintains tape backup
Windows Terminal Server – ithaca.unbc.ca • Dell PowerEdge 6800 • 4p (Intel Xeon, 2.4 GHz) • 8 GB RAM • Local RAID for system volume. • 600 GB volume • Accessible from anywhere. • Runs Windows applications.
Workstations at HPC Lab • Dell Precision 470 • 2 Intel Xeon Processors (3.2 GHz) • 2 GB RAM • NVidia Quadro FX3400 / 256 MB • 2 Dell 20” LCD displays.
GeoWall Systems • Two Systems • Both have a 2-processor server with 1.5 TB RAID 5 • GeoWall Room (8-111) has a rear-projected display • Portable unit has a front-projected display
How to access the HPC systems? • From Windows to Windows: Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection • Computer: pg-hpc-ts-01.unbc.ca • Log on to: UNI
How to access the HPC systems? • From Linux to Windows:
    rdesktop -a15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
• Log on to: UNI
How to access the HPC systems? • From Linux to Linux:
    ssh -X yqwang@columbia.unbc.ca
    ssh -X yqwang@andrei.unbc.ca
    [pg-hpc-clnode-head ~]>ssh -X pg-hpc-clnode-63
    [pg-hpc-clnode-63 ~]>
How to access the HPC systems? • From Windows to Linux: • Download software “Xmanager 2.0” from http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems? • How to mount the /hpc file system? • Under Windows: • Right-click on "My Computer", select "Map Network Drive", and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN • replacing LOGIN with your UNI login.
How to access the HPC systems? • How to mount /hpc file system? • On a Linux machine: • smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN • replacing MOUNTPOINT with the name of a directory that the system will be mounted to.
Reminder to HPC users: • Don’t run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications, such as Matlab, IDL etc. • Submit your job via PBS on both Columbia and Andrei.
What is PBS? • Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources. • If you want to know more about PBS, please contact Dr. Jean Wang.
Parallel Programming Basics • What is parallelism? Fewer fish vs. more fish!
What is Parallelism? • Parallelism is the use of multiple processors to solve a problem, and in particular, the use of multiple processors working concurrently on different parts of a problem. • The different parts could be different tasks, or the same task on different pieces of the problem’s data.
Kinds of Parallelism • Shared Memory: Auto Parallel, OpenMP, MPI • Distributed Memory – MPI
The Jigsaw Puzzle Analogy • Serial Computing: Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let’s say that you can put the puzzle together in an hour. • Shared Memory Parallelism: If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism • Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown. • And from time to time you will have to work together (communicate) at the interface between his half and yours. • The speedup will be nearly 2-to-1: instead of an hour, the two of you will take about 35 min rather than the ideal 30 min.
The More the Merrier? • Now let’s put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. • So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement, say the four of you can get it done in 20 min instead of an hour.
Diminishing Returns • If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it down to 15 min. • Adding too many workers onto a shared resource eventually gives diminishing returns.
Distributed Parallelism • Now let’s set up two tables, and let’s put you at one of them and Tom at the other. Let’s put half of the puzzle pieces on your table and the other half of the pieces on Tom’s. Now the two of you can work completely independently, without any contention for a shared resource. • But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism • Processors are independent of each other. • All data are private. • Processes communicate by passing messages. • The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead • Parallelism isn’t free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead. • The overhead typically includes: • Managing the multiple processes; • Communication between processes; • Synchronization: everyone stops until everyone is ready.
OpenMP and MPI programming paradigms • MPI… parallelizing data • OpenMP… parallelizing tasks
MPI [Figure: Harry Potter Volume 1 goes to one translator and Volume 2 to another; each translator renders its volume into both Spanish and French.]
OpenMP [Figure: Harry Potter Volumes 1 and 2 both go to a Spanish translator and both go to a French translator; each translator handles one language.]
Compilers • Compilers on ACT cluster (andrei): • GNU – C/C++, g77 • PGI – C/C++, f77, f90 • Compilers on Altix 3000 (columbia): • Intel – C/C++, Fortran • GNU – C/C++, g77
PGI Compilers (cluster) • For 32-bit compilers, set PATH as:
    export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
• For 64-bit compilers, set PATH as:
    export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
• Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90 • C: pgcc, mpicc • C++: pgCC, mpicxx
Compilers for MPI codes • Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f • On the cluster:
    /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
• On columbia:
    /opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
    /opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes
    /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
    pgf77 -o mpihello mpihello.f -lfmpich -lmpich
    mpif77 -o mpihello mpihello.f -lfmpich -lmpich
Which mpirun
    [pg-hpc-clnode-head ~]> which mpirun
    /usr/local/pgi/linux86-64/6.0/bin/mpirun
    [pg-hpc-altix-01 ~]> which mpirun
    /usr/bin/mpirun
    /opt/mpich/ch-p4/bin/mpirun -np 4 …
• More than one “mpirun”: SGI MPI and MPICH
Intel Compilers: How to compile a parallel code • MPI codes: • ifort -options myMPIcode.f -lmpi • icc -options myMPIcode.c -lmpi • Code with OpenMP directives: • ifort -options -openmp myOpenMpcode.f • icc -options -openmp myOpenMpcode.c • Automatic Parallelization: • ifort -parallel mycode.f • icc -parallel mycode.c
More About Compilers On columbia: • man ifort -M /opt/intel/fc/9.0/man • man icc -M /opt/intel/cc/9.0/man On andrei: • man pgCC -M /usr/local/pgi/linux86/6.0/man • man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting started with OpenMP • Key points • Shared memory multiprocessor nodes • Parallel programming using compiler directives • Fortran 77/90/95 and C/C++
C OpenMP compiler directive • Parallel regions in C …
==============================
#include <stdio.h>

int main (void)
{
    /* Every thread in the team runs the block below once. */
    #pragma omp parallel
    {
        printf ("Hello, world\n");
    }
    return 0;
}
================================
Fortran OpenMP compiler directive • Parallel regions in Fortran …
      program hello
c$omp parallel
      print *, 'Hello, world'
c$omp end parallel
      end
Compiling and Running • Intel (-openmp) or SGI (-mp)
    icc test.cpp -openmp -o test-openmp.exe
    ifort test.f -openmp -o test-openmp.exe
    OMP_NUM_THREADS=32
    export OMP_NUM_THREADS
    time ./test-openmp.exe