
Introduction to High Performance Computing

Introduction to High Performance Computing. Jon Johansson Academic ICT University of Alberta. Agenda. What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing



Presentation Transcript


  1. Introduction to High Performance Computing Jon Johansson Academic ICT University of Alberta

  2. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  3. High Performance Computing • HPC is the field that concentrates on developing supercomputers and software to run on supercomputers • a main area of this discipline is developing parallel processing algorithms and software • programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors

  4. High Performance Computing • HPC is about “big problems”, i.e. need: • lots of memory • many cpu cycles • big hard drives • no matter what field you work in, perhaps your research would benefit by making problems “larger” • 2d → 3d • finer mesh • increase number of elements in the simulation

  5. Grand Challenges • weather forecasting • economic modeling • computer-aided design • drug design • exploring the origins of the universe • searching for extra-terrestrial life • computer vision • nuclear power and weapons simulations

  6. Grand Challenges – Protein To simulate the folding of a 300 amino acid protein in water: • # of atoms: ~32,000 • folding time: 1 millisecond • # of FLOPs: 3 × 10^22 • Machine Speed: 1 PetaFLOP/s • Simulation Time: 1 year (Source: IBM Blue Gene Project) Ken Dill and Kit Lau’s protein folding model. IBM’s answer: The Blue Gene Project • US$ 100 M of funding to build a 1 PetaFLOP/s computer Charles L Brooks III, Scripps Research Institute
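The one-year figure on this slide follows directly from dividing the operation count by the machine speed. A minimal sketch of that arithmetic, assuming a 365-day year (the variable names are illustrative, not from the slides):

```python
# Figures quoted on the slide (source: IBM Blue Gene Project).
total_flops = 3e22       # floating point operations for the folding run
machine_speed = 1e15     # 1 PetaFLOP/s, in operations per second

SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: a 365-day year

simulation_seconds = total_flops / machine_speed
simulation_years = simulation_seconds / SECONDS_PER_YEAR

print(f"{simulation_seconds:.1e} s, about {simulation_years:.2f} years")
```

The result is roughly 3 × 10^7 seconds, which is a little under one year and matches the slide's rounded estimate.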

  7. Grand Challenges - Nuclear • National Nuclear Security Administration • http://www.nnsa.doe.gov/ • use supercomputers to run three-dimensional codes to simulate instead of test • address critical problems of materials aging • simulate the environment of the weapon and try to gauge whether the device continues to be usable • stockpile science, molecular dynamics and turbulence calculations http://archive.greenpeace.org/comms/nukes/fig05.gif

  8. Grand Challenges - Nuclear ASCI White • March 7, 2002: first full-system three-dimensional simulations of a nuclear weapon explosion • simulation used more than 480 million cells (roughly a 780x780x780 grid, if the grid is a cube) • 1,920 processors on IBM ASCI White at the Lawrence Livermore National Laboratory • 2,931 wall-clock hours, or about 122 days • 6.6 million CPU hours Test shot “Badger” Nevada Test Site – Apr. 1953 Yield: 23 kilotons http://nuclearweaponarchive.org/Usa/Tests/Upshotk.html

  9. Grand Challenges - Nuclear • Advanced Simulation and Computing Program (ASC) • http://www.llnl.gov/asc/asc_history/asci_mission.html

  10. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  11. What is a “Mainframe”? • large and reasonably fast machines • the speed isn't the most important characteristic • high-quality internal engineering and resulting proven reliability • expensive but high-quality technical support • top-notch security • strict backward compatibility for older software

  12. What is a “Mainframe”? • these machines can, and do, run successfully for years without interruption (long uptimes) • repairs can take place while the mainframe continues to run • the machines are robust and dependable • IBM coined a term to advertise the robustness of their mainframe computers: • Reliability, Availability and Serviceability (RAS)

  13. What is a “Mainframe”? • Introducing IBM System z9 109 • Designed for the On Demand Business • IBM is delivering a holistic approach to systems design • Designed and optimized with a total systems approach • Helps keep your applications running with enhanced protection against planned and unplanned outages • Extended security capabilities for even greater protection capabilities • Increased capacity with more available engines per server

  14. What is a Supercomputer?? • at any point in time the term “Supercomputer” refers to the fastest machines currently available • a supercomputer this year might be a mainframe in a couple of years • a supercomputer is typically used for scientific and engineering applications that must do a great amount of computation

  15. What is a Supercomputer?? • the most significant difference between a supercomputer and a mainframe: • a supercomputer channels all its power into executing a few programs as fast as possible • if the system crashes, restart the job(s) – no great harm done • a mainframe uses its power to execute many programs simultaneously • e.g. – a banking system • must run reliably for extended periods

  16. What is a Supercomputer?? • to see the world’s “fastest” computers look at • http://www.top500.org/ • measure performance with the Linpack benchmark • http://www.top500.org/lists/linpack.php • solve a dense system of linear equations • the performance numbers give a good indication of peak performance

  17. What is a Supercomputer?? • count the number of “floating point operations” required to solve the problem • + - x / • results of the benchmark are so many Floating point Operations Per Second (FLOPS) • a supercomputer is a machine that can provide a very large number of FLOPS

  18. Floating Point Operations • multiply 2 1000x1000 matrices • for each resulting array element • 1000 multiplies • 999 adds • do this 1,000,000 times • ~2 × 10^9 operations needed (order of 10^9) • increasing array size has the number of operations increasing as O(N^3)
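The operation count on this slide can be checked with a few lines of arithmetic. The sketch below assumes the naive row-times-column algorithm described in the bullets; `matmul_flops` is a hypothetical helper name introduced only for illustration:

```python
def matmul_flops(n):
    """Count multiplies and adds for a naive n x n matrix product."""
    multiplies_per_element = n      # one multiply per term of the dot product
    adds_per_element = n - 1        # summing n terms takes n - 1 adds
    elements = n * n                # entries in the result matrix
    return elements * (multiplies_per_element + adds_per_element)

for n in (1000, 2000):
    print(n, matmul_flops(n))
```

Doubling N from 1000 to 2000 multiplies the count by about 8, which is the O(N^3) growth the slide refers to.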

  19. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  20. High Performance Computing • supercomputers use many CPUs to do the work • note that all supercomputing architectures have • processors and some combination of cache • some form of memory and IO • the processors are separated from the other processors by some distance • there are major differences in the way that the parts are connected • some problems fit into different architectures better than others

  21. High Performance Computing • increasing computing power available to researchers allows • increasing problem dimensions • adding more particles to a system • increasing the accuracy of the result • improving experiment turnaround time

  22. Flynn’s Taxonomy • Michael J. Flynn (1972) • classified computer architectures based on the number of concurrent instructions and data streams available • single instruction, single data (SISD) – basic old PC • multiple instruction, single data (MISD) – redundant systems • single instruction, multiple data (SIMD) – vector (or array) processor • multiple instruction, multiple data (MIMD) – shared or distributed memory systems: symmetric multiprocessors and clusters • common extension: • single program (or process), multiple data (SPMD)

  23. Architectures • we can also classify supercomputers according to how the processors and memory are connected • couple processors to a single large memory address space • couple computers, each with its own memory address space

  24. Architectures • Symmetric Multiprocessing (SMP) • Uniform Memory Access (UMA) • multiple CPUs, residing in one cabinet, share the same memory • processors and memory are tightly coupled • the processors share memory and the I/O bus or data path

  25. Architectures • SMP • a single copy of the operating system is in charge of all the processors • SMP systems range from two to as many as 32 or more processors

  26. Architectures • SMP • "capability computing" • one CPU can use all the memory • all the CPUs can work on a little memory • whatever you need

  27. Architectures • UMA-SMP negatives • as the number of CPUs gets large the buses become saturated • long wires cause latency problems

  28. Architectures • Non-Uniform Memory Access (NUMA) • NUMA is similar to SMP - multiple CPUs share a single memory space • hardware support for shared memory • memory is separated into close and distant banks • basically a cluster of SMPs • memory on the same processor board as the CPU (local memory) is accessed faster than memory on other processor boards (shared memory) • hence "non-uniform" • NUMA architecture scales much better to higher numbers of CPUs than SMP

  29. Architectures

  30. Architectures University of Alberta SGI Origin SGI NUMA cables

  31. Architectures • Cache Coherent NUMA (ccNUMA) • each CPU has an associated cache • ccNUMA machines use special-purpose hardware to maintain cache coherence • typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache • ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession

  32. Architectures Distributed Memory Multiprocessor (DMMP) • each computer has its own memory address space • looks like NUMA but there is no hardware support for remote memory access • the special purpose switched network is replaced by a general purpose network such as Ethernet or more specialized interconnects: • Infiniband • Myrinet Lattice: Calgary’s HP ES40 and ES45 cluster – each node has 4 processors

  33. Architectures • Massively Parallel Processing (MPP) Cluster of commodity PCs • processors and memory are loosely coupled • "capacity computing" • each CPU contains its own memory and copy of the operating system and application. • each subsystem communicates with the others via a high-speed interconnect. • in order to use MPP effectively, a problem must be breakable into pieces that can all be solved simultaneously
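The last point, that a problem must break into pieces that can all be solved at once, can be illustrated with a small sketch. It assumes a sum of squares as the workload and uses threads from Python's standard library as stand-ins for separate nodes; the function names are hypothetical, and a real cluster would exchange the partial results over its interconnect rather than through shared memory.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    """Work done by one 'processor' on its own piece of the data."""
    return sum(x * x for x in chunk)

def split(data, pieces):
    """Divide data into roughly equal, independent chunks."""
    size = (len(data) + pieces - 1) // pieces
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(1_000))
chunks = split(data, 4)

# Threads stand in for the separate nodes (an assumption for this sketch);
# on a real cluster each chunk would live in a different node's memory.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_of_squares, chunks))

# The only communication needed is combining one number per piece.
total = sum(partials)
print(total)
```

Each chunk is processed without reference to the others, so the pieces could run on different machines with very little data exchanged.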

  34. Architectures

  35. Architectures • lots of “how to build a cluster” tutorials on the web – just Google: • http://www.beowulf.org/ • http://www.cacr.caltech.edu/beowulf/tutorial/building.html

  36. Architectures • Vector Processor or Array Processor • a CPU design that is able to run mathematical operations on multiple data elements simultaneously • a scalar processor operates on data elements one at a time • vector processors formed the basis of most supercomputers through the 1980s and into the 1990s • “pipeline” the data

  37. Architectures • Vector Processor or Array Processor • operate on many pieces of data simultaneously • consider the following add instruction: • C = A + B • on both scalar and vector machines this means: • add the contents of A to the contents of B and put the sum in C • on a scalar machine the operands are numbers • on a vector machine the operands are vectors and the instruction directs the machine to compute the pair-wise sum of each pair of vector elements
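The difference in meaning of C = A + B can be sketched in code. This models only the semantics of the two instruction styles, not the hardware: Python still executes the element-wise sum one pair at a time, and the function names are illustrative assumptions.

```python
def scalar_add(a, b):
    """Scalar machine: one add instruction handles one pair of numbers."""
    return a + b

def vector_add(A, B):
    """Vector machine: one add instruction handles every pair of elements."""
    if len(A) != len(B):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(A, B)]

A = [1.0, 2.0, 3.0, 4.0]
B = [10.0, 20.0, 30.0, 40.0]

# Scalar style: the program issues one add per element.
C_scalar = []
for a, b in zip(A, B):
    C_scalar.append(scalar_add(a, b))

# Vector style: the program issues a single C = A + B.
C_vector = vector_add(A, B)

print(C_scalar, C_vector)
```

Both styles produce the same result; the vector form expresses the whole operation as one instruction that the hardware is free to pipeline.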

  38. Architectures • University of Victoria has 4 NEC SX-6/8A vector processors • in the School of Earth and Ocean Sciences • each has 32 GB of RAM • 8 vector processors in the box • peak performance is 72 GFLOPS

  39. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  40. BlueGene/L • The fastest on the 26th (Nov. 2006) top 500 list: • http://www.top500.org/ • installed at the Lawrence Livermore National Laboratory (LLNL) (US Department of Energy) • Livermore California

  41. http://www.llnl.gov/asc/platforms/bluegenel/photogallery.html

  42. BlueGene/L • processors: 131072 • memory: 32 TB • 64 racks – each has 2048 processors and 512 GB of RAM (256 MB/processor) • a Linpack performance of 280.6 TFlop/s • in Nov 2005 it was the only system ever to exceed the 100 TFlop/s mark • there are now 2 machines over 100 TFlop/s
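The totals on this slide are consistent with the per-rack figures, as a short check shows. The sketch assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB); the Linpack rate per processor is a derived number for illustration, not one quoted in the slides.

```python
racks = 64
processors_per_rack = 2048
ram_per_rack_gb = 512

total_processors = racks * processors_per_rack
total_ram_tb = racks * ram_per_rack_gb / 1024            # assumes 1 TB = 1024 GB
ram_per_processor_mb = ram_per_rack_gb * 1024 / processors_per_rack

linpack_tflops = 280.6
# Derived for illustration only: average Linpack rate per processor.
gflops_per_processor = linpack_tflops * 1000 / total_processors

print(total_processors, total_ram_tb, ram_per_processor_mb)
print(round(gflops_per_processor, 2))
```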

  43. The Fastest Eight

  44. Future BlueGene

  45. Agenda • What is High Performance Computing? • What is a “supercomputer”? • is it a mainframe? • Supercomputer architectures • Who has the fastest computers? • Speedup • Programming for parallel computing • The GRID??

  46. Speedup • how can we measure how much faster our program runs when using more than one processor? • define Speedup S as: • the ratio of 2 program execution times, S = T1 / TP • constant problem size • T1 is the execution time for the problem on a single processor (use the “best” serial time) • TP is the execution time for the problem on P processors
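The definition above, the ratio of the single-processor time T1 to the P-processor time TP, translates directly into code. In the sketch below the timings are hypothetical, and `efficiency` (speedup divided by processor count) is a commonly used companion measure added here as an assumption; it does not appear on the slide.

```python
def speedup(t1, tp):
    """Speedup S = T1 / TP for a fixed problem size."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Fraction of ideal (linear) speedup achieved on p processors."""
    return speedup(t1, tp) / p

# Hypothetical timings: 100 s on one processor, 12.5 s on 16 processors.
t1, tp, p = 100.0, 12.5, 16

print(speedup(t1, tp))        # speedup on 16 processors
print(efficiency(t1, tp, p))  # share of the ideal 16x achieved
```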

  47. Speedup • Linear speedup • the time to execute the problem decreases by a factor equal to the number of processors • if a job requires 1 week with 1 processor it will take less than 10 minutes with 1024 processors
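The one-week example can be verified with a quick calculation, assuming perfectly linear speedup so that the run time is simply divided by the processor count:

```python
MINUTES_PER_WEEK = 7 * 24 * 60   # 10,080 minutes

processors = 1024
ideal_minutes = MINUTES_PER_WEEK / processors

print(f"{ideal_minutes:.2f} minutes on {processors} processors")
```

The ideal time comes out just under 10 minutes, as the slide states.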

  48. Speedup • Sublinear speedup • the usual case • there are generally some limitations to the amount of speedup that you get • communication
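One way to see how communication limits speedup is a toy cost model. The sketch below is purely illustrative and not taken from the slides: it assumes the work divides perfectly among P processors and that each additional processor adds a fixed communication cost. Both constants are hypothetical.

```python
def modeled_time(t1, p, comm_cost):
    """Toy model: perfectly divided work plus a per-processor communication cost."""
    return t1 / p + comm_cost * (p - 1)

def modeled_speedup(t1, p, comm_cost):
    """Speedup T1 / TP under the toy model."""
    return t1 / modeled_time(t1, p, comm_cost)

T1 = 600.0        # hypothetical serial run time, in seconds
COMM_COST = 1.0   # hypothetical seconds of communication per extra processor

for p in (1, 2, 4, 8, 16, 24, 32, 64):
    print(p, round(modeled_speedup(T1, p, COMM_COST), 2))
```

Under these assumptions the speedup stays below P for every P greater than 1, flattens out, and eventually falls as the communication term outgrows the savings from dividing the work.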

  49. Speedup • Superlinear speedup • very rare • memory access patterns may allow this for some algorithms

  50. Speedup • why do a speedup test? • it’s hard to tell how a program will behave • “strange” behaviour is actually fairly common for un-tuned code • in the example plotted on this slide: • linear speedup to ~10 cpus • after 24 cpus speedup is starting to decrease
