
Supercomputers 2



Presentation Transcript


  1. Supercomputers 2. With the acknowledgement of Igor Zacharov and Wolfgang Mertz, SGI European Headquarters.

  2. Classification of Computers
     MIMD
     • Multiprocessors: single address space, shared memory
        • UMA (central memory)
           • PVP (Cray T90)
           • SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
        • NUMA (distributed memory)
           • COMA (KSR-1, DDM)
           • CC-NUMA (SGI Origin2000, SN1 (SGI3000), Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
           • NCC-NUMA (Cray T3D, IBM SP3)
     • Multicomputers: multiple address spaces, NORMA (no remote memory access)
        • Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, "Beowulf", etc.): loosely coupled, multiple OS
        • "MPP" (Intel TFLOPS, TM-5): tightly coupled and single OS
     Abbreviations:
     • MIMD: Multiple Instructions, Multiple Data
     • PVP: Parallel Vector Processor
     • UMA: Uniform Memory Access
     • SMP: Symmetric Multi-Processor
     • NUMA: Non-Uniform Memory Access
     • COMA: Cache-Only Memory Architecture
     • NORMA: No-Remote Memory Access
     • CC-NUMA: Cache-Coherent NUMA
     • NCC-NUMA: Non-Cache-Coherent NUMA
     • MPP: Massively Parallel Processor
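The classification on this slide can be read as a small lookup table. The following is a minimal sketch in Python, assuming the tree structure implied by the slide; the dictionary layout and the helper name `classify` are illustrative and not part of the original presentation.

```python
# Taxonomy transcribed from the classification slide. The data layout and
# the helper name `classify` are assumptions made for illustration.
TAXONOMY = {
    "PVP": ("Multiprocessor", "UMA"),
    "SMP": ("Multiprocessor", "UMA"),
    "COMA": ("Multiprocessor", "NUMA"),
    "CC-NUMA": ("Multiprocessor", "NUMA"),
    "NCC-NUMA": ("Multiprocessor", "NUMA"),
    "Cluster": ("Multicomputer", "NORMA"),
    "MPP": ("Multicomputer", "NORMA"),
}


def classify(arch: str) -> str:
    """Describe an architecture class by family, memory model, and address space."""
    family, memory_model = TAXONOMY[arch]
    # Multiprocessors share one address space; multicomputers have several.
    address_space = "single" if family == "Multiprocessor" else "multiple"
    return f"{arch}: {family}, {memory_model}, {address_space} address space"


print(classify("CC-NUMA"))
print(classify("Cluster"))
```

The split between "single" and "multiple" address space is the top-level distinction on the slide; the memory model (UMA, NUMA, NORMA) is the second level.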

  3. Design Space of Competing Computer Architecture

  4. Structure of an SMP System (1)
     [Diagram: three processors, each with its own cache, connected by a central bus to four main memory and I/O units.]
     • Does NOT scale due to bus saturation
     • Bus is a very complex component
     • High memory latency due to the complexity

  5. Structure of an SMP System (2)
     [Diagram: three processors, each with its own cache, connected by a central crossbar to four main memory and I/O units.]
     • Scales very well
     • Crossbar is a very complex component
     • High memory latency due to the complexity
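The contrast between bus saturation and crossbar scaling can be illustrated with a deliberately simplified bandwidth model. This is a sketch under stated assumptions, not a model taken from the presentation: the bus is assumed to carry one transfer at a time, the crossbar is assumed to allow one concurrent transfer per processor/memory pair in the best case, and the function names and bandwidth figures are hypothetical.

```python
def bus_bandwidth_per_cpu(total_bus_bw: float, cpus: int) -> float:
    """Shared bus: all CPUs divide one fixed bandwidth (assumed model)."""
    return total_bus_bw / cpus


def crossbar_bandwidth_per_cpu(port_bw: float, cpus: int, memory_banks: int) -> float:
    """Crossbar, best case: up to min(cpus, banks) transfers run concurrently."""
    concurrent = min(cpus, memory_banks)
    return port_bw * concurrent / cpus


# Hypothetical figures: 1.0 bandwidth unit for the bus and for each crossbar port.
for n in (2, 4, 8, 16):
    bus = bus_bandwidth_per_cpu(1.0, n)
    xbar = crossbar_bandwidth_per_cpu(1.0, n, memory_banks=n)
    print(f"{n:2d} CPUs: bus {bus:.3f} per CPU, crossbar {xbar:.3f} per CPU")
```

In this model the per-CPU share of the bus shrinks as 1/N, while the crossbar share stays constant as long as memory banks grow with the CPU count. The model ignores contention for the same bank and the latency cost that both slides attribute to component complexity.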

  6. Structure of an SMP System (3): Origin, SGI NUMA Architecture
     [Diagram: nodeboards (N) and I/O attached to routers (R), connected through a global switch interconnect as an SGI NUMA hypercube.]

  7. Systems are built from Modules
     • Deskside (Module): 2-8 CPUs
     • Rack (2 Modules): 16 CPUs
     • Multi-rack (4 Modules): 32 CPUs
     • Etc.: up to 128 CPUs
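The module counts on this slide follow from simple arithmetic. Here is a minimal sketch, assuming fully populated modules of 8 CPUs (the upper end of the "2-8 CPUs" range) and 2 modules per rack as listed above; the function names are illustrative.

```python
import math

CPUS_PER_MODULE = 8   # assumption: fully populated module (slide gives 2-8 CPUs)
MODULES_PER_RACK = 2  # from the slide: Rack (2 Modules)


def modules_needed(cpus: int) -> int:
    """Smallest number of fully populated modules that holds `cpus` CPUs."""
    return math.ceil(cpus / CPUS_PER_MODULE)


def racks_needed(cpus: int) -> int:
    """Smallest number of racks that holds the required modules."""
    return math.ceil(modules_needed(cpus) / MODULES_PER_RACK)


for cpus in (8, 16, 32, 128):
    print(f"{cpus:3d} CPUs: {modules_needed(cpus)} modules, {racks_needed(cpus)} racks")
```

Under these assumptions, 16 CPUs fit in one rack of 2 modules and 32 CPUs need 4 modules, which matches the rack and multi-rack rows of the slide.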

  8. IRIX 6.5 New High-End Products: Origin 3000 Servers – Onyx 3 Systems
     • SGI Origin 3200, SGI Origin 3400, SGI Origin 3800
     • SGI Onyx 3200, SGI Onyx 3400, SGI Onyx 3800

  9. SGI 3800 System (16-512p)
     [Diagram, 128P System Topology: C-Bricks (C) in Racks 1 to 4 connected through R-Bricks (R); each R-Brick is an 8-port router.]
     [Diagram, rack layouts for the Minimum (16p) System and the 128p System: racks populated with C-Bricks, R-Bricks, I-Bricks, slots for P-, I-, or X-Bricks, and Power Bays.]

  10. ASCI Blue Mountain, Los Alamos National Laboratory
     • Origin 2000 with 3+ Tflops peak
     • 1+ Tflop application performance
     • 48 systems with 128 CPUs each = 6144 CPUs
     • 1536 Gbyte memory
     • 76 Tbyte disk space
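The figures on this slide can be cross-checked and reduced to per-CPU values. The sketch below uses only the numbers quoted above; because the slide gives the peak as "3+ Tflops", the per-CPU peak computed here is a lower bound, and the variable names are illustrative.

```python
systems = 48
cpus_per_system = 128
memory_gbyte = 1536
peak_tflops = 3.0  # slide says "3+ Tflops", so this is a lower bound

total_cpus = systems * cpus_per_system
memory_per_cpu_gbyte = memory_gbyte / total_cpus
peak_per_cpu_gflops = peak_tflops * 1000 / total_cpus

print(f"Total CPUs: {total_cpus}")
print(f"Memory per CPU: {memory_per_cpu_gbyte} Gbyte")
print(f"Peak per CPU: at least {peak_per_cpu_gflops:.3f} Gflops")
```

The product 48 × 128 reproduces the 6144 CPUs stated on the slide, and the memory works out to 0.25 Gbyte per CPU.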

  11. Memory hierarchy
     [Figure: speed of access (1/clock) versus device capacity (size) for registers (64 reg), the cache subsystem (32 KB L1, 8 MB L2), memory (~1 to 100s of GB), and disk; annotated latencies are ~2-3 cy, ~10 cy, ~100-300 cy (NUMA), and ~4000 cy.]
     [Figure: remote latency (ns) versus system size (2p to 512p), comparing SN-MIPS latency with Origin2000 latency.]

  12. Scale in Any and All Dimensions: NUMAflex™ Flexible Configuration
     [Diagram: workloads placed along CPU, storage, and I/O dimensions: weather simulation, traditional big supercomputer, repository / archive, signal processing, web serving, media streaming.]
