Supercomputers 2
With the acknowledgement of Igor Zacharov and Wolfgang Mertz, SGI European Headquarters
Classification of Computers

MIMD
  Multiprocessors: single address space, shared memory
    UMA (central memory)
      PVP (Cray T90)
      SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
    NUMA (distributed memory)
      COMA (KSR-1, DDM)
      CC-NUMA (SGI Origin2000, SN1 (SGI3000), Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
      NCC-NUMA (Cray T3D, IBM SP3)
  Multicomputers: multiple address spaces, NORMA (no remote memory access)
    Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, “Beowulf”, etc.): loosely coupled, multiple OS
    “MPP” (Intel TFLOPS, TM-5): tightly coupled & single OS

Abbreviations
MIMD: Multiple Instructions, Multiple Data
PVP: Parallel Vector Processor
UMA: Uniform Memory Access
SMP: Symmetric Multi-Processor
NUMA: Non-Uniform Memory Access
COMA: Cache Only Memory Architecture
NORMA: No-Remote Memory Access
CC-NUMA: Cache-Coherent NUMA
MPP: Massively Parallel Processor
NCC-NUMA: Non-Cache Coherent NUMA
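The classification above can be written down as a small tree, which makes it easy to ask where a given machine sits. This is only a sketch: the nesting is one reading of the flattened slide diagram, and the names `PARENT`, `MACHINE_CLASS`, `classify`, and `shares_address_space` are hypothetical helpers introduced for illustration.

```python
# Child -> parent links for the taxonomy on the slide.
# The nesting is an interpretation of the diagram, not an official data set.
PARENT = {
    "Multiprocessors": "MIMD",
    "Multicomputers": "MIMD",
    "UMA": "Multiprocessors",
    "NUMA": "Multiprocessors",
    "PVP": "UMA",
    "SMP": "UMA",
    "COMA": "NUMA",
    "CC-NUMA": "NUMA",
    "NCC-NUMA": "NUMA",
    "Cluster": "Multicomputers",
    "MPP": "Multicomputers",
}

# A few of the example machines named on the slide.
MACHINE_CLASS = {
    "Cray T90": "PVP",
    "SGI Power Challenge": "SMP",
    "KSR-1": "COMA",
    "SGI Origin2000": "CC-NUMA",
    "Cray T3D": "NCC-NUMA",
    "IBM SP2": "Cluster",
    "Intel TFLOPS": "MPP",
}


def classify(machine: str) -> list[str]:
    """Return the path from the MIMD root down to the machine's class."""
    path = [MACHINE_CLASS[machine]]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def shares_address_space(machine: str) -> bool:
    """True for multiprocessors (single address space), False for multicomputers."""
    return "Multiprocessors" in classify(machine)
```

For example, `classify("SGI Origin2000")` walks up from CC-NUMA through NUMA and Multiprocessors to MIMD, while `shares_address_space("IBM SP2")` is false because clusters are multicomputers with multiple address spaces.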
Structure of an SMP System (1)
[Diagram: processors, each with its own cache, attached to a central bus that is shared with the main memory and I/O units]
• Does NOT scale due to bus saturation
• Bus is a very complex component
• High memory latency due to the complexity
Structure of an SMP System (2)
[Diagram: processors, each with its own cache, connected through a central crossbar to the main memory and I/O units]
• Scales very well
• Crossbar is a very complex component
• High memory latency due to the complexity
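A toy model helps show why the two SMP slides reach opposite conclusions on scaling. The sketch below assumes that a shared bus carries one transfer at a time, and that a crossbar can carry one transfer per memory unit at the same time as long as requests never collide. The functions `bus_share` and `crossbar_share` and the bandwidth unit are hypothetical, not measured SGI figures.

```python
def bus_share(total_bus_bw: float, processors: int) -> float:
    """Bandwidth per processor when all processors contend for one shared bus."""
    return total_bus_bw / processors


def crossbar_share(link_bw: float, processors: int, memory_banks: int) -> float:
    """Best-case bandwidth per processor on a crossbar.

    Up to min(processors, memory_banks) transfers can proceed at once,
    assuming requests are spread evenly and never collide on a bank.
    """
    concurrent = min(processors, memory_banks)
    return link_bw * concurrent / processors


if __name__ == "__main__":
    BW = 1.0  # hypothetical bandwidth unit, not a measured figure
    for p in (2, 4, 8, 16):
        # The bus share shrinks as 1/p; the crossbar share stays flat
        # while there are as many memory banks as processors.
        print(p, bus_share(BW, p), crossbar_share(BW, p, memory_banks=p))
```

In this model the per-processor share on the bus falls as 1/p, which is the saturation named on the first slide, while the crossbar keeps the share constant at the price of a switch whose complexity grows with the number of ports.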
Structure of an SMP System (3): Origin, SGI NUMA Architecture
[Diagram: SGI NUMA hypercube. Node boards (N) and I/O attach to routers (R); the routers are joined by a global switch interconnect.]
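The slide labels the interconnect a hypercube. As a sketch of what that means in general (not of SGI's actual routing, which this does not model), routers in a d-dimensional hypercube can be numbered with d-bit IDs so that two routers are linked exactly when their IDs differ in one bit. The function names below are hypothetical.

```python
def hops(src: int, dst: int) -> int:
    """Router-to-router hops in a hypercube: the Hamming distance of the two IDs."""
    return bin(src ^ dst).count("1")


def neighbors(router: int, dimension: int) -> list[int]:
    """Routers directly linked to `router`: flip each of the `dimension` ID bits once."""
    return [router ^ (1 << bit) for bit in range(dimension)]


def diameter(dimension: int) -> int:
    """Worst-case hop count between any two routers equals the dimension."""
    return dimension
```

With 2^d routers the worst-case distance grows only as d, so doubling the number of routers adds a single hop to the longest path.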
Systems are built from Modules
• Deskside (Module): 2-8 CPUs
• Rack (2 Modules): 16 CPUs
• Multi-rack (4 Modules): 32 CPUs
• Etc...: ..128 CPUs
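The module counts follow from simple division. The sketch below assumes a module holds at most 8 CPUs, which is the upper bound given for the deskside configuration; `modules_needed` is a hypothetical helper.

```python
import math

CPUS_PER_MODULE = 8  # assumed maximum, taken from "Deskside (Module) 2-8 CPUs"


def modules_needed(cpus: int, cpus_per_module: int = CPUS_PER_MODULE) -> int:
    """Smallest number of modules that can hold `cpus` CPUs."""
    return math.ceil(cpus / cpus_per_module)
```

Under that assumption a 128-CPU system needs 16 modules, in line with the progression of 2 modules for 16 CPUs and 4 modules for 32 CPUs.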
New High-End Products: Origin 3000 Servers – Onyx 3 Systems (IRIX 6.5)
• SGI Origin 3200, SGI Onyx 3200
• SGI Origin 3400, SGI Onyx 3400
• SGI Origin 3800, SGI Onyx 3800
SGI 3800 System (16-512p): 128P System Topology
[Diagram: Rack 1 to Rack 4, populated with C-Bricks, R-Bricks (8-port routers), P-, I-, or X-Bricks, and power bays; shown for a minimum (16p) system and a 128p system]
ASCI Blue Mountain, Los Alamos National Laboratory
• Origin 2000 with 3+ Tflops peak
• 1+ Tflop application performance
• 48 systems with 128 CPUs each = 6144 CPUs
• 1536 Gbyte memory
• 76 Tbyte disk space
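The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes 1 Gbyte = 1024 Mbyte and treats "3+ Tflops" as exactly 3.0, so the per-CPU peak is a lower bound.

```python
SYSTEMS = 48
CPUS_PER_SYSTEM = 128
MEMORY_GBYTE = 1536
PEAK_TFLOPS = 3.0  # the slide says "3+", so this is a lower bound

total_cpus = SYSTEMS * CPUS_PER_SYSTEM
memory_per_cpu_mbyte = MEMORY_GBYTE * 1024 / total_cpus  # assumes 1 Gbyte = 1024 Mbyte
peak_per_cpu_mflops = PEAK_TFLOPS * 1e6 / total_cpus

print(total_cpus)                  # 6144
print(memory_per_cpu_mbyte)        # 256.0
print(round(peak_per_cpu_mflops))  # 488
```

The CPU total matches the slide, and the derived values come to 256 Mbyte of memory and at least roughly 488 Mflops of peak per CPU.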
Memory hierarchy
[Figure: speed of access (1/clock, from 1 down to 0.01) against device capacity (size), spanning registers, the cache subsystem, memory, and disk. Capacity labels: 64 reg, 32KB (L1), 8MB (L2), ~1 - 100s GB. Access-cost labels: ~2-3 cy, ~10 cy, ~100 - 300 cy (NUMA), ~4000 cy.]
[Figure: remote latency (ns), axis 0 to 1400, for 2p to 512p systems, comparing SN-MIPS latency with Origin2000 latency; the plotted values range from 175 ns to 1169 ns.]
NUMAflex™ Flexible Configuration: Scale in Any and All Dimensions
[Diagram: workloads positioned along CPU, storage, and I/O dimensions: weather simulation, traditional big supercomputer, repository / archive, signal processing, web serving, media streaming]