OGO 2.1 SGI Origin 2000 • Robert van Liere • CWI, Amsterdam / TU/e, Eindhoven • 11 September 2001
unite.sara.nl • SGI Origin 2000 • Located at SARA in Amsterdam • Hardware configuration: • 128 MIPS R10000 CPUs @ 250 MHz • 64 Gbyte main memory • 1 Tbyte disk storage • 11 Ethernet interfaces @ 100 Mbit/s • 1 Ethernet interface @ 1 Gbit/s
Contents • Architecture • Overview • Module interconnect • Memory hierarchies • Programming • Parallel models • Data placement • Pros and cons
Overview - Features • 64-bit RISC microprocessors • Large main memory • “Scalable” in CPUs, memory and I/O • Shared-memory programming model
Overview - Applications • Worldwide: ~30,000 systems • ~50 with >128 CPUs • ~100 with 64-128 CPUs • ~500 with 32-64 CPUs • Compute serving: many CPUs and much memory • Database serving: many disks • Web serving: much I/O
System architecture – 1 CPU • CPU + cache • One system bus • Memory • I/O (network + disk) • Cached data
System architecture – N CPU • Symmetric multi-processing (SMP) • Multi-CPU + caches • One shared bus • Memory • I/O
N CPU – cache coherency • Problem: inconsistent cached copies of shared data • Solution: snooping, i.e. broadcasting memory transactions on the shared bus • Broadcasting does not scale to many CPUs (see the sketch below)
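A minimal sketch of the sharing pattern that makes cache coherency necessary; this is a hypothetical illustration in portable C11 with Pthreads, not Origin-specific code. Without coherence hardware (bus snooping here, a directory on the Origin), the consumer could spin forever on a stale cached copy of `flag`.

```c
/* Sketch: why hardware cache coherency matters. Two threads share one
 * variable; each CPU may cache its own copy. The coherence protocol
 * (bus snooping, or the Origin's directory) invalidates the stale
 * cached copy when the writer updates it. Hypothetical example. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int flag = 0;   /* may be cached by both CPUs */
int data = 0;          /* payload published via flag */

void *producer(void *arg) {
    data = 42;                       /* write the payload            */
    atomic_store(&flag, 1);          /* hardware invalidates cached  */
    return NULL;                     /* copies of flag on other CPUs */
}

void *consumer(void *arg) {
    while (atomic_load(&flag) == 0)  /* spins on its cached copy     */
        ;                            /* until coherence updates it   */
    printf("data = %d\n", data);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```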
Architecture – Origin 2000 • Node board: • 2 CPUs + caches • Memory • Directory (for cache coherence) • HUB (interconnect interface) • I/O
Origin 2000 Interconnect • Node boards connected through routers • Each router has six ports
Virtual Memory • One CPU, multiple programs • Pages • Paging to disk • Page replacement
O2000 Virtual Memory • Multiple CPUs, multiple programs • Non-Uniform Memory Access (NUMA): local memory is faster to reach than memory on other nodes • Efficient programs: • Minimize data movement • Keep data “close” to the CPU that uses it (see the first-touch sketch below)
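A sketch of NUMA-aware initialization under a first-touch page placement policy, which is how IRIX places pages by default: each thread touches the pages it will later use, so those pages are allocated on its own node. `NTHREADS`, `init_slice` and the pinning of threads to CPUs (e.g. with dplace) are assumptions of this example.

```c
/* First-touch placement sketch: malloc reserves address space, but a
 * page is physically placed on the node of the CPU that first touches
 * it. Initializing in parallel therefore puts each slice of the array
 * near the thread that will use it. Hypothetical example. */
#include <pthread.h>
#include <stdlib.h>

#define N (1 << 24)
#define NTHREADS 4

static double *a;

void *init_slice(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
    for (long i = lo; i < hi; i++)
        a[i] = 0.0;        /* first touch: page placed near this CPU */
    return NULL;
}

int main(void) {
    a = malloc(N * sizeof(double));   /* no pages placed yet */
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, init_slice, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    /* later: have each thread work on the slice it initialized */
    free(a);
    return 0;
}
```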
Application performance • Scientific computing: LU, Ocean, Barnes, Radiosity (SPLASH-2 benchmarks) • Near-linear speedup: S(p) = T(1)/T(p) ≈ p, so more CPUs → more performance
Programming support • IRIX operating system • Parallel programming: • C source level with compiler pragmas • POSIX Threads • UNIX processes • Data placement: dplace, dlock, dperf • Profiling: timex, ssrun • (a minimal POSIX Threads skeleton follows)
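A minimal POSIX Threads skeleton of the create/join style that IRIX supports; the thread count and the `work` function are placeholders for illustration.

```c
/* Pthreads skeleton: create NTHREADS workers, each receives its id
 * through the argument pointer, then join them all. Hypothetical
 * example, not taken from the original slides. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

void *work(void *arg) {
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, work, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```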
Parallel Programs • Functional Decomposition: decompose the problem into different tasks • Domain Decomposition: partition the problem’s data structure • Consider: • Mapping tasks/partitions onto CPUs • Coordinating the work and communication of the CPUs • (a domain-decomposition sketch follows)
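A sketch of domain decomposition with Pthreads: the array is the problem's data structure, each thread gets a contiguous slice, and the partial sums are combined at the end. Names such as `sum_slice` and `partial` are illustrative, not from the slides.

```c
/* Domain decomposition sketch: partition the data, one slice per
 * thread; each thread sums its own slice and the main thread combines
 * the partial results. Hypothetical example. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double a[N];
static double partial[NTHREADS];

void *sum_slice(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += a[i];
    partial[id] = s;      /* each thread writes only its own slot */
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++)
        a[i] = 1.0;
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, sum_slice, (void *)i);
    double total = 0.0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];
    }
    printf("sum = %f\n", total);
    return 0;
}
```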
Task Decomposition • Decompose the problem into tasks • Determine the dependencies between tasks
Task Decomposition • Map tasks onto threads • Compare: the sequential case runs the tasks one after another, the parallel case runs independent tasks concurrently (see the sketch below)
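A sketch of task (functional) decomposition: two different tasks, a sum and a maximum, run concurrently over the same read-only data. Run sequentially they take the sum time plus the max time; run in parallel, roughly the longer of the two. A hypothetical Pthreads example.

```c
/* Task decomposition sketch: two *different* tasks run concurrently
 * on the same data, instead of splitting one task's data. In the
 * sequential case the tasks would run one after another; here each
 * task gets its own thread. Hypothetical example. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double a[N];
static double sum_result, max_result;

void *task_sum(void *arg) {              /* task 1: compute the sum  */
    double s = 0.0;
    for (long i = 0; i < N; i++)
        s += a[i];
    sum_result = s;
    return NULL;
}

void *task_max(void *arg) {              /* task 2: find the maximum */
    double m = a[0];
    for (long i = 1; i < N; i++)
        if (a[i] > m)
            m = a[i];
    max_result = m;
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++)
        a[i] = (double)(i % 100);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task_sum, NULL);
    pthread_create(&t2, NULL, task_max, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("sum = %f, max = %f\n", sum_result, max_result);
    return 0;
}
```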
Efficient programs • Use many CPUs • Measure speedups • Avoid: • Excessive data dependencies • Excessive cache misses • Excessive inter-node communication • (the padding sketch below shows one way to avoid needless cache misses)
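One concrete way to avoid excessive cache misses is to prevent false sharing: if per-thread counters sit in the same cache line, every update invalidates the other CPUs' copies and the line ping-pongs between nodes. A sketch with padding; the 128-byte line size matches the R10000's secondary cache and is an assumption of this example.

```c
/* False-sharing sketch: pad each per-thread counter out to its own
 * cache line, so updates by one thread never invalidate the line that
 * holds another thread's counter. Hypothetical example. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define CACHE_LINE 128   /* assumed secondary-cache line size */

struct padded_counter {
    long count;
    char pad[CACHE_LINE - sizeof(long)];  /* keep neighbours apart */
};

static struct padded_counter counters[NTHREADS];

void *count_up(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < 10000000; i++)
        counters[id].count++;  /* no line shared with other threads */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, count_up, (void *)i);
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += counters[i].count;
    }
    printf("total = %ld\n", total);
    return 0;
}
```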
Pros vs Cons • Pros: • Multi-processor (128 CPUs) • Large memory (64 Gbyte) • Shared-memory programming • Cons: • Slow integer CPU • Performance penalty: data dependencies, off-board memory