Jacquard: Architecture and Application Performance Overview
NERSC Users’ Group, October 2005
Outline
An engineering-level overview of the hardware and software that make up Jacquard:
• CPUs
• Memory
• OS
• Interconnect
Seaborg is used as a point of reference.
seaborg.nersc.gov (review?)
[Diagram labels: 16-way SMP NHII node, main memory, MPI, GPFS, Colony switch (CSS0, CSS1), crossbar, HPSS. Seaborg: 380 x 16-way SMP NHII nodes.]
• 6080 dedicated CPUs, 96 shared login CPUs
• Hierarchy of caching, speeds
• Bottleneck determined by first depleted resource
jacquard.nersc.gov basics
[Diagram labels: 2-way Opteron node, main memory, HT, IB, MPI, GPFS, InfiniBand switch, HPSS. Jacquard: 320 x 2-way Opteron nodes.]
• 640 dedicated CPUs, 8 shared login CPUs
• Smaller caches, HT, really fast
• SMP? NUMA? SUMO.
Opteron Block Diagram: Not Strictly SMP
[Diagram labels: SDRAM, SDRAM, switch, I/O]
• 1 TLB per CPU: 1K entries, 4 KB pages, 4 MB coverage
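The TLB figures on this slide follow from simple arithmetic: coverage is the number of entries times the page size. The sketch below reproduces that calculation and checks whether a working set fits within the TLB's reach. The function names and the example array size are illustrative assumptions, not part of the original talk.

```python
def tlb_coverage_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory addressable without a TLB miss: one page per TLB entry."""
    return entries * page_size_bytes


def exceeds_tlb_reach(working_set_bytes: int,
                      entries: int = 1024,
                      page_size_bytes: int = 4096) -> bool:
    """True if the working set is larger than what the TLB can map at once."""
    return working_set_bytes > tlb_coverage_bytes(entries, page_size_bytes)


if __name__ == "__main__":
    # Slide values: 1K entries, 4 KB pages.
    coverage = tlb_coverage_bytes(1024, 4096)
    print(f"TLB coverage: {coverage // 2**20} MB")

    # Hypothetical working set: a 1000 x 1000 array of 8-byte doubles.
    working_set = 1000 * 1000 * 8
    print("Exceeds TLB reach:", exceeds_tlb_reach(working_set))
```

A working set larger than the coverage does not fail; it only means that strided or random access may start paying for TLB misses.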
HyperTransport: Good Stuff
Little conflict between data movement and computation
SMP size and memory contention
Why is Jacquard 2-way SMP?
Jacquard’s numbers:
• 1 task: 100%
• 2 tasks: 98%
Flops @ 2.2 GHz
• Peak theoretical flops
  • Double (64-bit) floats: 1 add + 1 mult = 2.2 GFlop/s
  • Single (32-bit) floats: 2 add + 2 mult = 4.4 GFlop/s
• Peak realized flops
  • Double (64-bit) floats: 1.9 GFlop/s
  • Single (32-bit) floats: 3.4 GFlop/s
• Your flops?
  • Walltime is more important than flops
  • For a known algorithm, flops are a sanity check
Memory bandwidth: 4 GB/s per CPU
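One way to use flops as a sanity check for a known algorithm is to divide the algorithm's operation count by the measured walltime and compare the result with the realized peak quoted above. The sketch below does this for a dense matrix multiply, using the common 2n^3 operation-count approximation. The function names and the sample timing are hypothetical.

```python
# Realized peak figures quoted on the slide (GFlop/s per CPU at 2.2 GHz).
PEAK_REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}


def matmul_flops(n: int) -> int:
    """Approximate flop count of a dense n x n matrix multiply
    (about n^3 multiplies plus n^3 adds)."""
    return 2 * n ** 3


def achieved_gflops(flop_count: float, walltime_s: float) -> float:
    """Achieved rate in GFlop/s for a known flop count and measured walltime."""
    if walltime_s <= 0:
        raise ValueError("walltime must be positive")
    return flop_count / walltime_s / 1e9


def fraction_of_realized_peak(gflops: float, precision: str = "double") -> float:
    """Achieved rate as a fraction of the realized peak from the slide."""
    return gflops / PEAK_REALIZED_GFLOPS[precision]


if __name__ == "__main__":
    # Hypothetical measurement: a 2000 x 2000 double-precision multiply in 10 s.
    rate = achieved_gflops(matmul_flops(2000), 10.0)
    share = fraction_of_realized_peak(rate, "double")
    print(f"{rate:.2f} GFlop/s ({share:.0%} of realized peak)")
```

A rate far below the realized peak suggests the code is limited by something other than arithmetic, such as memory bandwidth. A rate above it usually means the operation count or the timing is wrong.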
Linux for AIX Users
Linux and AIX are more similar than different.
• Linux is not as good as AIX at keeping processes scheduled on the same CPU (hence the processor affinity work).
• Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
• AIX MPI is in /usr/{bin,lib}; Linux MPI is in modules.
• Linux doesn’t need -bmaxdata!
• Little vs. big endian
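The endianness point matters when raw binary data written on Seaborg (big-endian POWER CPUs) is read on Jacquard (little-endian Opterons). The sketch below, using Python's standard `struct` module, shows the effect of reading the same bytes with the right and the wrong byte order. The helper name `read_doubles` and the sample values are assumptions for illustration only.

```python
import struct
import sys


def read_doubles(raw: bytes, byte_order: str = ">") -> tuple:
    """Decode a buffer of 8-byte IEEE doubles.

    byte_order is a struct prefix: ">" for big endian, "<" for little endian.
    Trailing bytes that do not fill a whole double are ignored.
    """
    count = len(raw) // 8
    return struct.unpack(f"{byte_order}{count}d", raw[:count * 8])


if __name__ == "__main__":
    print("Native byte order on this host:", sys.byteorder)

    # Simulate two doubles written on a big-endian machine.
    big_endian_bytes = struct.pack(">2d", 1.0, 2.5)

    # Correct: state the writer's byte order explicitly.
    print("Read as big endian:   ", read_doubles(big_endian_bytes, ">"))

    # Wrong: assuming little endian silently yields garbage values.
    print("Read as little endian:", read_doubles(big_endian_bytes, "<"))
```

Stating the byte order explicitly, rather than relying on the host's native order, keeps the same reader code correct on both machines.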
Conclusions
• The underlying hardware technologies (HT, IB, etc.) are quite promising. Opteron systems are delivering great price/performance.
• Still working through some SDRAM, OS, and software issues.
• What’s useful to you? Let us know.