Benchmarking Working Group Session Agenda: What Makes HPC Applications Challenging? David Koester, Ph.D., 11-13 January 2005, HPCS Productivity Team Meeting, Marina Del Rey, CA.
What Makes HPC Applications Challenging? David Koester, Ph.D., 11-13 January 2005, HPCS Productivity Team Meeting, Marina Del Rey, CA
Outline • HPCS Benchmark Spectrum • What Makes HPC Applications Challenging? • Memory access patterns/locality • Processor characteristics • Concurrency • I/O characteristics • What new challenges will arise from Petascale/s+ applications? • Bottleneckology • Amdahl’s Law • Example: Random Stride Memory Access • Summary
HPCS Benchmark Spectrum What Makes HPC Applications Challenging? • Full applications may be challenging due to • Killer Kernels • Global data layouts • Input/Output • Killer Kernels are challenging because of many things that link directly to architecture • Identify bottlenecks by mapping applications to architectures
What Makes HPC Applications Challenging? • Memory access patterns/locality • Spatial and Temporal • Indirect addressing • Data dependencies • Processor characteristics • Processor throughput (Instructions per cycle) • Low arithmetic density • Floating point versus integer • Special features • GF(2) math • Popcount • Integer division • Concurrency • Ubiquitous for Petascale/s • Load balance • I/O characteristics • Bandwidth • Latency • File access patterns • File generation rates [Diagram labels: Killer Kernels, Global Data Layouts, Input/Output]
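Load balance, listed under Concurrency above, can be put into numbers once per-process busy times are known. A minimal sketch, assuming those times have already been measured (for example from a trace); the function `load_balance` and its two metrics are illustrative choices, not part of any HPCS tool:

```python
from statistics import mean

def load_balance(busy_times):
    """Return (efficiency, imbalance) for per-process busy times.

    efficiency = mean / max: share of the slowest process's time that
                 the average process spends doing useful work.
    imbalance  = max / mean - 1: how far the slowest process lags the
                 average (0.0 means perfectly balanced).
    """
    if not busy_times or max(busy_times) <= 0:
        raise ValueError("need at least one positive busy time")
    slowest = max(busy_times)
    avg = mean(busy_times)
    return avg / slowest, slowest / avg - 1.0

# Three processes finish in 1 s, one straggler needs 3 s.
eff, imb = load_balance([1.0, 1.0, 1.0, 3.0])
print(f"efficiency={eff:.2f} imbalance={imb:.2f}")  # efficiency=0.50 imbalance=1.00
```

A single slow process halves the parallel efficiency here, even though three of the four processes are perfectly balanced.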
Memory Access Patterns/Locality [Figure: Mission Partner Applications, HPCS Challenge Points, and HPCchallenge Benchmarks] • How do mission partner applications relate to the HPCS spatial/temporal view of memory? • Kernels? • Full applications?
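One way to make the spatial/temporal view of memory concrete is to replay an address trace through a toy cache model. A minimal sketch, assuming 8-word (64-byte) cache lines and a 64-line fully associative LRU cache; these parameters and the function `lru_hit_rate` are illustrative assumptions, not taken from the HPCS or HPCchallenge tools:

```python
import random
from collections import OrderedDict

def lru_hit_rate(addresses, line_words=8, cache_lines=64):
    """Simulate a fully associative LRU cache and return its hit rate.

    addresses   -- sequence of word indices (e.g. 8-byte words)
    line_words  -- words per cache line (8 words = 64 bytes, assumed)
    cache_lines -- number of lines the toy cache holds (assumed)
    """
    cache = OrderedDict()  # line id -> None, ordered oldest to newest
    hits = 0
    for addr in addresses:
        line = addr // line_words
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(addresses)

n = 1 << 16
rng = random.Random(0)

# High spatial locality: stride-1 sweep, each line serves 8 consecutive words.
stride1 = range(n)
# High temporal locality: sweep a small 256-word block ten times.
reuse = [i % 256 for i in range(2560)]
# Low spatial and temporal locality: indirect addressing x[idx[i]] into a large table.
idx = [rng.randrange(1 << 20) for _ in range(n)]

for name, trace in [("stride-1", stride1), ("reuse", reuse), ("random", idx)]:
    print(f"{name:8s} hit rate = {lru_hit_rate(trace):.4f}")
```

The stride-1 trace hits on 7 of every 8 accesses from spatial locality alone, the repeated small block almost always hits from temporal locality, and the random gather almost never hits. Real caches are set associative and multi-level, so only the trend carries over.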
Processor Characteristics: Special Features • Comparison of similar-speed MIPS processors with and without GF(2) math and popcount • Algorithmic speedup of 120x • Similar or better performance reported using Alpha processors (Jack Collins, NCIFCRF) • Codes • Cray-supplied library • The Portable Cray Bioinformatics Library by ARSC • References • http://www.cray.com/downloads/biolib.pdf • http://cbl.sourceforge.net/
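GF(2) arithmetic maps directly onto bitwise operations on packed words: addition is XOR and multiplication is AND, so a GF(2) dot product is the parity of a population count. A minimal sketch of these kernels, assuming bits are packed into Python integers; the function names are hypothetical and are not the API of the Cray bioinformatics libraries cited above. On hardware with a popcount instruction, each `popcount` call below becomes a single instruction instead of a loop over bits:

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer (hardware popcount)."""
    return bin(x).count("1")

def gf2_dot(a: int, b: int) -> int:
    """Dot product over GF(2): multiply bitwise with AND, sum mod 2."""
    return popcount(a & b) & 1

def gf2_matvec(rows, v: int) -> int:
    """Multiply a GF(2) matrix (one packed int per row) by a packed vector.

    Bit i of the result is the GF(2) dot product of row i with v.
    """
    result = 0
    for i, row in enumerate(rows):
        result |= gf2_dot(row, v) << i
    return result

def hamming(a: int, b: int) -> int:
    """Positions where two packed bit strings differ (XOR + popcount)."""
    return popcount(a ^ b)

print(popcount(0b1011))                   # 3
print(gf2_matvec([0b011, 0b110], 0b010))  # 3, i.e. 0b11
print(hamming(0b1010, 0b0110))            # 2
```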
Concurrency [Placeholder: cluttered VAMPIR trace plot]
I/O Relative Data Latency‡ Note: relative differences span 11 orders of magnitude! ‡Henry Newman (Instrumental)
I/O Relative Data Bandwidth per CPU‡ Note: relative differences span 5 orders of magnitude! ‡Henry Newman (Instrumental)
Strawman HPCS I/O Goals/Challenges • 1 trillion files in a single file system • 32K file creates per second • 10K metadata operations per second • Needed for Checkpoint/Restart files • Streaming I/O at 30 GB/sec full duplex • Needed for data capture • Support for 30K nodes • Future file systems need low-latency communication • An envelope on HPCS Mission Partner requirements
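Some back-of-envelope arithmetic shows the scale of these strawman goals. A minimal sketch, assuming "K" means 1,000, decimal units (1 GB = 1000 MB), and that the 30 GB/sec stream is shared evenly across all 30K nodes; these readings are assumptions, not statements from the slide:

```python
SECONDS_PER_DAY = 86_400

files = 1e12             # 1 trillion files in one file system
creates_per_s = 32_000   # "32K" file creates per second (assumed 32,000)
meta_ops_per_s = 10_000  # "10K" metadata operations per second
nodes = 30_000           # "30K" nodes
stream_gb_s = 30.0       # streaming I/O, GB/s in each direction

days_to_create = files / creates_per_s / SECONDS_PER_DAY
days_one_meta_op_each = files / meta_ops_per_s / SECONDS_PER_DAY
mb_s_per_node = stream_gb_s * 1000 / nodes  # even share per node (assumed)

print(f"create 1e12 files: {days_to_create:.0f} days")                  # 362 days
print(f"one metadata op per file: {days_one_meta_op_each:.0f} days")    # 1157 days
print(f"streaming share per node: {mb_s_per_node:.1f} MB/s")            # 1.0 MB/s
```

Even at the target create rate, populating a trillion-file system takes about a year, which is why create and metadata rates appear as goals alongside raw bandwidth.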
HPCS Benchmark Spectrum Future and Emerging Applications • Identifying HPCS Mission Partner efforts • 10-20K processors: 10-100 Teraflop/s scale applications • 20-120K processors: 100-300 Teraflop/s scale applications • Petascale/s applications • Applications beyond Petascale/s • LACSI Workshop: The Path to Extreme Supercomputing • 12 October 2004 • http://www.zettaflops.org • What new challenges will arise from Petascale/s+ applications?
Outline • HPCS Benchmark Spectrum • What Makes HPC Applications Challenging? • Memory access patterns/locality • Processor characteristics • Parallelism • I/O characteristics • What new challenges will arise from Petascale/s+ applications? • Bottleneckology • Amdahl’s Law • Example: Random Stride Memory Access • Summary
Bottleneckology • Bottleneckology • Where is performance lost when an application is run on an architecture? • When does it make sense to invest in architecture to improve application performance? • System analysis driven by an extended Amdahl's Law • Amdahl's Law is not just about parallel and sequential parts of applications! • References: • Jack Worlton, "Project Bottleneck: A Proposed Toolkit for Evaluating Newly-Announced High Performance Computers", Worlton and Associates, Los Alamos, NM, Technical Report No. 13, January 1988 • Montek Singh, "Lecture Notes: Computer Architecture and Implementation (COMP 206)", Dept. of Computer Science, Univ. of North Carolina at Chapel Hill, Aug 30, 2004, www.cs.unc.edu/~montek/teaching/fall-04/lectures/lecture-2.ppt
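The extended form of Amdahl's Law referred to above can be written for any number of parts, not just a serial and a parallel fraction. A sketch in standard notation (the symbols are introduced here, not taken from the cited references): if fraction $f_i$ of the original execution time $T_0$ is sped up by a factor $s_i$, with $\sum_i f_i = 1$, then

```latex
T_{\text{new}} = T_0 \sum_{i} \frac{f_i}{s_i},
\qquad
\text{Speedup} = \frac{T_0}{T_{\text{new}}} = \frac{1}{\sum_{i} f_i / s_i}.
```

The classic serial/parallel case is the two-term instance $f_1 = 1 - F$, $s_1 = 1$ and $f_2 = F$, $s_2 = N$, giving $1 / \big((1 - F) + F/N\big)$. Any part that is not sped up ($s_i = 1$) caps the overall speedup at $1/f_i$, however fast the rest becomes.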
Lecture Notes — Computer Architecture and Implementation (5)‡ ‡Montek Singh (UNC)
Lecture Notes — Computer Architecture and Implementation (6)‡ ‡Montek Singh (UNC)
Lecture Notes — Computer Architecture and Implementation (7)‡ Also works for Rate = Bandwidth! ‡Montek Singh (UNC)
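For rates such as memory bandwidth, the same law applies with fractions of work in place of fractions of time. A sketch with symbols introduced here: if fraction $w_i$ of the bytes moved (with $\sum_i w_i = 1$) is transferred at bandwidth $B_i$, the time per byte is $\sum_i w_i / B_i$, so the effective bandwidth is the weighted harmonic mean

```latex
B_{\text{eff}} = \frac{1}{\sum_{i} w_i / B_i}.
```

Because a harmonic mean is pulled toward its smallest terms, the slowest access pattern dominates $B_{\text{eff}}$ even when it carries only a modest share $w_i$ of the traffic.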
Lecture Notes — Computer Architecture and Implementation (8)‡ ‡Montek Singh (UNC)
Bottleneck Example (1): SDSC MAPS on an IBM SP-3 • Combine stride-1 and random stride memory access [Figure: memory bandwidth with 25% and 33% random stride access] • Memory bandwidth performance is dominated by the random stride memory access
Bottleneck Example (2): SDSC MAPS on a COMPAQ Alphaserver • Combine stride-1 and random stride memory access [Figure: memory bandwidth with 25% and 33% random stride access] • Memory bandwidth performance is dominated by the random stride memory access • Amdahl's Law: [ 7000 / (7*0.25 + 0.75) ] = 2800 MB/s • Some HPCS Mission Partner applications have • Extensive random stride memory access • Some random stride memory access • However, even a small amount of random memory access can cause significant bottlenecks!
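The Alphaserver figure of 2800 MB/s can be reproduced with the bandwidth form of Amdahl's Law. A minimal sketch, assuming (as the 7000 and the factor 7 in the formula imply) a stride-1 bandwidth of 7000 MB/s and a random stride bandwidth seven times lower, 1000 MB/s; the 1000 MB/s value is inferred from the formula, not a number stated on the slide:

```python
def effective_bandwidth(random_fraction, stride1_mb_s=7000.0, random_mb_s=1000.0):
    """Bandwidth form of Amdahl's Law: weighted harmonic mean of two rates.

    random_fraction -- share of bytes moved with random stride access
    stride1_mb_s    -- stride-1 bandwidth (7000 MB/s, from the slide)
    random_mb_s     -- random stride bandwidth (assumed 7x lower)
    """
    seconds_per_mb = (1.0 - random_fraction) / stride1_mb_s + random_fraction / random_mb_s
    return 1.0 / seconds_per_mb

# Algebraically identical to 7000 / (7*f + (1 - f)) for these two rates.
for f in (0.0, 0.25, 1 / 3, 1.0):
    print(f"{f:6.1%} random -> {effective_bandwidth(f):6.0f} MB/s")
```

At 25% random stride access the effective bandwidth is 2800 MB/s, only 40% of the stride-1 rate; at one third random it falls to about 2333 MB/s, one third of the stride-1 rate.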
Summary (1) What Makes Applications Challenging! • Memory access patterns/locality • Spatial and Temporal • Indirect addressing • Data dependencies • Processor characteristics • Processor throughput (Instructions per cycle) • Low arithmetic density • Floating point versus integer • Special features • GF(2) math • Popcount • Integer division • Parallelism • Ubiquitous for Petascale/s • Load balance • I/O characteristics • Bandwidth • Latency • File access patterns • File generation rates • Expand this list as required • Work toward consensus with • HPCS Mission Partners • HPCS Vendors • Understand Bottlenecks • Characterize applications • Characterize architectures
HPCS Benchmark Spectrum What Makes HPC Applications Challenging? • Full applications may be challenging due to • Killer Kernels • Global data layouts • Input/Output • Killer Kernels are challenging because of many things that link directly to architecture • Identify bottlenecks by mapping applications to architectures • Impress upon the HPCS community the need to identify what makes an application challenging when an existing Mission Partner application is used for a systems analysis in the MS4 review