
Benchmarking Working Group Session Agenda


Presentation Transcript


  1. Benchmarking Working Group Session Agenda

  2. What Makes HPC Applications Challenging? David Koester, Ph.D. 11-13 January 2005, HPCS Productivity Team Meeting, Marina Del Rey, CA

  3. Outline • HPCS Benchmark Spectrum • What Makes HPC Applications Challenging? • Memory access patterns/locality • Processor characteristics • Concurrency • I/O characteristics • What new challenges will arise from Petascale/s+ applications? • Bottleneckology • Amdahl’s Law • Example: Random Stride Memory Access • Summary

  4. HPCS Benchmark Spectrum

  5. HPCS Benchmark Spectrum: What Makes HPC Applications Challenging? • Full applications may be challenging due to • Killer Kernels • Global data layouts • Input/Output • Killer Kernels are challenging because of many things that link directly to architecture • Identify bottlenecks by mapping applications to architectures

  6. What Makes HPC Applications Challenging? [Slide tags the bullet groups as Killer Kernels, Global Data Layouts, and Input/Output] • Memory access patterns/locality • Spatial and temporal • Indirect addressing • Data dependencies • Processor characteristics • Processor throughput (instructions per cycle) • Low arithmetic density • Floating point versus integer • Special features • GF(2) math • Popcount • Integer division • Concurrency • Ubiquitous for Petascale/s • Load balance • I/O characteristics • Bandwidth • Latency • File access patterns • File generation rates (A sketch of stride-1 versus indirect access follows below.)
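
The locality bullets above are the crux of the random-stride examples later in the deck. As a minimal C sketch (array sizes and names are illustrative, not taken from any benchmark in the deck), compare stride-1 access against indirect gather access through a shuffled index array:

    #include <stdio.h>
    #include <stdlib.h>

    #define N (1u << 20)                /* illustrative working-set size */

    /* Stride-1 access: consecutive addresses, high spatial locality,
     * friendly to caches and hardware prefetchers. */
    static double sum_stride1(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Indirect (gather) access: idx[] determines the address stream.
     * With a random permutation, nearly every load misses in cache and
     * bandwidth collapses toward the random-stride rates shown in the
     * MAPS examples later in the deck. */
    static double sum_indirect(const double *a, const size_t *idx, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[idx[i]];
        return s;
    }

    int main(void)
    {
        double *a   = malloc(N * sizeof *a);
        size_t *idx = malloc(N * sizeof *idx);
        for (size_t i = 0; i < N; i++) { a[i] = 1.0; idx[i] = i; }
        /* Fisher-Yates shuffle makes idx a random permutation. */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        printf("%f %f\n", sum_stride1(a, N), sum_indirect(a, idx, N));
        free(a); free(idx);
        return 0;
    }

Both loops do identical arithmetic; timing them separately exposes the memory access pattern, not the computation, as the bottleneck.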

  7. Cray "Parallel Performance Killer" Kernels

  8. Killer Kernels: Phil Colella — The Seven Dwarfs

  9. Memory Access Patterns/Locality [Figure: Mission Partner applications, HPCS challenge points, and HPCchallenge benchmarks plotted on the HPCS spatial/temporal locality map] • How do mission partner applications relate to the HPCS spatial/temporal view of memory? • Kernels? • Full applications?

  10. Processor Characteristics: Special Features • Comparison of similar-speed MIPS processors with and without GF(2) math and popcount: algorithmic speedup of 120x • Similar or better performance reported using Alpha processors (Jack Collins, NCIFCRF) • Codes • Cray-supplied library • The Portable Cray Bioinformatics Library by ARSC • References • http://www.cray.com/downloads/biolib.pdf • http://cbl.sourceforge.net/ (A popcount sketch follows below.)
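
Popcount (population count) is the bit-counting primitive behind the speedup cited above. A minimal C sketch, assuming a GCC-style compiler: __builtin_popcountll stands in for the hardware instruction, and the fallback is the standard SWAR bit trick. The function names are illustrative, not from the Cray or ARSC libraries:

    #include <stdint.h>

    /* Population count: number of set bits in a 64-bit word.  With DNA
     * bases packed two bits each, popcount becomes the inner loop of
     * sequence-comparison kernels, which is why hardware support pays off. */
    static inline int popcount64(uint64_t x)
    {
    #if defined(__GNUC__)
        return __builtin_popcountll(x);  /* hardware POPCNT where available */
    #else
        /* Portable fallback: SWAR bit counting. */
        x = x - ((x >> 1) & 0x5555555555555555ULL);
        x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
        x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
        return (int)((x * 0x0101010101010101ULL) >> 56);
    #endif
    }

    /* GF(2) flavor: Hamming distance between packed bit vectors is just
     * popcount of their XOR -- the operation the special hardware speeds up. */
    static inline int hamming64(uint64_t a, uint64_t b)
    {
        return popcount64(a ^ b);
    }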

  11. Concurrency [Slide placeholder: "Insert Cluttered VAMPIR Plot here"]

  12. I/O Relative Data Latency‡ Note: 11 orders of magnitude relative differences! ‡Henry Newman (Instrumental)

  13. I/O Relative Data Bandwidth per CPU‡ Note: 5 orders of magnitude relative differences! ‡Henry Newman (Instrumental)

  14. Strawman HPCS I/O Goals/Challenges • 1 trillion files in a single file system • 32K file creates per second • 10K metadata operations per second • Needed for checkpoint/restart files • Streaming I/O at 30 GB/s full duplex • Needed for data capture • Support for 30K nodes • Future file systems need low-latency communication • An envelope on HPCS Mission Partner requirements (A sketch of measuring file-create rate follows below.)
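
For a sense of what validating the 32K-creates-per-second goal looks like, here is a minimal POSIX C sketch; the test directory /tmp/bench and the sample size are hypothetical, and a real metadata benchmark would spread load across many clients and directories:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <time.h>

    #define NFILES 100000                 /* illustrative sample size */

    int main(void)
    {
        char path[64];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NFILES; i++) {
            /* /tmp/bench is a hypothetical, pre-created test directory */
            snprintf(path, sizeof path, "/tmp/bench/f%07d", i);
            int fd = open(path, O_CREAT | O_WRONLY, 0644);
            if (fd < 0) { perror("open"); return 1; }
            close(fd);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.0f file creates/s (strawman goal: 32K/s)\n", NFILES / sec);
        return 0;
    }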

  15. HPCS Benchmark Spectrum: Future and Emerging Applications • Identifying HPCS Mission Partner efforts • 10-20K processor — 10-100 Teraflop/s scale applications • 20-120K processor — 100-300 Teraflop/s scale applications • Petascale/s applications • Applications beyond Petascale/s • LACSI Workshop — The Path to Extreme Supercomputing • 12 October 2004 • http://www.zettaflops.org • What new challenges will arise from Petascale/s+ applications?

  16. Outline • HPCS Benchmark Spectrum • What Makes HPC Applications Challenging? • Memory access patterns/locality • Processor characteristics • Parallelism • I/O characteristics • What new challenges will arise from Petascale/s+ applications? • Bottleneckology • Amdahl’s Law • Example: Random Stride Memory Access • Summary

  17. Bottleneckology • Where is performance lost when an application is run on an architecture? • When does it make sense to invest in architecture to improve application performance? • System analysis driven by an extended Amdahl's Law • Amdahl's Law is not just about parallel and sequential parts of applications! • References: • Jack Worlton, "Project Bottleneck: A Proposed Toolkit for Evaluating Newly-Announced High Performance Computers", Worlton and Associates, Los Alamos, NM, Technical Report No. 13, January 1988 • Montek Singh, "Lecture Notes — Computer Architecture and Implementation: COMP 206", Dept. of Computer Science, Univ. of North Carolina at Chapel Hill, Aug 30, 2004, www.cs.unc.edu/~montek/teaching/fall-04/lectures/lecture-2.ppt

  18. Lecture Notes — Computer Architecture and Implementation (5)‡ ‡Montek Singh (UNC)

  19. Lecture Notes — Computer Architecture and Implementation (6)‡ ‡Montek Singh (UNC)

  20. Lecture Notes — Computer Architecture and Implementation (7)‡ Also works for Rate = Bandwidth! ‡Montek Singh (UNC)
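
Written out, the rate (bandwidth) form of Amdahl's Law noted on the slide above is a harmonic mean over the access fractions. This reconstruction uses standard bottleneck-analysis notation rather than a formula quoted from the deck:

    \[
      R_{\mathrm{eff}} \;=\; \frac{1}{\sum_i f_i / R_i}
      \;=\; \frac{1}{\frac{f_{\mathrm{rand}}}{R_{\mathrm{rand}}} + \frac{1 - f_{\mathrm{rand}}}{R_{\mathrm{seq}}}}
    \]

With R_seq = 7000 MB/s, R_rand = R_seq / 7, and f_rand = 0.25, this reduces to the 7000 / (7*0.25 + 0.75) = 2800 MB/s figure on the slides that follow.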

  21. Lecture Notes — Computer Architecture and Implementation (8)‡ ‡Montek Singh (UNC)

  22. Bottleneck Example (1): SDSC MAPS on an IBM SP-3 • Combine stride-1 and random stride memory access • 25% random stride access • 33% random stride access • Memory bandwidth performance is dominated by the random stride memory access

  23. Bottleneck Example (2): SDSC MAPS on a Compaq AlphaServer • Combine stride-1 and random stride memory access • 25% random stride access • 33% random stride access • Memory bandwidth performance is dominated by the random stride memory access • Amdahl's Law: [ 7000 / (7*0.25 + 0.75) ] = 2800 MB/s (A worked sketch of this calculation follows below.)
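
A small C check of the slide's arithmetic, using its own numbers (7000 MB/s at stride 1 and a 7x penalty for random stride, both implied by the slide's formula; the function name is illustrative):

    #include <stdio.h>

    /* Rate form of Amdahl's Law:
     * R_eff = 1 / (f_rand / R_rand + (1 - f_rand) / R_seq),
     * equivalent to the slide's 7000 / (7*0.25 + 0.75) when
     * R_rand = R_seq / 7. */
    static double effective_bw(double r_seq, double r_rand, double f_rand)
    {
        return 1.0 / (f_rand / r_rand + (1.0 - f_rand) / r_seq);
    }

    int main(void)
    {
        double r_seq  = 7000.0;        /* stride-1 bandwidth, MB/s */
        double r_rand = r_seq / 7.0;   /* random-stride bandwidth  */

        /* 25% random access -> 2800 MB/s, matching the slide. */
        printf("25%% random: %.0f MB/s\n", effective_bw(r_seq, r_rand, 0.25));
        printf("33%% random: %.0f MB/s\n", effective_bw(r_seq, r_rand, 0.33));
        return 0;
    }

Note how a quarter of the accesses running at one-seventh speed cuts delivered bandwidth by more than half, which is the point the next slide makes about even small amounts of random access.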

  24. Bottleneck Example (2), continued: SDSC MAPS on a Compaq AlphaServer • Memory bandwidth performance is dominated by the random stride memory access (Amdahl's Law: [ 7000 / (7*0.25 + 0.75) ] = 2800 MB/s) • Some HPCS Mission Partner applications • Extensive random stride memory access • Some random stride memory access • However, even a small amount of random memory access can cause significant bottlenecks!

  25. Outline • HPCS Benchmark Spectrum • What Makes HPC Applications Challenging? • Memory access patterns/locality • Processor characteristics • Parallelism • I/O characteristics • What new challenges will arise from Petascale/s+ applications? • Bottleneckology • Amdahl’s Law • Example: Random Stride Memory Access • Summary

  26. Summary (1): What Makes Applications Challenging • Memory access patterns/locality • Spatial and temporal • Indirect addressing • Data dependencies • Processor characteristics • Processor throughput (instructions per cycle) • Low arithmetic density • Floating point versus integer • Special features • GF(2) math • Popcount • Integer division • Parallelism • Ubiquitous for Petascale/s • Load balance • I/O characteristics • Bandwidth • Latency • File access patterns • File generation rates • Expand this list as required • Work toward consensus with HPCS Mission Partners and HPCS Vendors • Understand bottlenecks • Characterize applications • Characterize architectures

  27. HPCS Benchmark Spectrum: What Makes HPC Applications Challenging? • Full applications may be challenging due to • Killer Kernels • Global data layouts • Input/Output • Killer Kernels are challenging because of many things that link directly to architecture • Identify bottlenecks by mapping applications to architectures • Impress upon the HPCS community the need to identify what makes the application challenging when using an existing Mission Partner application for a systems analysis in the MS4 review
