
Scalability Study of S3D using TAU

Sameer Shende, tau-team@cs.uoregon.edu


Presentation Transcript


  1. Scalability Study of S3D using TAU Sameer Shende tau-team@cs.uoregon.edu

  2. Acknowledgements • Alan Morris [UO] • Kevin Huck [UO] • Allen D. Malony [UO] • Kenneth Roche [ORNL] • Bronis R. de Supinski [LLNL] The performance data presented here is available at: http://www.cs.uoregon.edu/research/tau/s3d

  3. TAU Parallel Performance System • http://www.cs.uoregon.edu/research/tau/ • Multi-level performance instrumentation • Multi-language automatic source instrumentation • Flexible and configurable performance measurement • Widely-ported parallel performance profiling system • Computer system architectures and operating systems • Different programming languages and compilers • Support for multiple parallel programming paradigms • Multi-threading, message passing, mixed-mode, hybrid

  4. Scalability Study • Harness testcase • Platform: Jaguar Cray XT3 at ORNL • 1p • 8p • 64p • 512p • Goal: to evaluate scaling properties of code regions • Scalability of MPI operations

  5. Introduction to ParaProf: Main Window • Callouts: click left mouse button; click right mouse button • Load all 1p, 8p, 64p, 512p profile datasets together: % paraprof *.ppk

  6. ParaProf: MFLOPs sorted by Exclusive Time

  7. Source Code View

  8. Comparison Window: Inclusive Time

  9. Comparing Level 1 Data Cache Misses

  10. CPU Resource Stalls

  11. ParaProf: 3D view for 512 cpus - Jagged Edges!

  12. MPI_Wait - Jagged Edges Seen in 3D Window (512 cpus) • Pattern repeats every 8 cpus!
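A repeating pattern like the one on this slide can also be checked numerically by folding the per-rank values onto the suspected period. The following is a minimal sketch, not part of TAU or ParaProf: the function name `mean_by_rank_offset` and the synthetic data are assumptions made for illustration, and the per-rank MPI_Wait times would have to be exported from the profile by other means.

```python
from statistics import mean


def mean_by_rank_offset(wait_times, period=8):
    """Average per-rank times grouped by (rank modulo period).

    wait_times[i] is the time measured on rank i. The list must hold
    at least `period` entries so that every group is non-empty.
    """
    groups = [[] for _ in range(period)]
    for rank, t in enumerate(wait_times):
        groups[rank % period].append(t)
    return [mean(g) for g in groups]


# Synthetic, hypothetical data: every 8th rank waits 5 time units longer.
synthetic = [10.0 + (5.0 if rank % 8 == 0 else 0.0) for rank in range(512)]
profile = mean_by_rank_offset(synthetic, period=8)
print(profile)
```

If the folded averages differ strongly between offsets, the jagged edges follow the chosen period; a flat result suggests a different period or no periodicity.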

  13. MPI_Wait - Histogram (Bins) View
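The binning behind a histogram view can be sketched in a few lines. This is an illustration under stated assumptions, not ParaProf's implementation: the function name `histogram`, the equal-width binning, and the default bin count are all hypothetical choices.

```python
def histogram(values, num_bins=10):
    """Count values in equal-width bins between min(values) and max(values).

    Returns (lowest value, bin width, list of counts). The maximum value
    is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, width, counts


# Hypothetical per-rank times, in arbitrary units.
lo, width, counts = histogram([0.0, 1.0, 2.0, 3.0, 4.0], num_bins=2)
print(lo, width, counts)
```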

  14. Comparing MPI_Wait • MPI_Wait time increases steadily with processors!

  15. PerfDMF: Performance Data Mgmt. Framework

  16. PerfExplorer - Comparative Analysis • Relative speedup, efficiency • total runtime, by event, one event, by phase • Breakdown of total runtime • Group fraction of total runtime • Correlating events to total runtime • Timesteps per second

  17. PerfExplorer TAU’s PerfDMF database S3D

  18. PerfExplorer: Select Experiment & Analysis

  19. Relative Efficiency By Event

  20. Relative Efficiency For S3D - Weak Scaling
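For a weak-scaling run the ideal is constant time per event as processors are added, so one common definition of relative efficiency is the baseline time divided by the time at p processors. The sketch below uses that definition; the slides do not state the exact formula PerfExplorer applies, and the function name and all timings are hypothetical.

```python
def relative_efficiency_weak(times_by_procs):
    """Weak-scaling efficiency relative to the smallest processor count.

    times_by_procs maps processor count -> time for one event.
    A value of 1.0 means the time did not grow from the baseline.
    """
    base = min(times_by_procs)
    t0 = times_by_procs[base]
    return {p: t0 / times_by_procs[p] for p in sorted(times_by_procs)}


# Hypothetical timings (seconds) at the processor counts used in the study.
events = {
    "MPI_Wait": {1: 2.0, 8: 4.0, 64: 8.0, 512: 16.0},
    "hypothetical_kernel": {1: 50.0, 8: 50.0, 64: 52.0, 512: 55.0},
}
efficiency = {name: relative_efficiency_weak(t) for name, t in events.items()}
for name, eff in efficiency.items():
    print(name, eff)
```

Events whose efficiency drops fastest as processors are added are the natural candidates for closer inspection.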

  21. Relative Speedup

  22. Relative Efficiency & Speedup for One Event

  23. Data Mining: Event Correlation to Total Time r = 1 implies direct correlation
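The correlation value r on this slide can be read as a Pearson correlation coefficient between an event's time and the total runtime across the runs. The sketch below computes it by hand; treating r as Pearson's coefficient is an assumption, and the function name and sample numbers are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Both sequences need at least two entries and non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for the 1p, 8p, 64p and 512p runs.
event_time = [1.0, 2.0, 3.0, 4.0]
total_time = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(event_time, total_time)
print(r)
```

A value near 1 means the event's time rises in step with total runtime, a value near -1 means it falls as total runtime rises, and a value near 0 means no linear relationship.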

  24. MPI Scaling

  25. Total Runtime Breakdown by Events

  26. S3D - Building with TAU
     • Change name of compiler in build/make.XT3
        • ftn => tau_f90.sh
        • cc => tau_cc.sh
     • Set compile time environment variables
        • setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
           • Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
        • setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
           • Selective instrumentation file eliminates instrumentation in lightweight routines
           • Pre-process Fortran source code using cpp before compiling
     • Set runtime environment variables for instrumentation control and PAPI counter selection in job submission script:
        • export TAU_THROTTLE=1
        • export COUNTER1=GET_TIME_OF_DAY
        • export COUNTER2=PAPI_FP_INS
        • export COUNTER3=PAPI_L1_DCM
        • export COUNTER4=PAPI_RES_STL
        • export COUNTER5=PAPI_L2_DCM

  27. Selective Instrumentation in TAU
     % cat select.tau
     BEGIN_EXCLUDE_LIST
     MCADIF
     GETRATES
     TRANSPORT_M::MCAVIS_NEW
     MCEDIF
     MCACON
     CKYTCP
     THERMCHEM_M::MIXCP
     THERMCHEM_M::MIXENTH
     THERMCHEM_M::GIBBSENRG_ALL_DIMT
     CKRHOY
     MCEVAL4
     THERMCHEM_M::HIS
     THERMCHEM_M::CPS
     THERMCHEM_M::ENTROPY
     END_EXCLUDE_LIST
     BEGIN_INSTRUMENT_SECTION
     loops routine="#"
     END_INSTRUMENT_SECTION
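The layout of the exclude list is simple enough to inspect with a short script, for example to review which routines a build leaves uninstrumented. This is only an illustration of the file layout shown on the slide, not how TAU itself reads the file; the function name `parse_exclude_list` is hypothetical, and the sample text is a shortened copy of the slide's list.

```python
def parse_exclude_list(text):
    """Return the routine names between BEGIN_EXCLUDE_LIST and END_EXCLUDE_LIST.

    Assumes one routine name per line, as in the slide's select.tau.
    """
    excluded = []
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN_EXCLUDE_LIST":
            inside = True
        elif line == "END_EXCLUDE_LIST":
            inside = False
        elif inside and line:
            excluded.append(line)
    return excluded


# Shortened sample based on the slide's select.tau.
sample = """BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
"""
excluded = parse_exclude_list(sample)
print(excluded)
```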

  28. Getting Access to TAU on Jaguar
     • set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
     • Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
        • Makefile.tau-mpi-pdt-pgi (flat profile)
        • Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
        • Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
     • Binaries of S3D can be found in:
        • ~sameer/scratch/S3D-BINARIES
           • withtau
              • papi, multiplecounters, mpi, pdt, pgi options
           • without_tau

  29. Concluding Discussion • Performance tools must be used effectively • More intelligent performance systems for productive use • Evolve to application-specific performance technology • Deal with scale by “full range” performance exploration • Autonomic and integrated tools • Knowledge-based and knowledge-driven process • Performance observation methods do not necessarily need to change in a fundamental sense • More automatically controlled and efficiently used • Develop next-generation tools and deliver to community • Open source with support by ParaTools, Inc. • http://www.cs.uoregon.edu/research/tau

  30. Support Acknowledgements • Department of Energy (DOE) • Office of Science • LLNL, LANL, ORNL, ASC • PERI
