40 likes | 159 Views
TAU Performance System Architecture. Program Database Toolkit (PDT). TAU Performance System Status. Platforms : IBM SP3, SGI Origin 2K/3K, Compaq Alpha Cluster, Cray T3E / SV-1, Linux x86 / IA-64 clusters, Hewlett-Packard Superdome/V-class, Hitachi SR8000, NEC SX-5
E N D
TAU Performance System Architecture Program Database Toolkit (PDT) TAU Performance System Status Platforms: IBM SP3, SGI Origin 2K/3K, Compaq Alpha Cluster, Cray T3E / SV-1, Linux x86 / IA-64 clusters, Hewlett-Packard Superdome/V-class, Hitachi SR8000, NEC SX-5 Languages: C, C++, Fortran 77, F90, OpenMP, Java, Python Thread libraries: pthreads, SGI sproc, OpenMP, Java, Windows Communications libraries: MPI, PVM, SHMEM Parallelism paradigms: shared memory multi-threading, distributed memory message passing, mixed-mode Performance technologies: Dyninst dynamic instrumentation, PAPI and PCL hardware counter libraries, Opari automatic OpenMP instrumentation, EPILOG tracing library, EXPERT trace analyzer, Vampir trace visualization, Paraver trace visualization Performance Mapping in Uintah Paraver TAU Applications Work packet computation events colored by task type SAMRAI (LLNL, Andy Wissink). TAU is being integrated in SAMRAI framework as the main performance measurement facility, replacing hand-instrumented counters and timers. TAU is included as part of the SAMRAI distribution. OVERTURE (LLNL, Brian Miller). TAU is being used in the Overture framework for object-oriented performance measurement. Miller has also integrated TAU in the adaptive-mesh refinement simulator, AMRSim, and is using TAU for another (undisclosed) software development project. ALPS(LLNL, James Schek). The Adaptive Laser Plasma Simulator uses SAMRAI as a software component. It gains access to TAU’s measurement support in SAMRAI as well as uses TAU directly in other ALPS code. C-SAFE (ASCI/ASAP; Utah, Chris Johnson and Steve Parker). TAU is being integrated in the Uintah Computation Framework (UCF) for use in performance measurement and analysis of C-SAFE applications. This work also includes new performance regression analysis and results reporting tools for cross-version and cross-experiment performance evaluation. PERC (DOE SciDAC; PERC consortium). TAU has been installed at PERC sites and is being applied in performance analysis studies of SciDAC applications, including the EVH11 astrophysics code. Oregon is an affiliate member of PERC. VTF (ASCI/ASAP; Caltech, Julian Cummings and Michael Aivazis). TAU is being integrated in the Virtual Test Facility (VTF) infrastructure because of its sole ability to portably measure performance across HPC systems with dynamically-loaded libraries and Python-controlled execution. OpenMP (ASCI Path Forward; Intel/KAI, Bob Kuhn; Pallas, Hans-Christian Hoppe). A performance monitoring interface for OpenMP is being defined and will be submitted to the OpenMP Architecture Review Board for standardization. A prototype has been developed and demonstrated with the TAU performance system and the EXPERT analysis tool. SAGE (LANL, Jack Horner). TAU has been applied to performance scalability analysis of the SAGE application. TAU is the only performance system that reliably provided a portable parallel profiling capability across all ASCI platforms. POOMA II(LANL / Code Sourcery LLC, Jeffrey Oldham and Mark Mitchel). TAU has been used as the primary tool for performance analysis throughout the POOMA project, most recently in the POOMA 2 development. TAU is the only tool able to instrument and measure the complex use of C++ templates and expression template parallelization in the POOMA framework. EPILOG • Utah ASCI/ASAP Center for Simulation of Accidental Fires and Explosions • Uintah Computational Framework (UCF) • Taskgraph macro dataflow • Performance mapping based on task semantics TAU Integration in SAMRAI • Structured Adaptive Mesh Refinement Application Infrastructure • Group-based SAMRAI performance timers and counts • Seamless integration in routine and MPI performance measurement Distinct phases of computation can be identifed based on task Application / Library Performance Profiling of EVH1 C / C++ parser Fortran 77/90 parser Program documentation PDBhtml • Enhanced Virginia Hydrodynamics #1 benchmark • IBM SP3, 16 processors • jRacy parallel profile display tools Application component glue IL IL SILOON TAU Performance Database Framework Project Goals and Recent Accomplishments Mixed-Mode Performance Analysis (OpenMP + MPI) C / C++ IL analyzer Fortran 77/90 IL analyzer • Dynamic performance measurement control • Dynamic event grouping • Multiple configurable counters • Selective instrumentation • Application-Level Performance Access • Incremental profile dumping • Runtime profile access • Multi-Level Performance Instrumentation and Mapping • Optimized instrumentation • Tracing library enhancement • Performance mapping profile display • System and Hardware Performance Integration • Callback registration for system and hardware counters • Trace recording of counts C++ / F90 interoperability CHASM • 2-D Stommel model of ocean circulation • Jacobi iteration,5-point stencil • Integrated OpenMP and MPI events • Uses OpenMP performance tools interface • Automatic instrumentation with Opari Raw performance data Performance analysis programs Program Database Files Automatic source instrumentation Threadmessagepairing DUCTAPE TAU_instr Performance analysis and query toolkit PerfDML data description IntegratedOpenMP +MPI events PerfDML translators ORDB PostgreSQL • XML profile data representation • Multiple experiment performance database Dr. Allen D. Malony, PI Dr. Sameer Shende, Post-doc Robert Bell, Research associate Kai Li, Ph.D. student Li Li, Ph.D. student Computational Science Institute Department Computer & Information Science University of Oregon DOE Grant No. DE FG03-01ER25501 . . . Performance Technology for Tera-Class Parallel Computers: Evolution of the TAU Performance System
Java program mpiJava package TAU package Thread API JNI MPI profiling interface Event notification TAU TAU wrapper Native MPI library JVMPI Profile DB TAU Performance Database Framework Raw performance data Performance analysis programs PerfDML data description Performance analysis and query toolkit PerfDML translators ORDB PostgreSQL . . . Paraver Application / Library Raw performance data Performance analysis programs EPILOG Performance results PerfDML data description C / C++ parser Fortran 77/90 parser Program documentation PDBhtml Performance analysis and query toolkit Application component glue IL IL SILOON PerfDML translators C / C++ IL analyzer Fortran 77/90 IL analyzer ORDB Work packet computation events colored by task type C++ / F90 interoperability CHASM PostgreSQL 8 processes . . . Program Database Files Automatic source instrumentation DUCTAPE TAU_instr Mapped task performance across processes 32 processes 32 processes Performance mapping for different tasks Distinct phases of computation can be identifed based on task 32 processes 8 processes Performance Technology for Tera-Class Parallel Computers: Evolution of the TAU Performance System • System and Hardware Performance Integration • Callback registration for system and hardware counters • Trace recording of counts • Multi-Level Performance Instrumentation and Mapping • Optimized instrumentation • Tracing library enhancement • Performance mapping profile display Allen D. Malony, PI Sameer Shende, Research Associate Robert Bell, Research Associate DOE Grant No. DE FG03-01ER25501
VTF SAMRAI