
Performance Technology for Parallel Component Software

Performance Technology for Parallel Component Software. Allen D. Malony, Sameer Shende {malony,sameer}@cs.uoregon.edu Department of Computer and Information Science Computational Science Institute University of Oregon. Outline. Overview of the TAU project





Presentation Transcript


  1. Performance Technology for Parallel Component Software • Allen D. Malony, Sameer Shende • {malony,sameer}@cs.uoregon.edu • Department of Computer and Information Science, Computational Science Institute, University of Oregon

  2. Outline • Overview of the TAU project • Performance analysis of distributed Java programs • Profiling toolkit, PDT, Database, Monitor, XPARE • Performance Engineered Component Software • CCA Performance Observation Component • CCAFFEINE (Classic C++) • SIDL • Applications : • Optimizer Component • Combustion Component • Concluding remarks

  3. TAU Performance System Framework • Tuning and Analysis Utilities • Performance system framework for scalable parallel and distributed high-performance computing • Integrated toolkit for performance instrumentation, measurement, analysis, and visualization • Portable, configurable performance profiling/tracing facility • Open software approach • University of Oregon, LANL, FZJ Germany • http://www.cs.uoregon.edu/research/paracomp/tau

  4. General Complex System Computation Model • Node: physically distinct shared memory machine • Message passing node interconnection network • Context: distinct virtual memory space within node • Thread: execution threads (user/system) in context • Diagram: physical view (SMP nodes, each with node memory, joined by an interconnection network carrying inter-node message communication) and model view (nodes containing VM-space contexts, which contain threads)
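The node / context / thread hierarchy on this slide can be written down as plain data types. This is only an illustrative sketch: the type names, field names, and the `count_threads` helper are assumptions made for this example and are not TAU data structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types mirroring the model on the slide; not part of TAU.
struct Thread {
    int id;           // thread index within its context
    bool user_level;  // true for a user thread, false for a system thread
};

struct Context {
    int id;                       // distinct virtual memory space within a node
    std::vector<Thread> threads;  // threads of execution sharing that space
};

struct Node {
    int id;                         // physically distinct shared-memory machine
    std::vector<Context> contexts;  // contexts hosted on this node
};

// Count every thread in the system by walking node -> context -> thread.
std::size_t count_threads(const std::vector<Node>& nodes) {
    std::size_t total = 0;
    for (const Node& n : nodes)
        for (const Context& c : n.contexts)
            total += c.threads.size();
    return total;
}
```

A performance event in this model is then identified by a (node, context, thread) triple, which is the kind of addressing the later Java + MPI slides rely on.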

  5. TAU Performance System Architecture (diagram; visible labels: Paraver, EPILOG)

  6. TAU Instrumentation • Flexible instrumentation mechanisms at multiple levels • Source code • manual • automatic using Program Database Toolkit (PDT), OPARI • Object code • pre-instrumented libraries (e.g., MPI using PMPI) • statically linked • dynamically linked (e.g., Virtual machine instrumentation) • fast breakpoints (compiler generated) • Executable code • dynamic instrumentation (pre-execution) using DynInstAPI

  7. Virtual Machine Performance Instrumentation • Integrate performance system with VM • Captures robust performance data (e.g., thread events) • Maintain features of environment • portability, concurrency, extensibility, interoperation • Allow use in optimization methods • JVM Profiling Interface (JVMPI) • Generation of JVM events and hooks into JVM • Profiler agent (TAU) loaded as shared object • registers events of interest and address of callback routine • Access to information on dynamically loaded classes • No need to modify Java source, bytecode, or JVM

  8. TAU Profiling of Java Application (SciVis) • 24 threads of execution! • Profile for each Java thread • Captures events for different Java packages

  9. Mixed-mode Parallel Programs (Java + MPI) • Java threads and MPI communications • Shared-memory multi-threading events • Message communications events • Unified performance measurement and views • Integration of performance mechanisms • Integrated association of performance events • thread event and communication events • user-defined (source-level) performance events • JVM events • Support for performance measurement scaling • Support for performance data access

  10. Instrumentation and Measurement Cooperation • Problem • JVMPI doesn’t see MPI events (e.g., rank (node)) • MPI profiling interface doesn’t see threads • Source instrumentation doesn’t see either! • Need cooperation between interfaces • MPI exposes rank, gets thread information • JVMPI exposes thread information, gets rank • Source instrumentation gets both • Post-mortem matching of sends and receives • Selective instrumentation • java -XrunTAU:exclude=java/io,sun
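Post-mortem matching of sends and receives can be sketched as a pass over a merged event list. The sketch below is an assumption-laden illustration, not TAU's trace-merging code: the `MsgEvent` record layout and the `match_messages` function are hypothetical, and messages are paired in trace order per (source, destination, tag) channel.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical trace record; real TAU trace records differ.
struct MsgEvent {
    bool is_send;  // true for a send event, false for a receive event
    int src;       // sending rank
    int dst;       // receiving rank
    int tag;       // message tag
    double time;   // timestamp of the event
};

// Pair sends and receives on the same (src, dst, tag) channel in trace order.
// Returns (send index, receive index) pairs into `events`.
std::vector<std::pair<std::size_t, std::size_t>>
match_messages(const std::vector<MsgEvent>& events) {
    using Channel = std::tuple<int, int, int>;
    std::map<Channel, std::deque<std::size_t>> pending_sends;
    std::map<Channel, std::deque<std::size_t>> pending_recvs;
    std::vector<std::pair<std::size_t, std::size_t>> matches;

    for (std::size_t i = 0; i < events.size(); ++i) {
        const MsgEvent& e = events[i];
        Channel ch{e.src, e.dst, e.tag};
        auto& same = e.is_send ? pending_sends[ch] : pending_recvs[ch];
        auto& other = e.is_send ? pending_recvs[ch] : pending_sends[ch];
        if (other.empty()) {
            same.push_back(i);  // no partner seen yet; wait for it
        } else {
            std::size_t j = other.front();  // oldest unmatched partner
            other.pop_front();
            matches.push_back(e.is_send ? std::make_pair(i, j)
                                        : std::make_pair(j, i));
        }
    }
    return matches;
}
```

Keeping a queue for unmatched receives as well as unmatched sends lets the pass tolerate a receive that appears before its send in the merged list, which can happen when per-node clocks are not perfectly aligned.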

  11. TAU Java Instrumentation Architecture (diagram components: Java program, mpiJava package, TAU package, Thread API, JNI, MPI profiling interface, TAU wrapper, native MPI library, JVMPI event notification, TAU, profile DB)

  12. Java Source-Level Instrumentation • TAU Java package • User-defined events • TAU.Profile class for new “timers” • Start/Stop • Performance data output at end

  13. Parallel Java Game of Life (Profile) • Merged Java and MPI event profiles • mpiJava test case • 4 nodes, 28 threads • Thread 4 executes all MPI routines • Profile panels shown for Node 0, Node 1, Node 2

  14. Parallel Java Game of Life (Trace) • Integrated event tracing • Merged trace visualization • Node process grouping • Thread message pairing • Vampir display • Multi-level event grouping

  15. TAU Status • Instrumentation supported: • Source, preprocessor, compiler, MPI, runtime, virtual machine • Languages supported: • C++, C, F90, Java, Python • HPF, ZPL, HPC++, pC++... • Packages supported: • PAPI [UTK], PCL [FZJ] (hardware performance counter access), • Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation), • EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization) • Platforms supported: • IBM SP, SGI Origin, Sun, HP Superdome, HP-Compaq ES, • Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple OS X, Windows, • Hitachi SR8000, NEC SX, Cray T3E ... • Compiler suites supported: • GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, … • Thread libraries supported: • Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS

  16. TAU Measurement System Configuration • configure [OPTIONS] • {-c++=<CC>, -cc=<cc>} Specify C++ and C compilers • {-pthread, -sproc} Use pthread or SGI sproc threads • -openmp Use OpenMP threads • -opari=<dir> Specify location of Opari OpenMP tool • -papi=<dir> Specify location of PAPI • -pdt=<dir> Specify location of PDT • -dyninst=<dir> Specify location of DynInst Package • {-mpiinc=<d>, -mpilib=<d>} Specify MPI library instrumentation • -TRACE Generate TAU event traces • -PROFILE Generate TAU profiles • -MULTIPLECOUNTERS Use more than one hardware counter • -PROFILECALLPATH Use 1-level callpath profiles • -PAPIWALLCLOCK Use PAPI to access wallclock time • -PAPIVIRTUAL Use PAPI for virtual (user) time …

  17. TAU Measurement Configuration – Examples • ./configure -c++=xlC -cc=xlc -pdt=/usr/packages/pdtoolkit-2.1 -pthread • Use TAU with IBM’s xlC compiler, PDT and the pthread library • Enable TAU profiling (default) • ./configure -TRACE -PROFILE • Enable both TAU profiling and tracing • ./configure -c++=guidec++ -cc=guidec -papi=/usr/local/packages/papi -openmp -mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib • Use OpenMP+MPI using KAI's Guide compiler suite and use PAPI for accessing hardware performance counters for measurements • Typically configure multiple measurement libraries

  18. Program Database Toolkit (PDT) • Program code analysis framework for developing source-based tools • High-level interface to source code information • Integrated toolkit for source code parsing, database creation, and database query • commercial grade front end parsers • portable IL analyzer, database format, and access API • open software approach for tool development • Target and integrate multiple source languages • Use in TAU to build automated performance instrumentation tools

  19. Program Database Toolkit (diagram) • Application / Library source is parsed by the C / C++ parser and the Fortran 77/90 parser into IL • The C / C++ IL analyzer and Fortran 77/90 IL analyzer produce Program Database Files • DUCTAPE provides access to the Program Database Files • Tools built on it: PDBhtml (program documentation), SILOON (application component glue), CHASM (C++ / F90 interoperability), TAU_instr (automatic source instrumentation)

  20. Program Database Toolkit (PDT) • Program code analysis framework for developing source-based tools for C99, C++ and F90 • High-level interface to source code information • Widely portable: • IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray T3E... • Integrated toolkit for source code parsing, database creation, and database query • commercial grade front end parsers (EDG for C99/C++, Mutek for F90) • Intel/KAI C++ headers for std. C++ library distributed with PDT • portable IL analyzer, database format, and access API • open software approach for tool development • Target and integrate multiple source languages • Used in CCA for automated generation of SIDL • Used in TAU to build automated performance instrumentation tools (tau_instrumentor) • Used in CHASM, XMLGEN, Component method signature extraction, …

  21. Performance Steering TAU’s Runtime Monitor SCIRun Performance Visualizer Application // performance data streams TAU Performance System Performance Analyzer // performance data output Performance Data Integrator Performance Data Reader file system • sample sequencing • reader synchronization

  22. 2D Field Performance Visualization in SCIRun SCIRun program

  23. TAU’s Runtime Monitor TAU uses SCIRun [U. Utah] for visualization of performance data (online/offline)

  24. Performance Database Framework • XML profile data representation • Multiple experiment performance database • Diagram components: raw performance data, PerfDML translators, PerfDML data description, ORDB (PostgreSQL), performance analysis and query toolkit, performance analysis programs

  25. Performance Tracking and Reporting • Integrated performance measurement allows performance analysis throughout development lifetime • Applied performance engineering in software design and development (software engineering) process • Create “performance portfolio” from regular performance experimentation (couple with software testing) • Use performance knowledge in making key software design decision, prior to major development stages • Use performance benchmarking and regression testing to identify irregularities • Support automatic reporting of “performance bugs” • Enable cross-platform (cross-generation) evaluation
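The regression-testing idea on this slide can be illustrated with a small comparison routine. This is a minimal sketch under stated assumptions: the `Profile` alias, the `find_regressions` function, and the relative-tolerance rule are all hypothetical choices for this example and do not describe XPARE's actual analysis.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical profile summary: routine name -> measured time in seconds.
using Profile = std::map<std::string, double>;

// Report routines whose time grew by more than `tolerance`
// (for example 0.10 for 10%) relative to the baseline profile.
std::vector<std::string> find_regressions(const Profile& baseline,
                                          const Profile& current,
                                          double tolerance) {
    std::vector<std::string> slower;
    for (const auto& entry : current) {
        auto it = baseline.find(entry.first);
        // Skip routines with no usable baseline measurement.
        if (it == baseline.end() || it->second <= 0.0) continue;
        double relative_change = (entry.second - it->second) / it->second;
        if (relative_change > tolerance) slower.push_back(entry.first);
    }
    return slower;
}
```

Run after each regular performance experiment, a check of this kind is one way a "performance bug" report could be triggered automatically; the threshold would need tuning to the run-to-run noise of the platform.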

  26. XPARE System Architecture (diagram components: experiment launch, performance database, performance reporter, alerting setup, comparison tool, regression analyzer, mail server, web server)

  27. Outline • Overview of the TAU project • Performance analysis of distributed Java programs • Profiling toolkit, PDT, Database, Monitor, XPARE • Performance Engineered Component Software • CCA Performance Observation Component • CCAFFEINE (Classic C++) • SIDL • Applications: • Optimizer Component • Combustion Component • Concluding remarks

  28. Motivation for Parallel Component Software • History of HPC reflects evolving complexity of parallel and distributed systems used for scientific computing • Application development environments leverage power of software abstraction in scientific problem solving • Natural tension between achieving high performance and software engineering for scientific computing • Common dogma: further software is away from the raw machine, the harder it is to achieve good performance • Strategies include layered infrastructure with rich middleware support implemented for high-performance • Compromise is to further distance application developer from broader range performance sources and problems

  29. Scientific Component Software • Emerging use for scientific high-performance computing • Targets spectrum of HPC systems and applications • Realistic programming and computing model for the Grid • Pose challenges for performance engineering • Software and system diversity • Flexibility in construction and connection • Language interoperability • Platform interoperability • Architecture vs. implementation • New environment for performance problem solving? • Does HPC performance experience / technology apply?

  30. Component Technology • What is a component? • Implementation provides functionality but hides details • No direct access is possible • Interface provides access to component functionality • Access “ports” are well-defined and generated by tools • Matching connector links component interfaces • Constructed by framework and hidden from users

  31. Component Technology Features • Interoperability across multiple languages • Language independent interfaces (C/C++, Fortran, Java,…) • Automatically generated bindings to working code • Interoperability across multiple platforms • Computer systems hardware independence • Operating systems independence • Transparent execution model • Serial, parallel, and distributed system • Incremental evolution of application software • Components promote software reuse • Components are “plug-and-play”

  32. CCA Forum • Define specifications for high-performance scientific components and frameworks • Preservation of performance • Promote development of domain-specific “standard” interfaces • Goal: interoperability between components developed by different expert teams across different institutions • Quarterly meetings and open membership • Mailing list: cca-forum@cca-forum.org • Website: http://www.cca-forum.org

  33. DOE CCTTSS • DOE SciDAC ISIC • Scientific Discovery through Advanced Computing • Integrated Software Infrastructure Center • Subset of CCA Forum • Develop CCA technology from current prototype stage to full production environment • Increase understanding of how to use component architectures effectively in HPC environments • Participants: • Rob Armstrong, Lead, SNL

  34. Proxy generator Builder Common Component Architecture Specification CCA ports Scientific IDL Framework-specific part of CCA ports Abstract configuration API Component 1 Component 2 Repository API Repository CCA Services Any CCA compliant framework

  35. CCA Concepts: Ports • Designing for interoperability and reuse requires “standard” interfaces • Ports define how components interact • Through well-defined interfaces (ports) • In OO languages, a port is a class or interface • In Fortran, a port is a set of subroutines or a module • Components may provide ports • Implement the class or subroutines of the port • Components may use ports • Call methods or subroutines in the port • Links denote a caller/callee relationship
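The provides/uses relationship can be sketched in a few lines of C++. Everything here is a simplified stand-in written for illustration: `Port`, `Services`, `connect`, and `use_function_port` are hypothetical names, and the real CCA and CCAFFEINE interfaces are richer than this.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for CCA concepts; the real CCA interfaces differ.
class Port {
public:
    virtual ~Port() {}
};

// A port is an abstract interface.
class FunctionPort : public Port {
public:
    virtual double evaluate(double x) = 0;
};

// A component that provides the port implements the interface.
class LinearFunction : public FunctionPort {
public:
    double evaluate(double x) override { return 2.0 * x + 1.0; }
};

// Toy framework service: maps a port name to its connected provider.
class Services {
public:
    void connect(const std::string& name, Port* provider) {
        ports_[name] = provider;
    }
    Port* getPort(const std::string& name) const {
        auto it = ports_.find(name);
        return it == ports_.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, Port*> ports_;
};

// A component that uses the port calls through the interface only.
double use_function_port(const Services& services, double x) {
    auto* f = dynamic_cast<FunctionPort*>(services.getPort("FunctionPort"));
    if (f == nullptr) throw std::runtime_error("FunctionPort is not connected");
    return f->evaluate(x);
}
```

The using side never names `LinearFunction`, so a different provider can be connected under the same port name without recompiling the caller, which is the caller/callee link the slide describes.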

  36. CCA Concepts: Frameworks • Provides the means to “hold” components and compose them into applications • Allow exchange of ports among components without exposing implementation details • Provide a small set of standard services to components • Builder services allow programs to compose CCA apps • Frameworks may make themselves appear as components in order to connect to components in other frameworks • Specific frameworks support specific computing models

  37. CCA Example • Numerically integrate a continuous function • Use two different techniques • Lines show port connections • Dashed lines are alternate port connections • Diagram components and ports: Driver (GoPort, IntegratorPort), MidpointIntegrator (IntegratorPort, FunctionPort), MonteCarloIntegrator (IntegratorPort, FunctionPort, RandomGeneratorPort), RandomGenerator (RandomGeneratorPort), and LinearFunction, NonlinearFunction, PiFunction (FunctionPort each) • Midpoint rule evaluates x across [a, b]; Monte Carlo draws x_n uniformly distributed over [a, b]
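The two integration techniques named on this slide can be sketched as free functions. This is an illustrative sketch, not the example's component code: the function names, the use of `std::function`, and the explicit `seed` parameter are assumptions made here so that the block stands alone.

```cpp
#include <functional>
#include <random>

// Midpoint rule: evaluate f at the centre of each of n equal
// subintervals of [a, b] and multiply the sum by the subinterval width.
double midpoint_integrate(const std::function<double(double)>& f,
                          double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += f(a + (i + 0.5) * h);
    return sum * h;
}

// Monte Carlo: average f over n points drawn uniformly from [a, b],
// then scale the average by the interval width.
double monte_carlo_integrate(const std::function<double(double)>& f,
                             double a, double b, int n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(a, b);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += f(dist(gen));
    return (b - a) * sum / n;
}
```

Both functions take the integrand through the same callable type, which mirrors how either integrator component can be connected to any provider of FunctionPort. The midpoint error shrinks with the square of the subinterval width for smooth integrands, while the Monte Carlo error shrinks only with the square root of the sample count.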

  38. CCA Framework Prototypes • CCAFFEINE • SPMD/SCMD parallel, direct connect • Direct connection • CCAT / XCAT • Distributed network • Grid Web services • SCIRun • Parallel, multithreaded, direct connect • Decaf • Language interoperability via Babel • Legion (under development)

  39. Performance-Engineered Component Software • Intra- and Inter-component performance engineering • Four general parts: • Performance observation • integrated measurement and analysis • Performance query and monitoring • runtime access to performance information • Performance control • mechanisms to alter performance observation • Performance knowledge • characterization and modeling • Consistent with component architecture / implementation

  40. Main Idea: Extend Component Design • Extend the programming and execution environment to be performance observable and performance aware • Diagram: a component core (with variants) and its component ports, extended by Performance Observation (measurement, analysis) with performance observation ports, Performance Knowledge (empirical, analytical) with performance knowledge ports, and a Component Performance Repository with repository service ports

  41. Performance Observation and Component • Performance measurement integration in component form • Functional extension of original component design • Include new component methods and ports for other components to access measured performance data • Allow original component to access performance data • Encapsulate as tightly coupled and co-resident performance observation object • POC “provides” port allows use of optimized interfaces to access “internal” performance observations • Diagram: component core (with variants) and component ports, plus Performance Observation (measurement, analysis) with performance observation ports

  42. Performance Knowledge • Describe and store “known” component performance • Benchmark characterizations in performance database • Empirical or analytical performance models • Saved information about component performance • Use for performance-guided selection and deployment • Use for runtime adaptation • Representation must be in common forms with standard means for accessing the performance information • Compatible with component architecture

  43. Component Performance Repository • Performance knowledge storage • Implement in component architecture framework • Similar to CCA component repository • Access by component infrastructure • View performance knowledge as component (PKC) • PKC ports give access to performance knowledge • to other components, back to original component • Static/dynamic component control and composition • Component composition performance knowledge • Diagram: Component Performance Repository with repository service ports, and Performance Knowledge (empirical, analytical) with performance knowledge ports

  44. Performance Engineering Support in CCA • Define a standard observation component interface for: • Performance measurement • Performance data query • Performance control (enable/disable) • Implement performance interfaces for use in CCA • TAU performance system • CCA component frameworks (CCAFFEINE, SIDL/Babel) • Demonstrations • Optimizing component • picks from a set of equivalent CCA port implementations • Flame reaction-diffusion application
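The optimizing component "picks from a set of equivalent CCA port implementations". One simple selection rule is sketched below under explicit assumptions: the `pick_fastest` function, its input layout, and the lowest-mean-time criterion are hypothetical and are not taken from the demonstration itself.

```cpp
#include <limits>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Given timing samples (seconds) for each equivalent implementation,
// return the name of the one with the lowest mean time.
// Returns an empty string if no implementation has any samples.
std::string pick_fastest(
        const std::map<std::string, std::vector<double>>& samples) {
    std::string best;
    double best_mean = std::numeric_limits<double>::infinity();
    for (const auto& entry : samples) {
        if (entry.second.empty()) continue;  // no measurements yet
        double mean =
            std::accumulate(entry.second.begin(), entry.second.end(), 0.0) /
            entry.second.size();
        if (mean < best_mean) {
            best_mean = mean;
            best = entry.first;
        }
    }
    return best;
}
```

In a component setting the samples would come from the performance data query interface, and the returned name would decide which provider to connect to the port. A production rule would likely also weigh accuracy and problem size, which this sketch ignores.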

  45. CCA Performance Observation Component • Design measurement port and measurement interfaces • Timer • start/stop • set name/type/group • Control • enable/disable groups • Query • get timer names • metrics, counters, dump to disk • Event • user-defined events

  46. CCA C++ (CCAFFEINE) Performance Interface (Measurement port and measurement interfaces)

      namespace performance {
        namespace ccaports {
          class Measurement : public virtual classic::gov::cca::Port {
          public:
            virtual ~Measurement() {}

            /* Create a Timer interface */
            virtual performance::Timer* createTimer(void) = 0;
            virtual performance::Timer* createTimer(string name) = 0;
            virtual performance::Timer* createTimer(string name, string type) = 0;
            virtual performance::Timer* createTimer(string name, string type, string group) = 0;

            /* Create a Query interface */
            virtual performance::Query* createQuery(void) = 0;

            /* Create a user-defined Event interface */
            virtual performance::Event* createEvent(void) = 0;
            virtual performance::Event* createEvent(string name) = 0;

            /* Create a Control interface for selectively enabling and disabling
             * the instrumentation based on groups */
            virtual performance::Control* createControl(void) = 0;
          };
        }
      }

  47. CCA Timer Interface Declaration (Timer interface methods)

      namespace performance {
        class Timer {
        public:
          virtual ~Timer() {}

          /* Implement methods in a derived class to provide functionality */

          /* Start and stop the Timer */
          virtual void start(void) = 0;
          virtual void stop(void) = 0;

          /* Set name and type for Timer */
          virtual void setName(string name) = 0;
          virtual string getName(void) = 0;
          virtual void setType(string name) = 0;
          virtual string getType(void) = 0;

          /* Set the group name and group type associated with the Timer */
          virtual void setGroupName(string name) = 0;
          virtual string getGroupName(void) = 0;
          virtual void setGroupId(unsigned long group) = 0;
          virtual unsigned long getGroupId(void) = 0;
        };
      }
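To show what "implement methods in a derived class" could look like, here is a wall-clock implementation built on `std::chrono`. It is a sketch under assumptions, not TAU's timer: `TimerBase` is a reduced stand-in for the slide's `performance::Timer` (type and group methods are left out so the block is self-contained), and `elapsedSeconds()` and `calls()` are hypothetical accessors that the slide's interface does not declare.

```cpp
#include <chrono>
#include <string>

// Reduced stand-in for the slide's performance::Timer (start/stop/name only).
class TimerBase {
public:
    virtual ~TimerBase() {}
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual void setName(std::string name) = 0;
    virtual std::string getName() = 0;
};

// Wall-clock implementation that accumulates time over start/stop pairs.
class WallClockTimer : public TimerBase {
public:
    void start() override {
        if (running_) return;  // ignore a start while already running
        begin_ = std::chrono::steady_clock::now();
        running_ = true;
    }
    void stop() override {
        if (!running_) return;  // ignore a stop without a matching start
        total_ += std::chrono::steady_clock::now() - begin_;
        running_ = false;
        ++calls_;
    }
    void setName(std::string name) override { name_ = name; }
    std::string getName() override { return name_; }

    // Hypothetical accessors, not part of the slide's interface.
    double elapsedSeconds() const {
        return std::chrono::duration<double>(total_).count();
    }
    long calls() const { return calls_; }

private:
    std::string name_;
    std::chrono::steady_clock::duration total_{};
    std::chrono::steady_clock::time_point begin_{};
    bool running_ = false;
    long calls_ = 0;
};
```

`steady_clock` is used because it never jumps backwards, so an interval measured across a system clock adjustment stays non-negative.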

  48. Use of Observation Component in CCA Example

      #include "ports/Measurement_CCA.h"
      ...
      double MonteCarloIntegrator::integrate(double lowBound, double upBound, int count) {
        classic::gov::cca::Port * port;
        double sum = 0.0;

        // Get Measurement port
        port = frameworkServices->getPort("MeasurementPort");
        if (port)
          measurement_m = dynamic_cast<performance::ccaports::Measurement *>(port);
        if (measurement_m == 0) {
          cerr << "Connected to something other than a Measurement port" << endl;
          return -1;
        }

        // Create the timer once and reuse it on later calls
        static performance::Timer* t = measurement_m->createTimer(string("IntegrateTimer"));

        t->start();
        for (int i = 0; i < count; i++) {
          double x = random_m->getRandomNumber();
          sum = sum + function_m->evaluate(x);
        }
        t->stop();

        // Not shown on the slide: return the Monte Carlo estimate, assuming
        // x is uniformly distributed over [lowBound, upBound]
        return (upBound - lowBound) * sum / count;
      }

  49. Using TAU Component in CCAFFEINE

      repository get TauTimer
      repository get Driver
      repository get MidpointIntegrator
      repository get MonteCarloIntegrator
      repository get RandomGenerator
      repository get LinearFunction
      repository get NonlinearFunction
      repository get PiFunction
      create LinearFunction lin_func
      create NonlinearFunction nonlin_func
      create PiFunction pi_func
      create MonteCarloIntegrator mc_integrator
      create RandomGenerator rand
      create TauTimer tau
      connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
      connect mc_integrator FunctionPort nonlin_func FunctionPort
      connect mc_integrator TimerPort tau TimerPort
      create Driver driver
      connect driver IntegratorPort mc_integrator IntegratorPort
      go driver Go
      quit

  50. Measurement Port Implementation • Use of Measurement port (i.e., instrumentation) • independent of choice of measurement tool • independent of choice of measurement type • TAU performance observability component • Implements the Measurement port • Implements Timer, Control, Query, Event • Port can be registered with the CCAFFEINE framework • Components instrument to generic Measurement port • Runtime selection of TAU component during execution • TauMeasurement_CCA port implementation uses a specific TAU library for choice of measurement type
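The point that instrumentation is independent of the measurement tool can be illustrated with two interchangeable back ends behind one generic interface. This is a hedged sketch: the reduced `Timer` and `Measurement` classes, the `completedIntervals()` query, both back ends, and `select_measurement` are hypothetical and only mimic the shape of the Measurement port; none of this is TAU code.

```cpp
#include <memory>
#include <string>

// Hypothetical reduced interfaces; the slide's Measurement port has more methods.
class Timer {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
};

class Measurement {
public:
    virtual ~Measurement() {}
    virtual std::unique_ptr<Timer> createTimer(const std::string& name) = 0;
    virtual long completedIntervals() const = 0;  // hypothetical query
};

// Back end 1: records nothing.
class NullTimer : public Timer {
public:
    void start() override {}
    void stop() override {}
};
class NullMeasurement : public Measurement {
public:
    std::unique_ptr<Timer> createTimer(const std::string&) override {
        return std::unique_ptr<Timer>(new NullTimer());
    }
    long completedIntervals() const override { return 0; }
};

// Back end 2: counts completed start/stop pairs.
class CountingTimer : public Timer {
public:
    explicit CountingTimer(long* counter) : counter_(counter) {}
    void start() override { running_ = true; }
    void stop() override {
        if (running_) { ++*counter_; running_ = false; }
    }
private:
    long* counter_;
    bool running_ = false;
};
class CountingMeasurement : public Measurement {
public:
    std::unique_ptr<Timer> createTimer(const std::string&) override {
        return std::unique_ptr<Timer>(new CountingTimer(&intervals_));
    }
    long completedIntervals() const override { return intervals_; }
private:
    long intervals_ = 0;
};

// Runtime selection of the back end by name.
std::unique_ptr<Measurement> select_measurement(const std::string& tool) {
    if (tool == "counting")
        return std::unique_ptr<Measurement>(new CountingMeasurement());
    return std::unique_ptr<Measurement>(new NullMeasurement());
}

// Instrumented code sees only the generic interface.
double instrumented_sum(Measurement& m, int count) {
    std::unique_ptr<Timer> t = m.createTimer("SumTimer");
    t->start();
    double sum = 0.0;
    for (int i = 1; i <= count; ++i) sum += i;
    t->stop();
    return sum;
}
```

`instrumented_sum` is compiled once and behaves the same with either back end; only the object handed to it changes, which is the role the framework plays when it connects a particular measurement component to the port at run time.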
