PAT: UPC/SHMEM Language Analysis and Usability Study

Professor Alan D. George, Principal Investigator
Mr. Hung-Hsun Su, Sr. Research Assistant
Mr. Adam Leko, Sr. Research Assistant
Mr. Bryan Golden, Research Assistant
Mr. Hans Sherburne, Research Assistant
HCS Research Laboratory, University of Florida
Purpose and Method

• Purpose
  • Determine performance factors purely from the language's perspective
  • Gain insight into how best to incorporate performance measurement with various implementations
• Method
  • Create a complete and minimal factor list
  • Analyze the UPC and SHMEM (Quadrics) specifications
  • Analyze various UPC/SHMEM implementations, plus discussions with developers
Factor List Creation

• Factor list developed based on observations from other studies (tools, analytical models, etc.)
  • Ensures factors are measurable
  • Provides insight into how they can be measured
• Only basic events included, to eliminate redundancy
• Sufficient for time-based analysis and memory-system analysis
• Definitions
  • Completion notification – calling thread waits for completion of a one-sided operation it initiated
  • Synchronization – multiple threads wait for each other to complete a single task
  • Local access – refers only to access of a local shared (global) variable
SHMEM Analysis

• Performed on the Quadrics SHMEM specification and the GPSHMEM library
• Great similarity between implementations
• Factors for each construct involve execution plus:
  • Small transfer (put/get)
  • Synchronization (other)
• Variations between implementations are troublesome
  • A standard for the SHMEM/GPSHMEM function set is desirable
    • General: provides users with a uniform library set
    • PAT: reduces system complexity (i.e., possibly a single wrapper library is sufficient)
• Wrapper approach (e.g., PSHMEM) fits very well
  • Can borrow many ideas from PMPI
  • However, analysis of data transfers needs special care to handle one-sided communication
• See Language Analysis sub-report for construct-factor assignments
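The PSHMEM/PMPI-style wrapper idea above can be sketched quickly: every library entry point also exists under a name-shifted alias, so a tool supplies its own version of the public name that records an event and forwards to the real routine. This is a minimal, language-agnostic sketch in Python; `pshmem_put` here is a mock stand-in for the real name-shifted routine, and the record format is illustrative, not part of any SHMEM specification.

```python
import time

# Mock stand-in for the name-shifted entry point: in the PSHMEM scheme every
# shmem_* routine is also callable as pshmem_*, which lets a tool interpose
# its own shmem_* wrapper. The body just mimics a one-sided put.
def pshmem_put(dest, src, nelems, pe):
    dest[:nelems] = src[:nelems]

profile = []  # (routine name, target PE, element count, elapsed seconds)

def shmem_put(dest, src, nelems, pe):
    # Tool-supplied wrapper: time the call, forward to the real routine,
    # and record the event for later analysis.
    t0 = time.perf_counter()
    pshmem_put(dest, src, nelems, pe)
    profile.append(("shmem_put", pe, nelems, time.perf_counter() - t0))

buf = [0, 0, 0, 0]
shmem_put(buf, [1, 2, 3, 4], 4, 1)
```

Because the interposition happens at link/name-resolution time, application code needs no changes, which is exactly why the wrapper approach "fits very well" for function-only APIs like SHMEM.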
UPC Analysis (1)

• Performed on UPC spec. 1.1, Berkeley UPC, Michigan Tech UPC, and HP UPC (in progress)
• See Language Analysis sub-report for construct-factor assignments
• Specification analysis
  • Educated guesses; attempts to cover all aspects of the language
  • Too generic for PAT development
• Implementations
  • Many similarities between implementations
  • Wrapper mentality works with UPC function constructs → PUPC proposal
  • Pre-processor needed to handle UPC non-function constructs
UPC Analysis (2)

• Implementations (cont.)
  • HP UPC
    • Composed of the UPC compiler, Run-Time System (RTS), and (optional) Run-Time Environment (RTE)
    • UPC global-variable access translates to HW shared-memory access → impacts timing of instrumentation
    • Waiting for Brian at HP to send details on UPC functions to complete construct-factor assignment
  • GCC-UPC: will be studied after completion of the HP UPC analysis
Berkeley UPC Analysis (1)

• Based on version 2.0.1
• Analysis at the UPC level, with some consideration of the communication level
• Noteworthy implementation details
  • upc_all_alloc and upc_all_lock_alloc: use of all-to-all broadcast
  • upc_alloc and upc_global_alloc behave like upc_local_alloc: heap size is doubled when space runs out
  • Multiple mechanisms for implementing barrier
    • HW supported (e.g., InfiniBand)
    • Custom barrier (e.g., SHMEM/lapi)
    • Centralized (other, current); logarithmic dissemination (other, future)
• Impact on PAT
  • UPC-level-only instrumentation → 1 unit, less accurate
  • UPC + communication-level instrumentation → multiple units, more accurate
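To make the "logarithmic dissemination" barrier variant concrete: in round k, thread i signals thread (i + 2^k) mod N, so after ceil(log2 N) rounds every thread has transitively heard from every other thread. This is a sketch of the partner schedule only (not Berkeley UPC's actual code), written as plain Python for illustration.

```python
def dissemination_rounds(nthreads):
    """Partner schedule for a dissemination barrier.

    In round k, thread i signals thread (i + 2**k) mod nthreads; after
    ceil(log2(nthreads)) rounds, every thread has (transitively) received
    a notification chain from every other thread, so all have arrived.
    Returns one list per round, mapping each thread to its signal target.
    """
    rounds = []
    k = 0
    while (1 << k) < nthreads:
        rounds.append([(i + (1 << k)) % nthreads for i in range(nthreads)])
        k += 1
    return rounds
```

The O(log N) round count is what makes this preferable to a centralized barrier, whose coordinator receives O(N) messages; for a PAT, it also means barrier cost must be attributed across multiple pairwise events rather than one central one.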
Berkeley UPC Analysis (2)

• Noteworthy implementation details (cont.)
  • Three different translations for upc_forall
    • All tasks can be done by 1 thread → if statement followed by a regular for loop
    • Tasks are cyclically distributed → for loop with stride equal to the number of threads
    • Tasks are block distributed → two-level for loops (outer level same as in the second case; inner loop is a regular for loop over all elements in the block)
• Impact on PAT – instrumentation needed before translation
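The three upc_forall translation strategies above can be sketched as the set of iteration indices each thread ends up executing. This is an illustrative Python rendering, not compiler output; the function names are invented, and `mythread`/`threads` stand in for the UPC built-ins MYTHREAD/THREADS.

```python
def forall_single_thread(n, mythread):
    # Affinity expression is a constant: one thread guards the whole loop
    # with an "if", then runs a regular for loop over every iteration.
    return list(range(n)) if mythread == 0 else []

def forall_cyclic(n, mythread, threads):
    # Affinity is the loop index itself: iterations are cyclically
    # distributed, i.e., a for loop starting at MYTHREAD with stride THREADS.
    return list(range(mythread, n, threads))

def forall_blocked(n, mythread, threads, block):
    # Affinity is &a[i] for a blocked shared array: the outer loop strides
    # over the blocks this thread owns (same cyclic pattern as above, but in
    # units of blocks), and the inner loop walks the elements of each block.
    indices = []
    for start in range(mythread * block, n, threads * block):
        indices.extend(range(start, min(start + block, n)))
    return indices
```

The PAT implication noted above follows directly: since these three loop shapes look nothing alike after translation, instrumentation that wants to see "one upc_forall" must be inserted before the compiler rewrites it.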
Michigan Tech UPC Analysis

• Based on version 1.1
• Noteworthy implementation details
  • Uses centralized control for most control processes (i.e., split and non-split barriers, collective array allocation, collective lock allocation, and global exit)
  • Based on a two-pthread system using a producer-consumer mechanism
    • Program thread (producer): adds entries to the appropriate send queues
    • Communication thread (consumer): sends and processes requests via MPI (no aggregation of data for optimization; bulk transfer = x small transfers)
    • Impact on PAT – transfer, completion, and synchronization are much harder to track
  • Uses flat broadcast and tree broadcast
  • Caching capability complicates analysis
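The two-pthread producer-consumer split can be sketched with a shared queue: the program thread enqueues one request per element (no aggregation, so a bulk transfer of n elements becomes n small requests), and a separate communication thread drains the queue. This is a simplified Python illustration of the structure only; the real runtime services requests via MPI, which the `received` list merely stands in for.

```python
import queue
import threading

def run_transfer(data):
    """Model of the two-thread runtime: the caller acts as the program
    thread (producer), a spawned thread acts as the communication thread
    (consumer). Each element becomes its own queued request."""
    sendq = queue.Queue()
    received = []  # stand-in for the MPI sends the consumer would issue

    def comm_thread():
        while True:
            item = sendq.get()
            if item is None:      # shutdown sentinel from the producer
                break
            received.append(item)

    t = threading.Thread(target=comm_thread)
    t.start()
    for elem in data:             # producer: one request per element
        sendq.put(elem)
    sendq.put(None)
    t.join()
    return received
```

This structure explains the PAT difficulty noted above: the moment a transfer "completes" from the program thread's view (enqueue) is decoupled from when the communication thread actually performs it, so transfer, completion, and synchronization events span two threads.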
Summary

• Factor list and construct-factor assignment provide the basis for practical event tracing in UPC and SHMEM
• SHMEM
  • Wrapper library approach appears ideal
  • Push for SHMEM standardization will simplify development
• UPC
  • Hybrid pre-processor/wrapper-library approach appears appropriate (compatible with GCC-UPC?)
• Analysis provides insight into how to instrument UPC/SHMEM programs and raises awareness of possible difficulties
Usability: Purpose and Methods

• Purpose
  • Determine factors affecting usability of performance tools
  • Determine how to incorporate knowledge about these factors into PAT
• Methods
  • Elicit user feedback through a Performance Tool Usability Survey (survey generated after some literature review)
  • Review and provide a concise summary of the literature on usability for parallel performance tools
• Outline
  • Discuss common problems seen in performance tools
  • Discuss factors influencing usability of performance tools
  • Outline how to incorporate user-centered design into PAT
  • Present guidelines to avoid common usability problems
Performance Tool Usability: General Performance Tool Problems

• Difficult problem for the tool developer
  • Inherently unstable execution environment
  • Monitoring behavior may disturb original behavior
  • Short lifetime of parallel computers
• Users
  • Tools too difficult to use
  • Too complex
  • Unsuitable for real-world applications
  • Users skeptical about value of tools
Discussion on Usability Factors* (1)

• Ease-of-learning
  • Concerns
    • Important for attracting new users
    • Tool's interface shapes the user's understanding of its functionality
    • Inconsistency leads to confusion (e.g., providing defaults for some objects but not all)
  • Possible solutions
    • Strive for an internally and externally consistent tool
    • Stick to established conventions
    • Provide a uniform interface
    • Target as many platforms as necessary so users can amortize invested time over many uses
• Usefulness
  • Concern: how directly the tool helps users achieve their goals
  • Possible solution: make the common case simple, even if that makes the rare case complex

* C. Pancake, "Improving the Usability of Numerical Software through User-Centered Design," The Quality of Numerical Software: Assessment and Enhancement, ed. B. Ford and J. Rice, pp. 44-60, 1997.
Discussion on Usability Factors (2)

• Ease-of-use
  • Concern: amount of effort required to accomplish work with the tool is too high to justify the tool's use
  • Possible solutions
    • Do not force users to memorize information about the interface – use menus, mnemonics, and other mechanisms
    • Provide a simple interface
    • Make all user-required actions concrete and logical
• Throughput
  • Concern: how does the tool contribute to user productivity in general?
  • Keep in mind that the inherent goal of the tool is to increase user productivity
User-Centered Design

• Concept that usability should be the driving factor in tool development
• Based on the premise that usability will only be achieved if the design process is user-driven
• Four-step model to incorporate user feedback* (chronological)
  1. Ensure initial functionality is based on user needs
    • Solicit input directly from users
      • MPI users (for information about existing tools)
      • UPC/SHMEM users
      • Sponsor
  2. Analyze how users identify and correct performance problems
    • UPC/SHMEM users primarily
    • Gain a better idea of how the tool will actually be used on real programs
    • Information from users then presented to sponsor for critique/feedback
  3. Implement incrementally
    • Organize interface so that the most useful features are best supported
    • User evaluation of preliminary/prototype designs
  4. Maintain a strong relationship with the users to whom we have access
    • Have users evaluate every aspect of the tool's interface, structure, and behavior
    • Alpha/beta testing
    • User tests should be performed at many points along the way
    • Feature-by-feature refinement in response to specific user feedback

* S. Musil, G. Pigel, M. Tscheligi, "User Centered Monitoring of Parallel Programs with InHouse," HCI '94 Ancillary Proceedings, 1994.
Performance Tool Usability: Guidelines

• Issues for performance tools, and solutions
  • Many tools begin by presenting windows with detailed info on a single performance metric
    • Users prefer a broader perspective on application behavior
  • Some tools provide multiple views of program behavior
    • Good idea, but support is needed for comparing different metrics
      • For example, whether the L1 cache miss rate rises in the same place that CPU utilization drops
  • Also essential to provide source-code correlation to be useful
    • Users do not want info that cannot be used to fix the code
Performance Tool Usability: Summary

• Tool will not gain user acceptance until it is usable in a real-world environment
• Need to identify successful user strategies from existing tools on real applications
• Devise ways to apply successful strategies to the tool in an intuitive manner
• Use this functionality in development of the new tool
Presentation Methodology: Introduction

• Why use visualizations?
  • To facilitate user comprehension
  • To convey the complexity and intricacy of performance data
  • To help bridge the gap between raw performance data and performance improvements
• When to use visualizations?
  • On-line: visualization while the application is running (can slow down execution significantly)
  • Post mortem: after execution (usually based on trace data gathered at runtime)
• What to visualize?
  • Interactive displays to guide the user
  • Default visualizations should provide high-level views
  • Low-level information should be easily accessible
General Approaches to Performance Visualization

• General categories
  • System/application-independent: depict performance data for a variety of systems and applications – most tools use this approach
  • Meta-tools: facilitate development of custom visualization tools
• Other categories
  • On-line: visualization during execution
    • Can be intrusive
    • Volume of information may be too large to interpret without playback functionality
    • Allows user to observe only the interesting parts of execution without waiting
  • Post mortem: visualization after execution
    • Have to wait to see visualizations
    • Easier to implement
    • Less intrusion on application behavior
Useful Visualization Techniques

• Animation
  • Has been employed by various tools to provide program-execution replay
  • Most commonly animated events are communication operations
  • Viewing data dynamically may illuminate bottlenecks more efficiently
  • However, animation is usually very cumbersome in practice
• Program graphs
  • Generalized picture of the entire system
• Gantt charts
  • De facto standard for displaying inter-process communication
• Data access displays
  • Each cell of a 2D display is devoted to an element of an array
  • Color distinguishes local/remote and read/write accesses
• Critical path analysis
  • Concerned with identifying the program regions that contribute most to execution time
  • Graph depicts synchronization and communication dependencies among processes in the program
Guidelines and Interface Evaluation

• General guidelines*
  • Visualization should guide, not rationalize
  • Scalability is crucial
  • Color should inform, not entertain
  • Visualization should be interactive
  • Visualizations should provide meaningful labels
  • Default visualization should provide useful information
  • Avoid showing too much detail
  • Visualization controls should be simple
• Goals, Operators, Methods, and Selection rules (GOMS)
  • Formal user-interface evaluation technique
  • Way to characterize a set of design decisions from the point of view of the user
  • Description of what the user must learn; may serve as a basis for reference documentation
  • May be able to use GOMS analysis in the design of PAT
  • Knowledge described in a form that can actually be executed (there have been several fairly successful attempts to implement GOMS analysis in software, e.g., GLEAN)
  • Various incarnations of GOMS with different assumptions are useful for more specific analyses (KLM, CMN-GOMS, NGOMSL, CPM-GOMS, etc.)

* B. Miller, "What to Draw? When to Draw?: An Essay on Parallel Program Visualization," Journal of Parallel and Distributed Computing, 18:2, pp. 265-269, 1993.
Simple GOMS Example: OS X

• GOMS model for OS X
  • Method for goal: delete a file
    • Step 1: Think of file name and retain as first filespec (file specifier)
    • Step 2: Accomplish goal: drag file to trash
    • Step 3: Return with goal accomplished
  • Method for goal: move a file
    • Step 1: Think of file name and retain as first filespec
    • Step 2: Think of destination directory name and retain as second filespec
    • Step 3: Accomplish goal: drag file to destination
    • Step 4: Return with goal accomplished
Simple GOMS Example: UNIX

• GOMS model for UNIX
  • Method for goal: delete a file
    • Step 1: Recall that command verb is rm -f
    • Step 2: Think of file name and retain as first filespec
    • Step 3: Accomplish goal: enter and execute a command
    • Step 4: Return with goal accomplished
  • Method for goal: copy a file
    • Step 1: Recall that command verb is cp
    • Step 2: Think of file name and retain as first filespec
    • Step 3: Think of destination directory name and retain as second filespec
    • Step 4: Accomplish goal: enter and execute a command
    • Step 5: Return with goal accomplished
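The point that GOMS knowledge "can actually be executed" is easy to illustrate: the two models above can be written as plain data, and even a trivial metric such as step count then lets two interface designs be compared mechanically. This is a toy sketch in that spirit (not the GLEAN representation); the dictionary encoding is our own invention.

```python
# The GOMS methods from the two example slides, encoded as ordered step lists.
goms_osx = {
    "delete a file": ["think of file name",
                      "drag file to trash"],
    "move a file":   ["think of file name",
                      "think of destination directory name",
                      "drag file to destination"],
}
goms_unix = {
    "delete a file": ["recall command verb rm -f",
                      "think of file name",
                      "enter and execute command"],
    "copy a file":   ["recall command verb cp",
                      "think of file name",
                      "think of destination directory name",
                      "enter and execute command"],
}

def step_count(model, goal):
    """Crudest possible GOMS metric: number of operator steps for a goal."""
    return len(model[goal])
```

Even this crude comparison captures the qualitative difference between the two designs: the UNIX methods each carry an extra "recall command verb" step that the drag-based OS X methods avoid.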
Summary

• Plan for development
  • Develop a preliminary interface that provides the functionality required by users while conforming to visualization guidelines
  • After the preliminary design is complete, elicit user feedback
  • During periods when user contact is unavailable, may be able to use GOMS analysis or another formal interface-evaluation technique