CGO 2006: The Fourth International Symposium on Code Generation and Optimization, New York, March 26-29, 2006. Conference Review. Presented by: Ivan Matosevic
Outline • Conference overview • Brief summaries of sessions • Keynote speeches • Best paper
Conference Overview • Primary focus: back-end compilation techniques • Static analysis and optimization • Profiling • Run-time techniques • 8 sessions, 29 papers • Dominating topics: multicores, dynamic compilation
Overview of Sessions • Dynamic Optimization • Object-Oriented Code Generation and Optimization • Phase Detection and Profiling • Tiled and Multicore Compilation • Static Code Generation and Optimization Issues • SIMD Compilation • Optimization Space Exploration • Security and Reliability
Session 1: Dynamic Optimization • Kim Hazelwood (University of Virginia), Robert Cohn (Intel), A Cross-Architectural Interface for Code Cache Manipulation • Pin dynamic instrumentation system with code cache • The paper describes an API for various operations with the code cache (callbacks, lookups, statistics, etc.) • Derek Bruening, Vladimir Kiriansky, Tim Garnett, Sanjeev Banerji (Determina Corporation), Thread-Shared Software Code Caches • Problem: sharing a code cache across multiple threads • Authors propose a fine-grained locking scheme • Evaluation using DynamoRIO
Session 1: Dynamic Optimization • Keith Cooper, Anshuman Dasgupta (Rice Univ.), Tailoring Graph-coloring Register Allocation For Runtime Compilation • Problem: register allocation in JIT compilers • Authors propose a novel lightweight graph-colouring technique • Weifeng Zhang, Brad Calder, Dean Tullsen (UC San Diego), A Self Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework • Extension of the Trident event-driven dynamic optimization framework (previously proposed by the same authors) • Dynamic insertion of prefetching instructions based on run-time analysis
Session 2: Object-Oriented CodeGeneration and Optimization • Suresh Srinivas, Yun Wang, Miaobo Chen, Qi Zhang, Eric Lin, Valery Ushakov, Yoav Zach, Shalom Goldenberg (Intel Corporation), Java JNI Bridge: An MRTE Framework for Mixed Native ISA Execution • Use a dynamic translator for the execution of native calls to one ISA on a different ISA’s Java platform • Kris Venstermans, Lieven Eeckhout, Koen De Bosschere (Ghent University), Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing • Use address bits on a 64-bit architecture to encode object type in order to save memory • Objects of the same type allocated in a contiguous (virtual) region
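The idea behind typed virtual addressing can be illustrated with a toy sketch. The bit layout below (48-bit addresses, tag in the top bits) is a hypothetical simplification for illustration, not the paper's actual encoding, which derives the type from the base address of the contiguous region an object lives in:

```python
# Toy illustration of typed virtual addressing (hypothetical layout, not the
# paper's exact scheme): user-space addresses rarely need the top bits of a
# 64-bit pointer, so a type id can be packed there. The object header then
# no longer needs a separate type word, saving memory per object.

TYPE_SHIFT = 48                      # assume addresses fit in the low 48 bits
ADDR_MASK = (1 << TYPE_SHIFT) - 1

def tag_pointer(addr, type_id):
    """Embed type_id in the unused high bits of addr."""
    assert addr <= ADDR_MASK
    return (type_id << TYPE_SHIFT) | addr

def type_of(ptr):
    """Recover the type id without touching the object's memory."""
    return ptr >> TYPE_SHIFT

def address_of(ptr):
    """Mask off the tag to get the real address."""
    return ptr & ADDR_MASK
```

The benefit is that a virtual call or type check can resolve the receiver's type from the reference alone, with no extra memory access to an object header.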
Session 2: Object-Oriented Code Generation and Optimization • Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan (IBM Canada), Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler • The IBM TestaRossa JIT compiler • This paper focuses on code patching and profiling in a multi-threaded environment with a lot of class loading/unloading • Lixin Su, Mikko H Lipasti (University of Wisconsin Madison), Dynamic Class Hierarchy Mutation • Run-time reassignment of objects from one derived class to another, changing their virtual tables • Offers opportunity for optimizations based on specialization
Session 3: Phase Detection and Profiling • Priya Nagpurkar, (UCSB), Michael Hind (IBM), Chandra Krintz, (UCSB), Peter Sweeney, V.T. Rajan (IBM), Online Phase Detection Algorithms • Detecting phase behaviour in virtual machines • Track dynamic program parameters (methods invoked, branch directions…) over time and apply a similarity model • Jeremy Lau, Erez Perelman, Brad Calder (UC San Diego), Selecting Software Phase Markers with Code Structure Analysis • Portions of code whose execution correlates with phase changes • Procedure calls and returns, loop boundaries • Profile-based hierarchical loop-call graph
Session 3: Phase Detection and Profiling • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara), Profiling over Adaptive Ranges • Voted best paper – details later • Hyesoon Kim, Muhammad Aater Suleman, Onur Mutlu, Yale N. Patt (UT-Austin), 2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set • Predicts whether the prediction accuracy of each branch will vary across input sets • Heuristic approach used to derive representative profiling results from a single input set
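The 2D-profiling intuition can be sketched in a few lines. This is a loose illustration under assumed simplifications (a per-slice majority predictor and a made-up variance threshold), not the authors' actual heuristic:

```python
# Toy sketch of the 2D-profiling intuition: measure a branch's prediction
# accuracy in fixed-size time slices of a single profiling run. A branch
# whose per-slice accuracy fluctuates strongly is flagged as likely
# input-dependent, even though only one input set was profiled.
# The predictor model and threshold below are illustrative assumptions.

from statistics import pstdev

def flag_input_dependent(outcome_slices, threshold=0.1):
    """outcome_slices: one list of taken/not-taken outcomes (bools) per
    time slice. Returns True if accuracy varies notably across slices."""
    accuracies = []
    for outcomes in outcome_slices:
        # Accuracy of a simple majority-vote static predictor in this slice.
        taken = sum(outcomes)
        accuracies.append(max(taken, len(outcomes) - taken) / len(outcomes))
    return pstdev(accuracies) > threshold
```

A branch with uniformly high accuracy in every slice is judged stable; one whose accuracy swings between slices is reported as a candidate whose behaviour may change with the input set.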
Session 4: Tiled and Multicore Compilation • David Wentzlaff, Anant Agarwal (MIT), Constructing Virtual Architectures on a Tiled Processor • Map components of a superscalar architecture (Pentium III) onto a parallel tiled architecture (Raw) using dynamic translation • In a way, uses Raw as a coarse-grain FPGA • Aaron Smith, (UT-Austin), J. Burrill, (UMass at Amherst), J. Gibson, B. Maher, N. Nethercote, B. Yoder, D. Burger, K. S. McKinley (UT-Austin), Compiling for EDGE Architectures • TRIPS EDGE (Explicit Data Graph Execution) architecture • This paper focuses on compilation of standard C and FORTRAN benchmarks
Session 4: Tiled and Multicore Compilation • Shih-wei Liao, Zhaohui Du, Gansha Wu, Guei-Yuan Lueh (Intel), Data and Computation Transformations for Brook Streaming Applications on Multiprocessors • Parallel compiler for the Brook streaming language • An extension of C that enables specifying data parallelism • Michael L. Chu, Scott A. Mahlke (University of Michigan), Compiler-directed Object Partitioning for Multicluster Processors • Partitioning of data in clustered architectures such as Raw • I didn't really understand what programming model the authors have in mind
Session 5: Static Code Generation andOptimization Issues • Two papers about the HPUX Itanium compiler: • Dhruva R. Chakrabarti, Shin-Ming Liu (Hewlett-Packard), Inline Analysis: Beyond Selection Heuristics • Cross-module techniques for selection of inlined call sites and the choice of specialized function versions • Robert Hundt, Dhruva R. Chakrabarti, Sandya S. Mannarswamy (Hewlett-Packard), Practical Structure Layout Optimization and Advice • Data layout and placement on the heap to improve locality • Structure splitting, structure peeling, dead field removal, and field reordering
Session 5: Static Code Generation andOptimization Issues • Chris Lupo, Kent Wilken (University of California, Davis), Post Register Allocation Spill Code Optimization • Authors propose a profile-based algorithm for placement of save/restore instructions handling spilled variables in function calls • Implemented as a part of GCC • Seung Woo Son, Guangyu Chen, Mahmut Kandemir (Pennsylvania State University), A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality • Goal: restructure code so that disk idle periods are lengthened • The approach targets array-based programs: disk layout of array data exposed to the compiler
Session 6: SIMD Compilation • Jianhui Li, Qi Zhang, Shu Xu, Bo Huang (Intel China Software Center), Optimizing Dynamic Binary Translation for SIMD Instructions • Algorithms for dynamic binary translation of SIMD instructions in general-purpose architectures (such as MMX in x86) • Evaluation using IA-32 binaries on Itanium 2 • Dorit Nuzman (IBM), Richard Henderson (Red Hat), Multi-Platform Auto-Vectorization • Implementation of automatic vectorizer for GCC 4.0
Session 7: Optimization-space Exploration • Felix Agakov, Edwin Bonilla, John Cavazos, Bjoern Franke, Grigori Fursin, Michael O'Boyle, Marc Toussaint, John Thomson, Chris Williams (U. of Edinburgh), Using Machine Learning to Focus Iterative Optimization • Predictive modelling used to search the optimization space • Targets embedded platforms – AMD Au1500 and Texas Instruments TI C6713 • Prasad Kulkarni, David Whalley, Gary Tyson (Florida State University), Jack Davidson (University of Virginia), Exhaustive Optimization Phase Order Space Exploration • Exhaustive search of the phase order space (15 phases) using aggressive pruning; takes time on the order of minutes to hours • Targets StrongARM SA-100
Session 7: Optimization-space Exploration • Zhelong Pan, Rudolf Eigenmann (Purdue University), Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning • Problem: find the optimal combination of 38 GCC O3 options, targeting Pentium IV and Sparc II • The proposed heuristic algorithm provides a quality solution in time on the order of several hours
Session 8: Security and Reliability • Edson Borin, (UNICAMP), Cheng Wang, Youfeng Wu (Intel), Guido Araujo (UNICAMP), Software-Based Transparent and Comprehensive Control-Flow Error Detection • Addresses the problem of soft (transient) errors that cause branches to incorrect instructions • Implemented in SW as a part of a dynamic binary translator • Tao Zhang, Xiaotong Zhuang, Santosh Pande (Georgia Tech), Compiler Optimizations to Reduce Security Overheads • Optimizations that specifically target techniques that implement software protection with minimal HW support
Session 8: Security and Reliability • Susanta Nanda, Wei Li, Tzi-cker Chiueh (State University of NY at Stony Brook), BIRD: Binary Interpretation using Runtime Disassembly • Goal: framework for automatic detection of vulnerabilities such as buffer overflows when the source code is not available • Static and dynamic disassembly and instrumentation – targets Windows x86 applications
Keynote Speeches • Wei Li, Principal Engineer, Intel: "Parallel Programming 2.0" • Kevin Stoodley, Fellow and CTO of Compilation Technology, IBM: "Productivity and Performance: Future Directions in Compilers"
Wei Li: Parallel Programming 2.0 • Major technological change: • Moore’s Law continues to increase transistor counts • However: power, memory latency, limits to ILP are setting an effective performance ceiling • General trend towards thread-level on-chip parallelism • SMT • Chip multiprocessors
Wei Li: Parallel Programming 2.0 • "Parallel Programming 2.0" refers to the advent of multicores • A very optimistic future vision (presented as a slide graphic)
Wei Li: Parallel Programming 2.0 • Key issue – where will the parallelism come from? • Parallel programming needs to become more mainstream • Consumer vs. HPC/server/database • Inclusion into education at more elementary level • New tools for greater ease of programming • Intel’s parallel programming tools • http://www.intel.com/software
K. Stoodley: "Productivity and Performance: Future Directions in Compilers" • Limits to traditional static compilation • Overview of IBM compiler technology • Testarossa JIT compiler, Toronto Portable Optimizer, Tobey backend • Present and near-future challenges • Software abstraction complexity forces the scope of compilation to higher levels • Maintaining high performance and backwards compatibility is increasingly difficult
K. Stoodley: "Productivity and Performance: Future Directions in Compilers" • Future: convergence/combination of dynamic and static compilation technologies • [Slide diagram: xlc/xlC/xlf front ends emit W-Code into the Toronto Portable Optimizer (TPO) and TOBEY backend, producing static machine code; class/jar files feed the J9 execution engine (Java and others) with the Testarossa JIT, producing dynamic machine code; binary translation and profile-directed feedback (PDF) link the two paths]
Best Paper • Shashidhar Mysore, Banit Agrawal, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri (UC Santa Barbara): Profiling over Adaptive Ranges
Profiling over Adaptive Ranges • Problem: how to count specific events efficiently and accurately? • Code segments executed • Memory regions accessed • IP addresses of routed packets • In all cases, impossible to maintain separate counters for the entire range of values • Each basic block, memory address, IP address…
Trade-off: Precision vs. Efficiency • Profiling with uniform ranges fails to distinguish hot code • [Slide figure: profile using uniform ranges vs. profile using unlimited counters]
Higher Precision for Hot Regions • Good trade-off with limited resources: • High precision for hot regions • Low precision for colder ones, but this affects the accuracy less • Challenge: how to determine what exactly to count with what precision?
Solution: Adaptive Profiling • Start with one counter; split counters as they become hot:
Counter Merging • Problem: what if program behaviour changes after the initialization phase?
Counter Merging • Solution: perform counter merging along with splitting
Counter Merging • Counters of merged child nodes added to the parent
Counter Merging • Problem: how to identify nodes for merging? • By definition, these are the nodes that are not updated frequently • Solution: periodic batched merge operations • Since tree depth grows at a logarithmic rate, merging can be done at exponentially increasing intervals
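The splitting and merging scheme described above can be sketched as a software simulation. The tree structure and thresholds here are illustrative assumptions, not the paper's proposed hardware design:

```python
# Minimal software sketch of adaptive range profiling: a binary tree of
# counters over a value range. A leaf splits in two once it becomes hot,
# refining precision where events concentrate; cold sibling leaves are
# periodically merged back, their counts added to the parent.
# SPLIT_THRESHOLD is an illustrative tuning parameter.

SPLIT_THRESHOLD = 8

class RangeCounter:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # half-open range [lo, hi)
        self.count = 0
        self.left = self.right = None    # children once the node has split

    def record(self, value):
        node = self
        while node.left is not None:     # descend to the active leaf
            mid = (node.lo + node.hi) // 2
            node = node.left if value < mid else node.right
        node.count += 1
        # Split a hot leaf whose range can still be subdivided.
        if node.count >= SPLIT_THRESHOLD and node.hi - node.lo > 1:
            mid = (node.lo + node.hi) // 2
            node.left = RangeCounter(node.lo, mid)
            node.right = RangeCounter(mid, node.hi)

    def merge_cold(self, cold_threshold):
        """Batched merge: fold cold leaf pairs back into their parent."""
        if self.left is None:
            return
        self.left.merge_cold(cold_threshold)
        self.right.merge_cold(cold_threshold)
        if (self.left.left is None and self.right.left is None
                and self.left.count + self.right.count < cold_threshold):
            self.count += self.left.count + self.right.count
            self.left = self.right = None

    def leaves(self):
        """Current (range, count) resolution of the profile."""
        if self.left is None:
            return [(self.lo, self.hi, self.count)]
        return self.left.leaves() + self.right.leaves()
```

After a skewed event stream, the hot region ends up covered by narrow, precise counters while cold regions keep a single coarse counter, which is exactly the precision/efficiency trade-off the paper targets.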
Additional Contributions • Heuristics for splitting and merging • Theoretical analysis of accuracy guarantees • Proposal for hardware implementation • Experimental evaluation • Memory requirements • Average and worst-case errors on benchmarks • Performance of HW implementation • Accuracies on the order of 98.0-99.8% with only 8-64K of memory
Conclusions • Highly interesting program • My short presentation certainly doesn’t do justice to most of the mentioned works! • Readings to perhaps consider for future CARG: • D. Wentzlaff, A. Agarwal, Constructing Virtual Architectures on a Tiled Processor • A. Smith et al., Compiling for EDGE Architectures • F. Agakov et al., Using Machine Learning to Focus Iterative Optimization • (Highly subjective!)