Advanced Compiler Techniques Future Research Directions of Compilers LIU Xianhua School of EECS, Peking University
Outline • Introduction & History • Special Purpose Compilers • Free-standing compilers • Under the hood • Inside applications • In the toolchain • Inside libraries • New Languages • Just-in-time & Run-time support “Advanced Compiler Techniques”
1950s: Just a Compiler, Please • The compiler references a runtime, but the runtime is supplied by the OS at a fixed location in memory • FORTRAN runtime: input/output formatting • COBOL runtime: also search and sort • OS loader loads the compiler output into memory, transfers control • Address space is small (< 8K words), CPU is slow (< 1,000 instructions/sec.) • Figure of merit: Code Quality • Compiler must optimize code for space • Compiler must optimize code for speed
Inside the Compiler (in concept) • Front End • Parse source code • Produce abstract syntax tree (AST) • Produce symbol table • Generate errors • Syntax errors • Type errors • Unbound references • Back End • Linearize parse tree • Code Analysis • Basic block analysis • Control- and data-flow graph analysis • Optimize (machine-independent) • Redundant and dead code elimination • Code restructuring • Convert to executable code • Register allocation • Peephole optimization • Branch prediction and tensioning [Diagram: Source Code → Compiler (Front End → Back End) → Executable Code]
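The machine-independent optimization step listed above can be illustrated with a small sketch. It uses Python's standard `ast` module as a stand-in front end; the class name `FoldAndPrune`, the helper `optimize`, and the sample `SOURCE` are hypothetical names chosen for this example, and only a few arithmetic operators are handled.

```python
import ast
import operator

# Only these operators are folded in this sketch (an assumption, not a full list).
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class FoldAndPrune(ast.NodeTransformer):
    """Fold constant arithmetic and drop branches that can never execute."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        operands = (node.left, node.right)
        if fold and all(isinstance(n, ast.Constant) and isinstance(n.value, (int, float))
                        for n in operands):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node

    def visit_If(self, node):
        self.generic_visit(node)  # the test may become a constant after folding
        if isinstance(node.test, ast.Constant):
            live = node.body if node.test.value else node.orelse
            # Returning a list splices the live statements in place of the `if`;
            # returning None deletes it. (A real pass must also avoid leaving an
            # enclosing block empty, which this sketch does not check.)
            return live or None
        return node


def optimize(source: str) -> str:
    tree = FoldAndPrune().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


SOURCE = "x = 2 * 3 + 4\nif 1 - 1:\n    y = 99\nelse:\n    y = x\n"
print(optimize(SOURCE))
```

The printed program keeps the same meaning as `SOURCE` but no longer contains the arithmetic or the dead `y = 99` branch.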
1960s: Linkers • Programs are growing in size • Programs are built with libraries • Libraries provide reusable code fragments • Virtual memory systems are invented • Tool chain is in two stages • Compile independent modules • Combine the modules using a linker • Figure of merit: Code quality (speed)
Tools: Compiler + Linker [Diagram: Source Code → Compiler (Front End → Back End) → Object Code (includes external references) → Linker → Executable Code; several source modules are compiled independently and combined by the linker]
1970s: Symbolic Debugger • OS written in high-level language • Compilers provide sufficient code performance and low-level access • High-level languages provide large runtime libraries in multiple units • Static linker pulls only required units into a given program image • Compiler exports symbol table for use by debugger, not just internal to front-/back-end • Figure of merit: Code quality (speed)
Compiler, Linker, Debugger [Diagram: Source Code → Compiler (Front End → Back End) → Object Code → Linker → Running Program; the compiler also emits symbol table(s) for the Debugger, which attaches to the running program]
1980s: Dynamic Loading, Threading • To improve OS performance by reducing physical memory pressure, read-only parts of libraries are shared between applications • Loaded on first reference • OS loader fixes up references to shared libraries, just like the static linkers • Not all libraries are loaded into the same virtual address • Concurrency issues addressed in programming languages • Locks, monitors, events, polling • Order of operations visible across thread boundaries • Memory model semantics become an issue • Ada™ introduces rendezvous, other languages have other constructs • Tool chain • Compiler(s) • Linker • Loader • Symbolic debugger • Figure of merit: Code quality (speed, but this is related to space)
OS Dynamic Loader [Diagram: Source Code → Compiler (Front End → Back End) → Object Code (includes fixups for shared code) → Static Linker → Image File → OS Loader → Running Program; symbol table(s) feed the Debugger, which attaches to the running program]
1990s: JITs and Managed Runtimes • Garbage Collection goes mainstream • Previously: LISP, APL, Smalltalk • 1990s: Java, JScript, C#, VB • Verification requires runtime to analyze code • Verification is similar to front-end compiler work • Can be done to native code, but much simpler with an intermediate language • Just-in-time (JIT) compilation increases performance over pure interpretation • Typically by a factor of 5 to 15 • Tool chain: split the compiler in two! • Linearize the AST to create Intermediate Language (IL) • Save symbol table as “metadata” • Reorder the chain • Figures of merit: Throughput first, code quality second
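The "linearize the AST to create IL" step can be sketched as a post-order tree walk that emits stack-machine instructions. This is an illustration only: the instruction names (`push`, `load`, `add`, `sub`, `mul`) and the functions `linearize` and `run` are invented for this example and do not correspond to any particular runtime's intermediate language.

```python
import ast
import operator

# Hypothetical opcode names for a toy stack machine.
OPCODES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def linearize(expr: str) -> list:
    """Front half: parse to an AST, then flatten it into a linear IL sequence."""
    il = []

    def emit(node):
        if isinstance(node, ast.Constant):
            il.append(("push", node.value))
        elif isinstance(node, ast.Name):
            il.append(("load", node.id))
        elif isinstance(node, ast.BinOp):
            emit(node.left)   # operands first ...
            emit(node.right)
            il.append((OPCODES[type(node.op)],))  # ... then the operator
        else:
            raise NotImplementedError(type(node).__name__)

    emit(ast.parse(expr, mode="eval").body)
    return il


def run(il, env):
    """Back half, here a plain interpreter; a JIT would translate `il` instead."""
    stack = []
    for opcode, *operand in il:
        if opcode == "push":
            stack.append(operand[0])
        elif opcode == "load":
            stack.append(env[operand[0]])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[opcode](left, right))
    return stack.pop()


il = linearize("a * (b + 2)")
print(il)
print(run(il, {"a": 3, "b": 4}))
```

The IL list is the artifact that would be saved in the image file; the front half and the back half can then run at different times and on different machines.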
OS Dynamic Loader (repeat) [Diagram, simplified to a single module: Source Code → Compiler (Front End → Back End) → Object Code → Static Linker → Image File → OS Loader → Running Program, with the Debugger attached]
Managed Runtime [Diagram: the compiler is split in two. Source Code → Compiler (Front End) → Metadata + Intermediate Language → Image File → OS Loader → Runtime (Dynamic Linker, Back End) → Running Program, with the Debugger attached; the traditional chain (Front End → Back End → Object Code → Static Linker → Image File → OS Loader) is shown for comparison]
2000s: Reflection-based Computation • Reflection: ability of a program to observe and possibly modify its structure and behavior • Compilers “preserve meaning” but runtime reflection makes more information visible, so optimizations are more limited • Metadata (symbol table) or equivalent needed at runtime, not just compile/link time • Interactive Development Environments (IDEs) • Intellisense™ • Refactoring • Interactive syntax analysis • Query Integration • Builds expression trees (ASTs) at compile time • Runtime operations to combine and manipulate them • Figures of merit: • “Compiler” and “JIT compiler”: throughput • “Pre-JIT” compiler: balance of throughput and code quality “Advanced Compiler Techniques”
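The query-integration bullets describe expression trees that are built first and then combined and manipulated at run time. A rough sketch of that idea, using Python's `ast` module in place of a language-integrated query facility; the helpers `field_cmp`, `both`, and `to_callable` are hypothetical names for this example.

```python
import ast


def field_cmp(field, op, value):
    """Build the tree for `row[field] <op> value` directly, without parsing text."""
    return ast.Compare(
        left=ast.Subscript(
            value=ast.Name(id="row", ctx=ast.Load()),
            slice=ast.Constant(value=field),  # plain-node slices need Python 3.9+
            ctx=ast.Load(),
        ),
        ops=[op],
        comparators=[ast.Constant(value=value)],
    )


def both(left, right):
    """Combine two predicate trees at run time into `left and right`."""
    return ast.BoolOp(op=ast.And(), values=[left, right])


def to_callable(tree):
    """Wrap the tree in `lambda row: <tree>` and compile it to executable code."""
    wrapper = ast.parse("lambda row: None", mode="eval")
    wrapper.body.body = tree
    ast.fix_missing_locations(wrapper)
    return eval(compile(wrapper, "<expr-tree>", "eval"))


adult = field_cmp("age", ast.GtE(), 18)
local = field_cmp("country", ast.Eq(), "NZ")
query = both(adult, local)

print(ast.unparse(query))  # the tree is still inspectable as data
pred = to_callable(query)
print(pred({"age": 20, "country": "NZ"}))
```

Because the predicate stays a tree until `to_callable` is invoked, a library could equally translate it into another language (for example a database query) instead of compiling it locally.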
Runtime Reflection [Diagram: Source Code → Front End → Metadata + Intermediate Language → Image File → OS Loader → Dynamic Linker → Back End → Running Program; the metadata (symbol table) is shown alongside the Development Environment and the Debugger]
Compilers, Interpreters, VM… [Diagram: Program → Pre-processor → Parser → Parse Tree/IR → Code Generator → Assembly Code → Assembler → Machine Code. Alternative paths from the Parse Tree/IR: Translator → Program; Interpreter; Bytecode Generator → Bytecode → Bytecode Interpreter or JIT Compiler]
The following slides repeat this diagram, one per path: Source-to-Source Translator • Interpreters • Traditional Compilers • Compilers & VM • Compilers & VM with JITC
Compilers, Interpreters, VM… [Diagram: Machine Code → Pre-processor → Parser/Disassembler → Parse Tree/IR → Code Generator → Assembly Code → Assembler → Machine Code. Alternative paths from the Parse Tree/IR: Translator → Program; Interpreter; Bytecode Generator → Bytecode → Bytecode Interpreter or JIT Compiler]
The following slides repeat this diagram, one per use: Decompilation • Simulator/Emulator • Binary Rewriting/Translation
Isn’t it a solved problem? • Machines are constantly changing • Changes in architecture ⇒ changes in compilers • new features pose new problems • changing costs lead to different concerns • old solutions need re-engineering • Innovations in compilers should prompt changes in architecture • New languages and features
What is important in a compiler? • Correct code • Output runs fast • Compiler runs fast • Compile time proportional to program size • Support for separate compilation • Good diagnostics for syntax errors • Works well with the debugger • Good diagnostics for flow anomalies • Cross language calls • Consistent, predictable optimization
Outline • Compilers, compilers, everywhere • Free-standing compilers • Under the hood • Inside applications • In the tool chain • Inside libraries • New Languages • Correctness & Security • Just-in-time & Run-time support “Advanced Compiler Techniques”
Special-Purpose Compilers • Compile-to-hardware • Aspect-Oriented Programming (AOP) weaver • Parser finds new syntax to mark insertion points • Back-end inserts code snippets for different aspects • More generally: “assembly rewriting” • Work-flow and object design languages • Input may be textual or graphic layouts • Output may be code or graphic designs “Advanced Compiler Techniques”
Mark-up Compilers • XML schema (or DTD: Document Type Definition) • Output: parser • Output: deserializer • Web Services Description Language (WSDL) • Output: proxy that parses input and dispatches • Output: code to convert data structure to XML (“serializer”) • XAML (Windows Presentation Foundation) • Output: parser • Output: executable code • XSL
Modern Hardware: CPU • Compile “machine code” to “micro code” • CPU Architecture is the abstraction boundary • RISC vs CISC is an old debate • x86 and x64 are CISC on the outside, RISC on the inside • Part of the instruction cache • Engineering note: an icache miss now often means a pause to compile in addition to a memory fetch! • Allows innovation in actual hardware while still running existing code • Chips optimized for specific usage scenarios • Chips take advantage of materials science advances • Chips take advantage of new internal architectures (multi-core) “Advanced Compiler Techniques”
Modern Hardware: Graphics • Graphics memory isn’t just for data • Very sophisticated compilation steps • Parallel execution with CPU • Adapts to changing hardware organization • Raster scan vs vector • Resolution, speed, synchronization • Adapts to predominant usage pattern • Animation • 3D • Shading “Advanced Compiler Techniques”
Databases • SQL is a full programming language • Compiled to intermediate form on client • Intermediate form is passed to server for execution • Server optimizes the intermediate form to produce an “execution plan” • Query optimization • Additional inputs include • Size of tables • Frequency of query types • Indexing information • Outputs include • Executable code • Temporary indexes • Background indexing requests • Updated frequency information “Advanced Compiler Techniques”
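The role of indexing information as an optimizer input can be observed with any SQL engine that exposes its execution plan. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient illustration; SQLite is an embedded engine rather than the client/server arrangement described above, and the table, index, and helper names are made up for this example.

```python
import sqlite3


def plan(conn, sql):
    """Return the optimizer's plan for `sql` as one human-readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"c{i % 50}", i * 1.5) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer = 'c7'"

before = plan(conn, QUERY)  # no index available: the plan is a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(conn, QUERY)   # same SQL text, different plan once the index exists

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch prints it rather than relying on a fixed format.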
Hardware Simulators/Emulators • Object code translation at runtime • HP3000 to PA-RISC in 1983 • Vax to Alpha in 1990s • 32-bit programs on 64-bit hardware • Alternate hardware emulation • Device emulators for everything from smart cards to cell phones to iPod to pocket PCs • JIT compilation trades start-up time for high performance execution • Often, but not always, a good trade-off “Advanced Compiler Techniques”
Code Analysis Tools • Analyzing API surface • Simple to do with front end ASTs • “Remodularizing” implementation • Requires static and dynamic dependency analysis – normal compiler back end work • Requires rebuilding the program, easily done using front end ASTs • Race detection • Instrument code at compile time • Gather data as it runs under high stress “Advanced Compiler Techniques”
“Tree Shakers” • Start with AST tree and appropriate dependency graph • Pull AST nodes found starting at a given graph node, recursively • Convert resulting set of AST nodes to appropriate output format • Example uses: • Subset library based on initial set of types • Statically link subset of library for a given application “Advanced Compiler Techniques”
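The three tree-shaker steps above map onto a short reachability walk. In this sketch the "library" is a string of Python source, the dependency graph only tracks calls between top-level functions, and `LIBRARY` and `shake` are hypothetical names; a production tool would also have to account for classes, globals, and reflection.

```python
import ast

LIBRARY = """
def square(x):
    return x * x

def hypot2(a, b):
    return square(a) + square(b)

def cube(x):
    return x * square(x)

def unused_helper():
    return cube(3)
"""


def shake(source, roots):
    """Keep only the top-level functions reachable from `roots`."""
    module = ast.parse(source)
    defs = {n.name: n for n in module.body if isinstance(n, ast.FunctionDef)}

    # Dependency graph: function name -> names of library functions it mentions.
    deps = {
        name: {c.id for c in ast.walk(fn) if isinstance(c, ast.Name) and c.id in defs}
        for name, fn in defs.items()
    }

    # Pull nodes reachable from the roots (worklist form of the recursion).
    keep, work = set(), list(roots)
    while work:
        name = work.pop()
        if name not in keep:
            keep.add(name)
            work.extend(deps.get(name, ()))

    # Convert the surviving AST nodes back to the output format (source text here).
    module.body = [
        n for n in module.body
        if not isinstance(n, ast.FunctionDef) or n.name in keep
    ]
    return ast.unparse(module)


subset = shake(LIBRARY, {"hypot2"})
print(subset)
```

Starting from `hypot2`, only `hypot2` and `square` survive; `cube` and `unused_helper` are dropped because nothing reachable refers to them.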
A Modern Interactive Development Environment (IDE) • Code editor • Knows the programming language, provides syntax support and context-sensitive name lookup • Project system • Tracks the public shape of components • Tracks dependencies between components • Build system • Orders clean-up, compile, and link operations • Debugger • Allows inspection and modification of values at runtime • Allows control operations (e.g., breakpoint, continue, restart) • Dynamic Support • Allows program modification interwoven with execution (“edit and continue”) • Global interaction space (“read-eval-print loop”) “Advanced Compiler Techniques”
Compilers in the IDE (I) • In the code editor • Incrementally parses the code as it is being entered. Note: must deal with incorrect syntax and partial programs. • Suggests possible completions based on a symbol table. Note: symbol table must include external references maintained by the project system. • Refactoring operations require both syntactic and semantic analysis. Note: refactoring requires information maintained by the project system. • In the debugger • Expression evaluation “Advanced Compiler Techniques”
Compilers in the IDE (II) • Dynamic support • Edit-and-continue • Requires a full, incremental compiler • For efficiency, it also requires the ability to compress the output as a “diff” between the original and the new code • Interactive workspace • Like LISP, APL, Smalltalk, Python, etc. • Requires • a compiler or • an interpreter: really, a compiler front end to generate an AST combined with a tree walker to execute the tree. • The compiler must be capable of generating code that uses code and objects resident in the evaluation environment, which generally means a reliance on reflection.
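The "front end plus tree walker" form of an interactive workspace can be sketched in a few lines. Python's `ast.parse` plays the front end here, and `evaluate` and `repl_step` are hypothetical names; the environment dictionary stands in for the objects already resident in the workspace, and only constants, names, four arithmetic operators, calls, and simple assignments are supported.

```python
import ast
import operator

BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, env):
    """Walk the tree and compute its value directly, with no code generation."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # look up an object living in the workspace
    if isinstance(node, ast.BinOp):
        return BINOPS[type(node.op)](evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, ast.Call):
        func = evaluate(node.func, env)
        return func(*[evaluate(arg, env) for arg in node.args])
    raise NotImplementedError(type(node).__name__)


def repl_step(line, env):
    """One read-eval step: parse a line, run it against `env`, return the value."""
    stmt = ast.parse(line).body[0]
    if isinstance(stmt, ast.Assign):
        value = evaluate(stmt.value, env)
        env[stmt.targets[0].id] = value  # the binding persists for later lines
        return value
    return evaluate(stmt.value, env)  # a bare expression statement


workspace = {"max": max}  # pre-existing resident code
print(repl_step("x = 2 + 3", workspace))
print(repl_step("max(x, 4) * 2", workspace))
```

The second line uses both the `x` defined by the first line and the resident `max` function, which is the dependency on the evaluation environment that the slide describes.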
Compilers in the Linker • The linker sees “the whole program”, so it’s better positioned to do global analysis • Solution: write a compiler • Input language is object file format (native code or IL) • Output language is OS image file format • Optimizations: • Aggressive in-lining across module boundaries • Code motion across module boundaries • Full type system analysis (treat leaf types as sealed) • Issues: • These flow graphs are *big* • The linker doesn’t see the whole program (dynamic linking) • Reflection and dynamic linking reduce permitted optimizations • Or require the ability to back out or recompute optimizations at runtime “Advanced Compiler Techniques”
Profile-Guided Optimization • Idea: Instrument the program, run it with typical loads, then re-optimize using this profiling data. (Similar to “Hotspot”) • Optimizations: • Optimize only “hot” code fragments • So you can spend more time on them • Method and basic block reordering to increase code density • Code reordering to optimize branch prediction and minimize “long” references • Cache locality optimizations for data and code “Advanced Compiler Techniques”
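Basic block reordering from profile data can be sketched as a greedy layout: from each block, place its hottest not-yet-placed successor next so the common path becomes straight-line fall-through. The function `layout`, the block names, and the edge counts are all invented for this illustration; real counts would come from an instrumented run.

```python
def layout(entry, edge_counts):
    """Order blocks so that each one is followed by its hottest successor.

    edge_counts maps (source_block, destination_block) -> times the edge was taken.
    """
    order, placed = [], set()
    pending = [entry]
    while pending:
        block = pending.pop()
        while block is not None and block not in placed:
            order.append(block)
            placed.add(block)
            # Unplaced successors of this block, hottest first.
            succs = sorted(
                ((count, dst) for (src, dst), count in edge_counts.items()
                 if src == block and dst not in placed),
                reverse=True,
            )
            # Colder successors are deferred; they end up after the hot chain.
            pending.extend(dst for _, dst in reversed(succs[1:]))
            block = succs[0][1] if succs else None
    return order


# Hypothetical profile: the "slow" path is taken 10 times out of 1000.
PROFILE = {
    ("entry", "check"): 1000,
    ("check", "fast"): 990,
    ("check", "slow"): 10,
    ("fast", "exit"): 990,
    ("slow", "exit"): 10,
}

print(layout("entry", PROFILE))
```

With this profile the rarely executed `slow` block is moved behind `exit`, which keeps the hot blocks adjacent and improves code density on the common path.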
For the Developer • “Regular expression” parsing • Grammar is usually more powerful than regular expressions • Serialization and Deserialization • Reflects on data type to be marshalled • Generates specialized code to convert to stream format (serialization) or parse into in-memory format (deserialization) “Advanced Compiler Techniques”
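The serialization bullets describe a library that reflects on a data type once and then generates code specialized to it. A minimal sketch using Python's `dataclasses` for the reflection step; `make_serializer` and the `Point` type are hypothetical, and the generated function only converts to a dictionary (writing it to a stream, for example with `json.dumps`, would be a separate step).

```python
import dataclasses
import json


def make_serializer(cls):
    """Reflect over a dataclass and generate a function specialized to its fields."""
    names = [f.name for f in dataclasses.fields(cls)]
    body = ", ".join(f"{name!r}: obj.{name}" for name in names)
    source = f"def serialize(obj):\n    return {{{body}}}\n"
    namespace = {}
    exec(source, namespace)  # compile the generated source at run time
    return namespace["serialize"], source


@dataclasses.dataclass
class Point:
    x: int
    y: int
    label: str


serialize_point, generated = make_serializer(Point)
print(generated)  # the specialized code, with field names baked in
print(json.dumps(serialize_point(Point(1, 2, "origin-ish"))))
```

The reflection cost is paid once in `make_serializer`; each later call to `serialize_point` runs straight-line code with no per-field lookup of the type's structure.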
For the Compiler Writer • Parser-generators • lex • yacc • AST tool kits • Microsoft is investing in this area • Provides integration into many aspects of the IDE • Executable file format tool kits • Queensland University of Technology PERWAPI • Optimization tool kits • Microsoft’s Phoenix project
Libraries and Component Technology • Traditional View • Interface: Provides/Requires • Code (source or binary) • Data Description: Types, Sizes • Expanded View • Performance: Device, Data Features • Interface: Abstract Provides/Requires • Partial Code (source or tunable binary) • Code Generator • Interface: Device Dependencies • Data Description: Types, Sizes • Data Description: Map Features to Optimization • Support for automatic selection, tuning, scheduling, etc.