Modern Compiler Internal Representations
Silvius Rus
1/23/2002
Presentation Navigator
• Introduction
• Challenges
• Staged compilation
• Generate efficient code
• Case studies
• Conclusions
Traditional Compiler Organization
• Passes (pass: output type):
  • Read code as text: ASCII characters
  • Lexical scanner: language words (tokens)
  • Syntactic parser: language phrases
  • Translation: attribute grammar phrases
  • Output generated code: binary stream
• Focus on pipelining the passes, due to limited memory
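The pipelined pass organization can be sketched with Python generators. Each pass pulls items from the previous one on demand, so no pass needs the whole program in memory. This is a minimal sketch, not a real compiler. The toy language (sums of identifiers and integers) and the stack-machine opcodes `PUSH`/`ADD` are assumptions made for illustration.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, text)

def read_chars(source: str) -> Iterator[str]:
    """Pass 1: read the program as a stream of characters."""
    yield from source

def scan(chars: Iterator[str]) -> Iterator[Token]:
    """Pass 2: group characters into words (ID, NUM) and operators (OP)."""
    buf = ""
    for ch in chars:
        if ch.isalnum():
            buf += ch
            continue
        if buf:
            yield ("NUM" if buf.isdigit() else "ID", buf)
            buf = ""
        if ch == "+":
            yield ("OP", ch)
        elif not ch.isspace():
            raise SyntaxError(f"unexpected character {ch!r}")
    if buf:
        yield ("NUM" if buf.isdigit() else "ID", buf)

def translate(tokens: Iterator[Token]) -> Iterator[str]:
    """Pass 3: translate the phrase `operand (+ operand)*` into stack code.

    Instructions are emitted as soon as each operand arrives; no syntax
    tree is ever built.
    """
    expect_operand, pending_add = True, False
    for kind, text in tokens:
        if expect_operand:
            if kind == "OP":
                raise SyntaxError(f"operand expected, got {text!r}")
            yield f"PUSH {text}"
            if pending_add:
                yield "ADD"
            expect_operand = False
        elif kind == "OP":
            pending_add, expect_operand = True, True
        else:
            raise SyntaxError(f"'+' expected, got {text!r}")
    if expect_operand:
        raise SyntaxError("unexpected end of input")

def emit(instructions: Iterator[str]) -> Iterator[bytes]:
    """Pass 4: write each instruction to the output stream as bytes."""
    for ins in instructions:
        yield ins.encode("ascii") + b"\n"

# Each pass pulls from the one before it, item by item.
output = b"".join(emit(translate(scan(read_chars("a + 1 + b")))))
print(output.decode("ascii"), end="")
```

The translation pass shows the slide's point that grammatical structure need not be built explicitly. The parser state is reduced to two flags.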
Traditional Compiler Internal Representation
• Grammatical structure not always built explicitly
• Implicit, built-in semantics
• Simple data structures:
  • Transition tables
  • Token streams and stacks
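A transition table like the one listed above can drive a scanner with a single generic loop. The current state and the class of the next character select the next state, and the scanner stops when the table has no entry. This is a minimal sketch under assumed conventions. The character classes, state names, and token kinds are invented for the example and are not taken from any particular compiler.

```python
from typing import Dict, List, Tuple

def char_class(ch: str) -> str:
    """Map a character to the column of the transition table it uses."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# TABLE[state][char class] -> next state; a missing entry means "stop here".
TABLE: Dict[str, Dict[str, str]] = {
    "start":  {"letter": "in_id", "digit": "in_num"},
    "in_id":  {"letter": "in_id", "digit": "in_id"},
    "in_num": {"digit": "in_num"},
}
# Accepting states and the token kind each one produces.
ACCEPT = {"in_id": "ID", "in_num": "NUM"}

def scan(text: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, start = "start", pos
        # Follow table entries as long as they exist.
        while pos < len(text):
            nxt = TABLE.get(state, {}).get(char_class(text[pos]))
            if nxt is None:
                break
            state, pos = nxt, pos + 1
        if state in ACCEPT:
            tokens.append((ACCEPT[state], text[start:pos]))
        else:
            # No table entry from "start": treat it as a one-character operator.
            tokens.append(("OP", text[pos]))
            pos += 1
    return tokens

print(scan("count1 = 42+x"))
# [('ID', 'count1'), ('OP', '='), ('NUM', '42'), ('OP', '+'), ('ID', 'x')]
```

Every non-start state is accepting here, so the loop can stop at the first missing entry. A table for a richer language would have to remember the last accepting position and back up to it.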
Compiler Challenges
• Versatile:
  • Understand multiple languages
  • Generate output for various architectures
• Generate efficient code:
  • Fast: as fast as code written directly in the output language
  • Portable: runs on multiple platforms
  • Verifiable: provably runs within a specified class of behavior
  • Secure: provably respects certain security requirements
• Extendable: must be extended in order to:
  • Incorporate a new input language and/or target system
  • Take advantage of advances in run-time environments (such as ISA changes, multithreading, distributed/parallel execution)
• L+A < L*A: with a common IR, L languages and A architectures need only L front ends plus A back ends instead of L*A separate compilers, and the saving grows quickly as L and A increase
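The L+A versus L*A count can be made concrete. With a shared IR, each front end lowers its language into the IR and each back end consumes it. Any (language, target) pairing therefore works without writing that pairing separately. The front ends and back ends below are hypothetical stand-ins that only build labeled strings; the sketch shows the composition and the counting, nothing more.

```python
from itertools import product

# Hypothetical components: each front end maps source to the shared IR,
# each back end maps the shared IR to code for one target.
FRONT_ENDS = {lang: (lambda src, lang=lang: f"IR<{lang}:{src}>")
              for lang in ["Fortran", "C", "C++", "Java"]}
BACK_ENDS = {isa: (lambda ir, isa=isa: f"{isa}-code({ir})")
             for isa in ["x86", "SPARC", "MIPS"]}

def compile_with(lang: str, isa: str, src: str) -> str:
    """Any (language, target) pair is a front end followed by a back end."""
    return BACK_ENDS[isa](FRONT_ENDS[lang](src))

L, A = len(FRONT_ENDS), len(BACK_ENDS)
pairs = list(product(FRONT_ENDS, BACK_ENDS))
print(f"components written: L+A = {L + A}")      # 7
print(f"compilers obtained: L*A = {len(pairs)}")  # 12
print(compile_with("C", "MIPS", "x=1"))           # MIPS-code(IR<C:x=1>)
```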
Understand Multiple Languages - Output for Multiple Targets
• Abstract IR:
  • Same representation for Fortran, C, C++, Java, …
  • Possible only for conceptually similar languages
• Good points:
  • Complex transformations are implemented once, on a single representation
• Bad points:
  • Language semantics may either get lost or need additional, language-specific representation
  • Exploiting specific architecture characteristics is more profitable than using only the common (abstractable) ones
Staged Compilation
• Stage 1:
  • Load source file (text) into IR1 – machine independent
  • Optimize IR1
  • Stream IR1 to a file (text or binary)
  • Save/reload, pipe, HTTP, … the file
  • Examples: SUIF files, Java bytecode, .NET assemblies
• Stage 2:
  • Load the file into IR2 – machine dependent
  • Perform machine-specific optimization on IR2
  • Generate executable code or interpret IR2
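The two stages can be sketched with a toy expression language. Stage 1 folds constants in a machine-independent tree (IR1) and streams it to a file. Stage 2 reloads the file and lowers it to code for a register machine. Several pieces are assumptions for this sketch:

- JSON stands in for the serialized form.
- The node layout is invented.
- The `LI`/`LOAD`/`ADD`/`MUL` instruction names are invented.

Real formats such as SUIF files, Java bytecode, or .NET assemblies are far richer.

```python
import json
import os
import tempfile
from typing import List, Union

# IR1 (machine independent): ints are constants, strings are variables,
# and ["op", left, right] is a binary operation.
Node = Union[int, str, list]
FOLD = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
OPCODE = {"+": "ADD", "*": "MUL"}

def fold(node: Node) -> Node:
    """Stage 1 optimization: constant folding, valid on any target."""
    if not isinstance(node, list):
        return node
    op, lhs, rhs = node[0], fold(node[1]), fold(node[2])
    if isinstance(lhs, int) and isinstance(rhs, int):
        return FOLD[op](lhs, rhs)
    return [op, lhs, rhs]

def stage1(ir1: Node, path: str) -> None:
    """Optimize IR1 and stream it to a file (JSON text here)."""
    with open(path, "w") as f:
        json.dump(fold(ir1), f)

def lower(node: Node, code: List[str], next_reg: List[int]) -> str:
    """Lower one node to register-machine code; return its result register."""
    reg = f"r{next_reg[0]}"
    next_reg[0] += 1
    if isinstance(node, int):
        code.append(f"LI {reg}, {node}")
    elif isinstance(node, str):
        code.append(f"LOAD {reg}, {node}")
    else:
        a = lower(node[1], code, next_reg)
        b = lower(node[2], code, next_reg)
        code.append(f"{OPCODE[node[0]]} {reg}, {a}, {b}")
    return reg

def stage2(path: str) -> List[str]:
    """Reload the file into IR2 and produce target code."""
    with open(path) as f:
        ir = json.load(f)
    code: List[str] = []
    lower(ir, code, [0])
    return code

path = os.path.join(tempfile.mkdtemp(), "prog.ir1.json")
stage1(["+", ["*", 2, 3], "x"], path)  # folds to ["+", 6, "x"]
print(stage2(path))  # ['LI r1, 6', 'LOAD r2, x', 'ADD r0, r1, r2']
```

The file between the stages is the only coupling. Stage 2 could run on another machine or at another time, which is what makes the save/reload, pipe, or HTTP transport possible.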
Staged Compilation
• Prepare IR1 so that stage 2 is very cheap
• Quicksilver:
  • Insert templated optimized object code in bytecode
  • Pack speculative optimization validation predicates in bytecode
  • Keep method dependence graphs explicitly in bytecode
• Microsoft .NET:
  • Explicit type/class information in IL
  • Preformatted, quickly accessible metadata:
    • Strings, tables, heaps
    • Custom data
  • Allow embedding of native code
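One way stage 2 can be kept cheap is sketched below. Stage 1 ships optimized code together with the assumptions it relied on. Stage 2 only evaluates those validation predicates against the classes actually loaded, and falls back to the plain bytecode if one fails. The `Image` record, the single predicate kind (a class has no subclasses), and the class names are hypothetical. This illustrates the idea, not Quicksilver's actual format.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class Image:
    """Hypothetical stage-1 output for one method."""
    bytecode: str        # always-safe version
    optimized_code: str  # precompiled, speculatively optimized version
    # Validation predicates: classes assumed to have no subclasses at run
    # time (e.g. so that virtual calls on them could be bound directly).
    assumed_leaf_classes: Set[str] = field(default_factory=set)

def has_subclass(loaded: Dict[str, str], cls: str) -> bool:
    """`loaded` maps each loaded class to its superclass."""
    return any(parent == cls for parent in loaded.values())

def select_code(image: Image, loaded: Dict[str, str]) -> str:
    """Stage 2: only evaluate the packed predicates; never re-optimize."""
    if any(has_subclass(loaded, cls) for cls in image.assumed_leaf_classes):
        return image.bytecode  # an assumption failed: fall back to safe code
    return image.optimized_code

img = Image(bytecode="invokevirtual Circle.area",
            optimized_code="inlined body of Circle.area",
            assumed_leaf_classes={"Circle"})
print(select_code(img, {"Circle": "Shape", "Square": "Shape"}))     # inlined body of Circle.area
print(select_code(img, {"Circle": "Shape", "BigCircle": "Circle"}))  # invokevirtual Circle.area
```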
Generate Fast And Portable Code
• Fast code:
  • IR close to machine structure
  • Mapping data to registers
  • Mapping operations to opcodes
  • Scheduling instructions for superscalar/VLIW processors
• Portable code:
  • Machine description must be totally abstracted
  • Quicksilver: templated optimized code
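Mapping data to registers can be illustrated with the classic Sethi-Ullman numbering. It computes how many registers an expression tree needs when the more demanding subtree is always evaluated first. This is a swapped-in textbook technique; the slides do not attribute it to any particular compiler. The tuple tree encoding and the `LOAD`/op instruction format are assumptions.

```python
from typing import List, Union

# An expression tree: a variable name (leaf) or a tuple (op, left, right).
Tree = Union[str, tuple]

def need(t: Tree) -> int:
    """Sethi-Ullman number: registers needed to evaluate t without spills,
    assuming every leaf must first be loaded into a register."""
    if isinstance(t, str):
        return 1
    _, left, right = t
    l, r = need(left), need(right)
    return max(l, r) if l != r else l + 1

def gen(t: Tree, regs: List[str], out: List[str]) -> None:
    """Evaluate t into regs[0], using only the registers in regs."""
    if isinstance(t, str):
        out.append(f"LOAD {regs[0]}, {t}")
        return
    op, left, right = t
    if need(left) >= need(right):
        gen(left, regs, out)       # result held in regs[0]
        gen(right, regs[1:], out)  # evaluated with one register fewer
        out.append(f"{op} {regs[0]}, {regs[0]}, {regs[1]}")
    else:
        gen(right, regs, out)      # harder subtree first
        gen(left, regs[1:], out)
        # Operands stay in source order: regs[0] = left op right.
        out.append(f"{op} {regs[0]}, {regs[1]}, {regs[0]}")

def compile_expr(t: Tree, regs: List[str]) -> List[str]:
    if need(t) > len(regs):
        raise ValueError("not enough registers; a real allocator would spill")
    out: List[str] = []
    gen(t, regs, out)
    return out

expr = ("MUL", ("ADD", "a", "b"), ("SUB", "c", "d"))
print(need(expr))  # 3
for line in compile_expr(expr, ["r0", "r1", "r2"]):
    print(line)
```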
Generate Verifiable Code
• Microsoft .NET IL:
  • Static and dynamic type safety (reflection)
  • Managed code:
    • Carries at least a minimum of descriptive information (metadata) about itself
    • Usually signed by the compiler in Stage 1
  • Managed data:
    • Only accessible from managed code
    • Garbage collected
  • Managed pointers
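Static type safety in an intermediate language is typically checked by abstract interpretation. The verifier walks the code tracking the type of each stack slot rather than its value. It rejects the method if any instruction could see the wrong type or underflow the stack. The sketch below does this for straight-line code in a hypothetical instruction set (`PUSH_INT`, `PUSH_NULL`, `ADD`, `POP`, `RET_INT`). A real IL verifier also handles branches, locals, and object types.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction set: opcode -> (types popped, types pushed).
# Types are "int" or "ref"; "any" accepts either.
SIGNATURES: Dict[str, Tuple[List[str], List[str]]] = {
    "PUSH_INT":  ([], ["int"]),
    "PUSH_NULL": ([], ["ref"]),
    "ADD":       (["int", "int"], ["int"]),
    "POP":       (["any"], []),
    "RET_INT":   (["int"], []),
}

def verify(code: List[str]) -> Tuple[bool, str]:
    """Simulate the code on a stack of types; report the first violation."""
    stack: List[str] = []
    for pc, op in enumerate(code):
        if op not in SIGNATURES:
            return False, f"{pc}: unknown opcode {op}"
        pops, pushes = SIGNATURES[op]
        if len(stack) < len(pops):
            return False, f"{pc}: stack underflow at {op}"
        # The last element of `pops` is the expected top of the stack.
        for expected in reversed(pops):
            actual = stack.pop()
            if expected != "any" and actual != expected:
                return False, f"{pc}: {op} expects {expected}, found {actual}"
        stack.extend(pushes)
    if stack:
        return False, "values left on stack at end of method"
    return True, "ok"

print(verify(["PUSH_INT", "PUSH_INT", "ADD", "RET_INT"]))   # (True, 'ok')
print(verify(["PUSH_INT", "PUSH_NULL", "ADD", "RET_INT"]))  # (False, '2: ADD expects int, found ref')
print(verify(["ADD"]))                                      # (False, '0: stack underflow at ADD')
```

The check runs once, before execution, so verified code needs no per-instruction type tests at run time.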
Generate Secure Code
• Security limits are hard to define
• Make sure you run what you mean to run
• Limit rights:
  • Per user
  • Per software component
• Quicksilver: digests
• .NET IL:
  • Code is signed by encrypting a hash of the original with the signer's private key
  • Permissions are set per module
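Signing by encrypting a hash of the original can be shown with textbook RSA, in three steps:

1. Hash the code.
2. Sign by raising the hash to the private exponent modulo n.
3. Verify by raising the signature to the public exponent and comparing with a fresh hash.

The key below (p = 61, q = 53) is the usual tiny classroom example and is completely insecure. Real code-signing schemes use keys thousands of bits long and standardized padding, so treat this purely as an illustration of the arithmetic.

```python
import hashlib

# Textbook RSA key with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                # public modulus, 3233
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent
D = pow(E, -1, PHI)      # private exponent, 2753 (needs Python 3.8+)

def digest(code: bytes) -> int:
    """Hash the code; reduced mod N only because this toy N is so small."""
    return int.from_bytes(hashlib.sha256(code).digest(), "big") % N

def sign(code: bytes) -> int:
    """'Encrypt' the hash with the private exponent."""
    return pow(digest(code), D, N)

def verify(code: bytes, signature: int) -> bool:
    """Recover the hash with the public exponent and compare."""
    return pow(signature, E, N) == digest(code)

module = b"example module bytes"
sig = sign(module)
print(verify(module, sig))                # True
print(verify(module, (sig + 1) % N))      # False: a different signature
print(verify(b"tampered" + module, sig))  # expected False, barring a 1-in-3233 toy collision
```

Only the holder of `D` can produce a valid signature, while anyone with the public pair (`N`, `E`) can check it.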
Generate Efficient Code
• IR may also provide support for:
  • Versioning (Quicksilver, .NET)
  • Culture, i.e., locale-specific information (.NET)
Compiler Internal Representation - General Organization
• High level: completely machine independent
  • Abstract Syntax Tree
  • Control Flow Graph
  • Control Dependence Graph
  • Data Dependence Graph
  • Static Single Assignment
• Medium level: dependent on classes of machines
  • Virtual machine code, such as stack machine code
• Low level: dependent on a particular ISA
  • Assembly, machine instruction graphs
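For straight-line code, converting to Static Single Assignment form is just renaming. Every assignment creates a new version of its target, and every use refers to the latest version. Code with control flow also needs φ-functions at join points, which this minimal sketch deliberately omits. The tuple encoding of three-address statements and the `name_version` naming scheme are assumptions.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]
# Three-address statement: target = arg1 op arg2
Stmt = Tuple[str, str, Operand, Operand]

def to_ssa(block: List[Stmt]) -> List[Stmt]:
    """Rename a straight-line block so each variable is assigned exactly once."""
    version: Dict[str, int] = {}

    def use(arg: Operand) -> Operand:
        if isinstance(arg, str):
            # Variables not yet assigned in the block are inputs: version 0.
            return f"{arg}_{version.get(arg, 0)}"
        return arg

    renamed: List[Stmt] = []
    for target, op, a, b in block:
        a, b = use(a), use(b)  # uses see the versions before this statement
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", op, a, b))
    return renamed

block = [("x", "+", "a", 1),
         ("x", "*", "x", 2),
         ("y", "+", "x", "a")]
for stmt in to_ssa(block):
    print(stmt)
# ('x_1', '+', 'a_0', 1)
# ('x_2', '*', 'x_1', 2)
# ('y_1', '+', 'x_2', 'a_0')
```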
Case Study: Polaris
• High-level representation:
  • Abstract Syntax Tree
  • Control Flow Graph
  • Control Dependence Graph
  • Data Dependence Graph
  • Gated Static Single Assignment
• Some generality:
  • Backends for various parallel execution systems
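Data dependence graphs in a parallelizing compiler are built from dependence tests on array subscripts. One classic, conservative test is the GCD test. A write to `A[a*i + b]` and a read of `A[c*j + d]` can touch the same element only if gcd(a, c) divides d - b. It is shown here as a textbook example, not as the specific test Polaris uses. It ignores loop bounds, so "may depend" only means a dependence could not be ruled out.

```python
from math import gcd

def may_depend(a: int, b: int, c: int, d: int) -> bool:
    """GCD test for A[a*i + b] (one access) versus A[c*j + d] (another).

    a*i + b == c*j + d has an integer solution iff gcd(a, c) divides d - b.
    False means the accesses can never overlap; True only means the test
    could not rule a dependence out (loop bounds are ignored).
    """
    g = gcd(a, c)
    if g == 0:  # both subscripts are constants
        return b == d
    return (d - b) % g == 0

# for i: A[2*i] = A[2*i + 1] + 1   (even elements written, odd ones read)
print(may_depend(2, 0, 2, 1))  # False: the write and the read never overlap
# for i: A[4*i + 2] = A[2*i] + 1
print(may_depend(4, 2, 2, 0))  # True: may depend (e.g. i = 0 writes A[2], j = 1 reads it)
```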
Case Study: SUIF2
• Multiple-level representation:
  • CFG, CDG, …
  • Quads
  • Machine SUIF (Machsuif)
• Custom annotations
• Multiple front ends: Fortran, C, Java
• Multiple back ends: SUIF VM, C, assembly
• Decoupled passes communicate only via SUIF
• Extendable: OSUIF
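Passes that communicate only via SUIF each do three things: read the IR, add or use annotations, and write the IR back. They share no other state, so passes can be reordered, replaced, or run as separate programs. The sketch below mimics that discipline with JSON text standing in for SUIF files. The node layout, the annotation names (`uses_io`, `can_parallelize`), and both passes are hypothetical. The parallelization rule is deliberately simplistic; a real pass would also need dependence analysis.

```python
import json

def mark_io(ir_text: str) -> str:
    """Pass 1: annotate each loop with whether its body performs I/O."""
    ir = json.loads(ir_text)
    for loop in ir["loops"]:
        loop.setdefault("annotations", {})["uses_io"] = any(
            stmt.startswith("print") for stmt in loop["body"])
    return json.dumps(ir)

def mark_parallel(ir_text: str) -> str:
    """Pass 2: relies only on annotations found in the IR it is given."""
    ir = json.loads(ir_text)
    for loop in ir["loops"]:
        ann = loop.setdefault("annotations", {})
        # Conservative default: without a uses_io annotation, assume I/O.
        ann["can_parallelize"] = not ann.get("uses_io", True)
    return json.dumps(ir)

program = json.dumps({"loops": [
    {"name": "L1", "body": ["a[i] = b[i] + 1"]},
    {"name": "L2", "body": ["print(a[i])"]},
]})
result = json.loads(mark_parallel(mark_io(program)))
for loop in result["loops"]:
    print(loop["name"], loop["annotations"])
# L1 {'uses_io': False, 'can_parallelize': True}
# L2 {'uses_io': True, 'can_parallelize': False}
```

Because each pass maps IR text to IR text, running `mark_parallel` alone is still safe. It just falls back to its conservative default.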
Case Study: Promis
• Switch to Promis organization presentation
• Switch to Promis IR presentation
Case Study: KCC
• Kuck and Associates (KAI) C++ compiler:
  • C++-dedicated internal representation
  • Advanced C++-specific optimization
  • Proprietary C++-specific object format
  • Interprocedural optimization with modular compilation
  • C++-specific debug information, usable with KDB
  • Outputs C with calls to a proprietary run-time library
  • Uses GNU gcc to generate machine code
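A back end that outputs C turns the compiler's IR into C source and lets an existing C compiler generate the machine code. The sketch below emits a C function from a tiny three-address IR. The IR format, the `emit_c` helper, and the choice of `long` for every value are invented for illustration and say nothing about KCC's internal representation.

```python
from typing import List, Optional, Tuple, Union

Operand = Union[str, int]
# Hypothetical IR instruction: (dest, op, lhs, rhs); dest "return" returns lhs.
Instr = Tuple[str, Optional[str], Operand, Optional[Operand]]

C_OPS = {"add": "+", "sub": "-", "mul": "*"}

def emit_c(name: str, params: List[str], body: List[Instr]) -> str:
    """Translate one IR function into a C function definition."""
    # Every non-parameter destination becomes a local C variable.
    temps = sorted({dest for dest, *_ in body
                    if dest != "return" and dest not in params})
    signature = ", ".join(f"long {p}" for p in params) or "void"
    lines = [f"long {name}({signature})", "{"]
    if temps:
        lines.append(f"    long {', '.join(temps)};")
    for dest, op, lhs, rhs in body:
        if dest == "return":
            lines.append(f"    return {lhs};")
        else:
            lines.append(f"    {dest} = {lhs} {C_OPS[op]} {rhs};")
    lines.append("}")
    return "\n".join(lines)

src = emit_c("poly", ["x"], [("t1", "mul", "x", "x"),
                             ("t2", "add", "t1", 1),
                             ("return", None, "t2", None)])
print(src)
```

The printed function can then be handed to any C compiler (for example `gcc -c`), which performs register allocation and instruction selection for the target.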
Case Study: Jalapeño Quicksilver
• Quasi-static images:
  • Java bytecode + proprietary format
• Representation allows for optimizations:
  • Explicit method dependence graph
  • Templated optimized object code
  • Speculative optimization validation predicates
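An explicit method dependence graph lets the run-time system cheaply decide which precompiled code to discard when something it relied on changes. It invalidates the changed method and everything that transitively depends on it, and keeps the rest. The graph encoding and method names below are hypothetical. This illustrates the general idea rather than Quicksilver's actual data structures.

```python
from collections import deque
from typing import Dict, List, Set

def invalidate(depends_on: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every method whose precompiled code must be discarded.

    depends_on[m] lists the methods m's optimized code relied on
    (for example because they were inlined into m).
    """
    # Reverse the edges: for each method, who depends on it?
    dependents: Dict[str, List[str]] = {}
    for m, deps in depends_on.items():
        for d in deps:
            dependents.setdefault(d, []).append(m)
    # Breadth-first search from the changed method.
    stale, work = {changed}, deque([changed])
    while work:
        for m in dependents.get(work.popleft(), []):
            if m not in stale:
                stale.add(m)
                work.append(m)
    return stale

graph = {
    "main": ["Shape.area", "log"],
    "Shape.area": ["Circle.radius"],
    "report": ["log"],
    "Circle.radius": [],
    "log": [],
}
print(sorted(invalidate(graph, "Circle.radius")))  # ['Circle.radius', 'Shape.area', 'main']
print(sorted(invalidate(graph, "log")))            # ['log', 'main', 'report']
```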
Case Study: .NET
• Advertised as a nine-figure (hundreds of millions of dollars) project
• CLI (ECMA standard):
  • Common type system
    • Type info in intermediate code
  • Common exception system
    • Throw in Visual Basic, catch in C++
• Support for security, culture, versioning
• Support for charging per use
• Custom information can describe features specific to the original language
• Diagram: 30+ languages → MSIL → native code
Other Compilers – Open Source
• GNU Compiler Collection (GCC):
  • C, Fortran, Java, C++ front ends
  • Generates code for all major architectures
  • Low-level internal representation
  • New version (3.x) has SSA
• SGI open source project (Pro64): discontinued
Other Compilers – Commercial
• Fortran, C, C++, Java compilers produced by OS and/or hardware vendors:
  • HP, SGI, Intel, Microsoft, Sun
• Other commercial compiler producers:
  • Borland, Watcom, etc.
• Internal representation: company secret
Conclusions
• Internal representations evolved along with:
  • Programming paradigms
  • Changes in hardware
  • Changes in compiler/run-time system technology
• New issues: security, verifiability, culture, versioning
• Tendency: E Pluribus Unum (out of many languages and targets, one common IR)