CS 412/413 Introduction to Compilers and Translators
Lecture 34: Compiler-like Systems
May 3, 1999
[Title figure labels: bytecode interpreter, source-to-source translator, bytecode compiler, JIT]
Administration
• Project code must be turned in Friday even if you haven’t finished the programming assignment. At minimum, turn in something that can compile programs.
• If parts of the program are incomplete, include an explanation of the design and of the components that are working.
CS 412/413 Introduction to Compilers and Translators -- Spring '99, Andrew Myers
A Compiler
[Figure: the “Compiler Classic” pipeline: Lexical analysis -> Parsing -> HIR optimization -> Intermediate code generation -> MIR optimization -> Code generation -> LIR optimization -> Linking & loading, divided into a front end and a back end. Side labels mark the alternatives covered in this lecture: source-to-source translator, bytecode compiler, JIT compiler, interpreter.]
Other options
• Compile from source to another high-level language (source-to-source translator) and let the other compiler deal with code generation, etc.
• Compile from source to an intermediate code format for which a back end already exists (ucode, RTL, LCC, ...)
• Compile from source to an intermediate code format executable by an interpreter:
  • abstract syntax tree
  • bytecodes
  • threaded code (call- or pointer-threaded)
• Advantage: architecture independence
Source-to-source translator
• Idea: choose a well-supported high-level language (e.g. C, C++, Java) as the target
• Translate the AST to high-level language constructs instead of to IR, then pass the translated code off to the underlying compiler
• Advantage: easy, and can leverage good underlying compiler technology. Examples: C++ (to C), PolyJ (to Java), Toba (JVM to C)
• Disadvantages: the target language won’t support all features, optimization is harder in the target language, and the target language may impose extra checks
Compiling to C
• C doesn’t impose extra checks, is reasonably close to assembly (but can’t support static exception tables!), and is widely available
• Mismatch: C doesn’t allow statements underneath expressions, so we need to translate and canonicalize in one step
• The translation of an expression into C is:
  • a sequence of statements to be executed
  • an expression to be evaluated afterward
$E \;\Rightarrow\; S_1; \ldots; S_n \,,\; E'$
Translation rules
• Translation can still be performed by recursive traversal of the AST
• IotaC rules:
Assignment:
\[ \frac{E \;\Rightarrow\; S \,,\; E'}{\mathit{id} = E; \;\Rightarrow\; S \,,\; \mathit{id} = E';} \]
Block:
\[ \frac{S_i \;\Rightarrow\; S_i' \,,\; E_i'}{\{\, S_1; \ldots; S_n \,\} \;\Rightarrow\; S_1'; \ldots; S_n' \,,\; E_n'} \]
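The (statements, expression) translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node classes (`Num`, `Var`, `BinOp`, `Assign`, `Block`) and the function `translate` are hypothetical names, not part of IotaC or the course project. The sketch also emits the value of each non-final block element as an expression statement so that its side effects are kept, which is an assumption about how the block rule is meant to be read.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical AST node classes for a tiny expression language.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    name: str
    value: object

@dataclass
class Block:
    body: List[object]  # assumed non-empty

def translate(e) -> Tuple[List[str], str]:
    """Return (C statements to run first, C expression to evaluate afterward)."""
    if isinstance(e, Num):
        return [], str(e.value)
    if isinstance(e, Var):
        return [], e.name
    if isinstance(e, BinOp):
        s1, e1 = translate(e.left)
        s2, e2 = translate(e.right)
        # Simplification: a real translator would save e1 in a temporary
        # if the statements in s2 could change its value.
        return s1 + s2, f"({e1} {e.op} {e2})"
    if isinstance(e, Assign):
        s, v = translate(e.value)
        return s, f"{e.name} = {v}"
    if isinstance(e, Block):
        stmts: List[str] = []
        for child in e.body[:-1]:
            s, v = translate(child)
            stmts += s + [f"{v};"]  # keep side effects of non-final elements
        s, v = translate(e.body[-1])
        return stmts + s, v
    raise TypeError(f"unknown node: {e!r}")

# x = { y = 1; y + 2 } : a block used underneath an expression
stmts, expr = translate(
    Assign("x", Block([Assign("y", Num(1)), BinOp("+", Var("y"), Num(2))]))
)
print(stmts, expr)  # ['y = 1;'] x = (y + 2)
```

The block's inner statement is hoisted in front of the assignment, which is the "translate and canonicalize in one step" idea: C never sees a statement nested inside an expression.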
Translating to Java
• Same problems as C, plus: Java is type-safe (good in a HLL, not so good in an intermediate language!)
• May need to use cast and instanceof expressions in generated code: slow in Java
Back Ends
• Several standard intermediate code formats exist with back ends for various architectures, so the back ends can be reused:
  • p-code: very old stack machine format
  • UCODE: old Stanford/MIPS stack machine format
  • Java bytecode: new stack machine format
  • RTL: GNU gcc, etc.
  • SUIF: Stanford format for optimization
  • LCC: Lightweight C compiler
Intermediate code formats
• Quadruples
  • compact, similar to machine code, good for standard optimization techniques
• Stack-machine
  • easy to generate code for
  • not compact (contrary to popular opinion)
  • can be converted back into quadruples
  • used by some (sort of) high-level languages: FORTH, PostScript, HP calculators
Stack machine format
• Code is a sequence of stack operations (not necessarily on the same stack as the call stack)
  push CONST : push CONST onto the top of the stack
  pop : discard the top of the stack
  store : store the element just below the top of the stack into the memory location specified by the top of the stack
  load : replace the top of the stack with the contents of the memory location it points to
  +, *, /, -, ... : replace the top two elements with the result of the operation
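The operations above can be simulated with a short Python sketch. The instruction encoding (tuples such as `("push", 2)`), the dictionary used as memory, and the function name `run` are assumptions made for illustration. The sketch also assumes that `store` pops both the address and the value, which the slide does not state.

```python
import operator

# Binary operators replace the top two elements with one result.
BINOPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # integer division, an assumption
}

def run(code, memory=None):
    """Execute a list of instruction tuples; return (stack, memory)."""
    memory = {} if memory is None else memory
    stack = []
    for instr in code:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "pop":
            stack.pop()
        elif op == "store":
            addr = stack.pop()    # top of stack names the location
            value = stack.pop()   # element just below is the value
            memory[addr] = value
        elif op == "load":
            addr = stack.pop()
            stack.append(memory[addr])
        elif op in BINOPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINOPS[op](left, right))
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    return stack, memory

# 2 + 3 * 4
stack, _ = run([("push", 2), ("push", 3), ("push", 4), ("*",), ("+",)])
print(stack)  # [14]
```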
Generating code
• Stack operations mostly don’t name operands (they are implicit), so they can be coded in 1 byte
• An expression is translated to code that leaves its value on the top of the stack
• Translation of E1 + E2: $E_1 + E_2 \;\Rightarrow\; E_1;\ E_2;\ \texttt{+}$
• Translation of id = E: $\mathit{id} = E \;\Rightarrow\; E;\ \texttt{push addr}(\mathit{id});\ \texttt{store}$
• Good for very simple code generation
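As a minimal sketch of these two translation schemes, the following Python function walks a tuple-encoded expression tree and emits stack code. The tuple encoding of the AST, the use of the variable name itself as its address, and the name `gen` are all assumptions for illustration.

```python
def gen(e):
    """Translate a tuple-encoded AST into a list of stack instructions."""
    kind = e[0]
    if kind == "num":
        return [("push", e[1])]
    if kind == "var":
        # push addr(id); load  (the name stands in for the address)
        return [("push", e[1]), ("load",)]
    if kind in ("+", "-", "*", "/"):
        # E1 op E2  =>  E1; E2; op
        return gen(e[1]) + gen(e[2]) + [(kind,)]
    if kind == "assign":
        # id = E  =>  E; push addr(id); store
        return gen(e[2]) + [("push", e[1]), ("store",)]
    raise ValueError(f"unknown node: {e!r}")

# x = 1 + a * 3
code = gen(("assign", "x", ("+", ("num", 1), ("*", ("var", "a"), ("num", 3)))))
for instr in code:
    print(*instr)
```

Each case is a direct transcription of one rule, with no register allocation or instruction selection, which is why stack code is attractive for very simple code generators.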
Verbosity
• Values get “trapped” down low on the stack, especially with common subexpression elimination
• Need an instruction that re-pushes the element at a known stack index onto the top; this instruction is used often
• Might as well have abstract register operands!
• Result: less compact than a register-based format, and extra copies of data too
Stack machine -> register machine
• At each point in the code, keep track of the stack depth (if possible)
• Assign temporaries according to depth
• Replace stack operations with quadruples that use these temporaries
Example for a + b*c (the number after each stack instruction is the index of the top stack slot after it executes):
  Stack code     Quadruple         Stack contents afterward
  push a ; 0     t0 = a            a
  push b ; 1     t1 = b            a, b
  push c ; 2     t2 = c            a, b, c
  *      ; 1     t1 = t1 * t2      a, b*c
  +      ; 0     t0 = t0 + t1      a+b*c
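The depth-tracking conversion can be sketched as follows. This is an illustration only: it handles just `push` and the four binary operators, assumes straight-line code so that the depth is always known, and uses a hypothetical function name `to_quads`.

```python
def to_quads(code):
    """Convert straight-line stack code into quadruples over t0, t1, ..."""
    depth = 0  # number of values currently on the simulated stack
    quads = []
    for instr in code:
        op = instr[0]
        if op == "push":
            # The pushed value lands in the slot numbered by the old depth.
            quads.append(f"t{depth} = {instr[1]}")
            depth += 1
        elif op in ("+", "-", "*", "/"):
            if depth < 2:
                raise ValueError("stack underflow")
            left, right = depth - 2, depth - 1
            quads.append(f"t{left} = t{left} {op} t{right}")
            depth -= 1  # two operands consumed, one result produced
        else:
            raise ValueError(f"unsupported instruction: {instr!r}")
    return quads

for q in to_quads([("push", "a"), ("push", "b"), ("push", "c"), ("*",), ("+",)]):
    print(q)
# t0 = a
# t1 = b
# t2 = c
# t1 = t1 * t2
# t0 = t0 + t1
```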
JIT compilers
• A particularly widely available back end with a well-defined intermediate code (JVM bytecode)
• Generate code by reconstructing registers as discussed
• Compilation is done on the fly: generating code quickly is essential, so generated code quality is usually low
• HotSpot: new Sun JIT. High-quality optimization (esp. inlining and specialization), but used sparingly
Interpreters
• “Why generate machine code at all? Just run it. Processors are really fast”
• Options, with approximate slowdowns:
  • token interpreters (parsing on the fly) -- really slow (>1000x)
  • AST interpreters -- 300x
  • threaded interpreters -- 20-50x
  • bytecode interpreters -- 10-30x
AST interpreters
• “Yet another recursive traversal”
• For every node type in the AST, add a method Object evaluate(RunTimeContext r)
• The evaluate method is implemented recursively:
    Object PlusNode.evaluate(r) {
        return left.evaluate(r).plus(right.evaluate(r));
    }
• Variables, etc. are looked up in r; some help from the AST yields big speed-ups (e.g. pre-computed variable locations)
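A runnable version of the same idea, as a minimal Python sketch. The classes `NumNode` and `VarNode` and the list-of-slots layout of `RunTimeContext` are assumptions; only `PlusNode`, `evaluate`, and `RunTimeContext` are names from the slide. `VarNode` stores a pre-computed slot index, which illustrates the "pre-computed variable locations" speed-up.

```python
from dataclasses import dataclass

class RunTimeContext:
    """Holds variable values in numbered slots (an assumed layout)."""
    def __init__(self, slots):
        self.slots = list(slots)

@dataclass
class NumNode:
    value: int

    def evaluate(self, r):
        return self.value

@dataclass
class VarNode:
    slot: int  # location resolved ahead of time, so no name lookup at run time

    def evaluate(self, r):
        return r.slots[self.slot]

@dataclass
class PlusNode:
    left: object
    right: object

    def evaluate(self, r):
        # Recursive traversal: evaluate both children, then combine.
        return self.left.evaluate(r) + self.right.evaluate(r)

# v0 + (2 + v1) with v0 = 10 and v1 = 30
tree = PlusNode(VarNode(0), PlusNode(NumNode(2), VarNode(1)))
result = tree.evaluate(RunTimeContext([10, 30]))
print(result)  # 42
```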
Threaded interpreters
• Code is represented as a bunch of blocks of memory connected by pointers. Blocks are either code blocks or pointer blocks.
• Execution is (mostly) a depth-first traversal
• Each block starts with a single jump instruction
[Figure: code blocks and pointer blocks; labels: JMP, JMP next, machine code, RET]
Threaded interpreters (continued)
• Can call blocks directly regardless of kind
• Conditional branches and other control structures are embedded in special machine code blocks that munge the stack
• The interpreter proper is ~10 instructions of code: very space-efficient!
• Long favored by FORTH fans: the whole language runtime fits in 4K of memory
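The depth-first traversal over code blocks and pointer blocks can be modeled loosely in Python. This is not real threaded code: there are no JMP instructions, Python callables stand in for machine code blocks, and lists stand in for pointer blocks. All names (`execute`, `push`, `dup`, `add`, `DOUBLE`, `QUADRUPLE`) are hypothetical.

```python
def push(n):
    """Build a code block that pushes the constant n."""
    def code_block(stack):
        stack.append(n)
    return code_block

def dup(stack):
    stack.append(stack[-1])

def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def execute(block, stack=None):
    """Interpreter proper: depth-first traversal with an explicit return stack."""
    stack = [] if stack is None else stack
    return_stack = [(block, 0)]
    while return_stack:
        blk, i = return_stack.pop()
        if callable(blk):
            blk(stack)                        # code block: just run it
        elif i < len(blk):
            return_stack.append((blk, i + 1))  # resume here afterward
            return_stack.append((blk[i], 0))   # descend into the next pointer
    return stack

# Pointer blocks may refer to code blocks or to other pointer blocks.
DOUBLE = [dup, add]
QUADRUPLE = [DOUBLE, DOUBLE]
program = [push(3), QUADRUPLE]
print(execute(program))  # [12]
```

The `execute` loop is the whole interpreter; everything else is data or primitive code blocks, which is where the small footprint of threaded systems comes from.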
Call-threaded interpreters
• Instead of pointers in pointer blocks, store CALL instructions
• Depth-first traversal is done by the processor automatically
• Faster, but requires (trivial) machine code generation
[Figure: call-threaded blocks; labels: machine code, sequences of CALL instructions, RET]
Bytecode interpreters
• Problem with threaded interpreters: operations with immediate arguments require multiple words (code bloat)
• Observation: most interpreter instructions can be encoded in less than a word, many even in just a byte (esp. with a stack-machine model)
• Tradeoff: interpreter size vs. executable code size
Implementing a bytecode interpreter
• A bytecode interpreter simulates a simple architecture (either a stack or a register machine)
• Interpreter state:
  • current code pointer
  • current simulated return stack
  • current registers, or stack and stack pointer
• Interpreter code is a big loop containing a switch over the kinds of bytecode instructions
  • one big function: optimization techniques work well. Ideally, interpreted function calls do not recurse into the interpreter
• Result: about 10x slowdown if done right
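The structure described above can be sketched in Python as a single loop with one branch per opcode. The opcode set, the one-byte operand encoding, and the name `interpret` are assumptions chosen for illustration. Note that `CALL` and `RET` only manipulate the simulated return stack, so interpreted calls never recurse into `interpret`.

```python
# Hypothetical one-byte opcodes for a tiny stack machine.
PUSH, DUP, ADD, MUL, CALL, RET, HALT = range(7)

def interpret(code: bytes):
    """Run bytecode until HALT and return the operand stack."""
    pc = 0             # current code pointer
    stack = []         # simulated operand stack
    return_stack = []  # simulated return stack
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:              # one-byte immediate operand follows
            stack.append(code[pc])
            pc += 1
        elif op == DUP:
            stack.append(stack[-1])
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == CALL:            # one-byte absolute target follows
            return_stack.append(pc + 1)
            pc = code[pc]
        elif op == RET:
            pc = return_stack.pop()
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"bad opcode {op} at offset {pc - 1}")

program = bytes([
    PUSH, 5,    # offset 0
    CALL, 8,    # offset 2: call square
    PUSH, 1,    # offset 4
    ADD,        # offset 6
    HALT,       # offset 7
    DUP,        # offset 8: square
    MUL,        # offset 9
    RET,        # offset 10
])
print(interpret(program))  # [26]
```

In C the `if`/`elif` chain would be a `switch` inside one function, which is the form the slide has in mind.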
Summary
• Building a new system for executing code doesn’t require construction of a full compiler
• Cost-effective strategies: source-to-source translation, or translation to an existing intermediate code format
• Most material covered in this course is still helpful for these approaches
• High performance: translate to C
• Portability, extensibility: translate to Java or JVM