CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 13: Transforming Intermediate Code

Administration
• Prelim 1 on Monday in class
  • topics covered: regular expressions, tokenizing, context-free grammars, LL & LR parsers, static semantics
• No class Wednesday March 3
• Programming Assignment 2 due Friday March 5
• Read: Appel 7, 8
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

Where we are
Source code (character stream)
  → Lexical analysis (regular expressions)
Token stream
  → Syntactic analysis (grammars)
Abstract syntax tree
  → Semantic analysis (static semantics)
Abstract syntax tree + types
  → Intermediate code generation (translation functions)
Intermediate code

Intermediate Code
• Abstract machine code in tree form
• Statements
  • MOVE, EXP, JUMP, CJUMP, SEQ, LABEL, RET
• Expressions
  • CONST, TEMP, OP, MEM, CALL, ESEQ, NAME
• 13 kinds of tree nodes vs. hundreds of Pentium instructions: easier to generate and to reason about

Intermediate Representations
• High-level IR (HIR): AST + extra node types
• Medium-level IR (MIR)
  • intermediate between AST and assembly
  • other MIRs exist (quadruples, UCODE)
  • advantage of tree IR: easy to generate, easier to do reasonable instruction selection
• Low-level IR (LIR): assembly code + extra pseudo-instructions

IR expressions
• CONST(i) : the integer constant i
• TEMP(t) : a temporary register t. The abstract machine has an infinite number of these
• OP(e1, e2) : one of the following operations
  • PLUS, MINUS, MUL, DIV, MOD
  • AND, OR, XOR, LSHIFT, RSHIFT, ARSHIFT
• MEM(e) : contents of the memory location with address e
• CALL(f, l) : result of function f applied to argument list l
• ESEQ(s, e) : result of e after statement s is executed
• NAME(n) : address of the statement labeled n

IR statements
• MOVE(e, dest) : move result of e into dest
  • dest = TEMP(t) : assign to temporary t
  • dest = MEM(e) : assign to the memory location with address e
• EXP(e) : evaluate e, discard result
• SEQ(s1, s2) : execute s1 and then s2
• JUMP(e) : jump to address e
• CJUMP(e, l1, l2) : jump to l1 or l2 depending on whether e is true or false
• LABEL(n) : a labeled statement (may be used in NAME, JUMP, CJUMP)
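The node descriptions above can be read as the rules of a small abstract machine. The following is a minimal sketch of an interpreter for a subset of them (CONST, TEMP, MEM, PLUS, MINUS, MUL, ESEQ, MOVE, EXP, SEQ). The generic `Node` class, the `IrEvalSketch` name, the helper methods, and the choice to evaluate a MOVE source before its destination address are all assumptions made for illustration; jumps, labels, and calls are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IrEvalSketch {
    /** Generic tree node (hypothetical): a label plus children. */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }
    static Node constant(long v) { return n("CONST", n(Long.toString(v))); }
    static Node temp(String name) { return n("TEMP", n(name)); }

    // Machine state: unbounded temporaries and a sparse memory, both defaulting to 0.
    final Map<String, Long> temps = new HashMap<>();
    final Map<Long, Long> memory = new HashMap<>();

    /** Evaluates an IR expression to a value. */
    long eval(Node e) {
        switch (e.label) {
            case "CONST": return Long.parseLong(e.kids.get(0).label);
            case "TEMP":  return temps.getOrDefault(e.kids.get(0).label, 0L);
            case "MEM":   return memory.getOrDefault(eval(e.kids.get(0)), 0L);
            case "PLUS":  return eval(e.kids.get(0)) + eval(e.kids.get(1));
            case "MINUS": return eval(e.kids.get(0)) - eval(e.kids.get(1));
            case "MUL":   return eval(e.kids.get(0)) * eval(e.kids.get(1));
            case "ESEQ":  // run the statement, then yield the expression's value
                exec(e.kids.get(0));
                return eval(e.kids.get(1));
            default:
                throw new IllegalArgumentException("unsupported expression: " + e.label);
        }
    }

    /** Executes an IR statement for its side effects. */
    void exec(Node s) {
        switch (s.label) {
            case "MOVE": {  // MOVE(e, dest): source first, destination second
                long value = eval(s.kids.get(0));
                Node dest = s.kids.get(1);
                if (dest.label.equals("TEMP")) {
                    temps.put(dest.kids.get(0).label, value);
                } else if (dest.label.equals("MEM")) {
                    memory.put(eval(dest.kids.get(0)), value);
                } else {
                    throw new IllegalArgumentException("bad MOVE destination: " + dest.label);
                }
                break;
            }
            case "EXP":  // evaluate and discard
                eval(s.kids.get(0));
                break;
            case "SEQ":
                exec(s.kids.get(0));
                exec(s.kids.get(1));
                break;
            default:
                throw new IllegalArgumentException("unsupported statement: " + s.label);
        }
    }

    public static void main(String[] args) {
        IrEvalSketch m = new IrEvalSketch();
        Node t = temp("t");
        // ESEQ(MOVE(CONST(5), TEMP(t)), PLUS(TEMP(t), CONST(1)))
        Node e = n("ESEQ", n("MOVE", constant(5), t), n("PLUS", t, constant(1)));
        System.out.println(m.eval(e));
    }
}
```

The ESEQ case is the one that matters later: an expression whose evaluation changes machine state is exactly what canonical form has to eliminate.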

Translation
• Intermediate code generation is tree translation: abstract syntax tree → IR tree
• Each subtree of the AST is translated to a subtree of the IR tree
• Translation process described by translation function T[E, A]

Translation Example
• Variable rule: if (location v : k) ∈ A, then T[v] = MEM(PLUS(FP, CONST(k)))
• Equality rule: T[E1 == E2, A] = OP(==, T[E1, A], T[E2, A])
• Source statement: if (b==0) a = b;
[Figure: the typed AST for this statement beside its IR tree, a SEQ chain holding CJUMP(==, L1, L2), LABEL(L1), a MOVE, and LABEL(L2); a and b are accessed as MEM(fp + offset) with offsets 4 and 8]

Translation Code
• Function T[E, A] corresponds to a translation method
    class ASTnode {
        IRnode translate(SymTab A);
    }
• Note similarity to type-checking method:
    Type typeCheck(SymTab A);
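To make the correspondence between T[E, A] and a `translate` method concrete, here is a minimal sketch covering the variable and equality rules from the translation example. The `Var`, `IntLit`, and `Eq` classes, the `SymTab` layout, and the choice to render IR as text rather than as `IRnode` objects are assumptions made to keep the example short. The rule for integer literals (CONST) is inferred from the example figure rather than stated as a rule in the slides.

```java
import java.util.HashMap;
import java.util.Map;

final class TranslateSketch {
    /** Hypothetical symbol table A: maps a variable name to its frame offset k. */
    static final class SymTab {
        final Map<String, Integer> offsets = new HashMap<>();
    }

    /** AST node with a translation method mirroring T[E, A]; IR is rendered as text. */
    abstract static class ASTnode {
        abstract String translate(SymTab a);
    }

    static final class Var extends ASTnode {
        final String name;
        Var(String name) { this.name = name; }
        // T[v] = MEM(PLUS(FP, CONST(k))) where A gives v the location k
        @Override String translate(SymTab a) {
            Integer k = a.offsets.get(name);
            if (k == null) throw new IllegalStateException("unbound variable " + name);
            return "MEM(PLUS(FP, CONST(" + k + ")))";
        }
    }

    static final class IntLit extends ASTnode {
        final int value;
        IntLit(int value) { this.value = value; }
        @Override String translate(SymTab a) { return "CONST(" + value + ")"; }
    }

    static final class Eq extends ASTnode {
        final ASTnode left, right;
        Eq(ASTnode left, ASTnode right) { this.left = left; this.right = right; }
        // T[E1 == E2, A] = OP(==, T[E1, A], T[E2, A])
        @Override String translate(SymTab a) {
            return "OP(==, " + left.translate(a) + ", " + right.translate(a) + ")";
        }
    }

    public static void main(String[] args) {
        SymTab a = new SymTab();
        a.offsets.put("b", 8);  // offset chosen for illustration only
        System.out.println(new Eq(new Var("b"), new IntLit(0)).translate(a));
    }
}
```

Each subclass translates its children with the same symbol table and wraps the results, the same recursive shape as a `typeCheck` method.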

Translating control structure
• If, while, and return statements cause transfer of control within the program
• Idea: manage flow of control by introducing labels for statements, and use CJUMP and JUMP statements to transfer control to the labels

Translating if
T[ if (E) S ] =
      CJUMP(T[E], t, f)
   t: T[S]
   f:
= SEQ(CJUMP(T[E], NAME(t), NAME(f)), SEQ(LABEL(t), SEQ(T[S], LABEL(f))))   (if t, f fresh)
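The rule for `if` can be sketched as a function that takes the already-translated condition and body and allocates two fresh labels. The generic `Node` class, the `IfSketch` name, and the counter-based `fresh` label scheme are hypothetical; a real translator would need labels that are unique across the whole function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class IfSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }

    private int next = 0;

    /** Returns a label name not handed out before by this translator. */
    Node fresh(String prefix) { return n(prefix + (next++)); }

    /** T[if (E) S], given cond = T[E] and body = T[S]. */
    Node translateIf(Node cond, Node body) {
        Node t = fresh("t");
        Node f = fresh("f");
        return n("SEQ", n("CJUMP", cond, n("NAME", t), n("NAME", f)),
               n("SEQ", n("LABEL", t),
               n("SEQ", body, n("LABEL", f))));
    }

    public static void main(String[] args) {
        System.out.println(new IfSketch().translateIf(n("E"), n("S")));
    }
}
```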

Translating while
T[ while (E) S ] =
   loop: CJUMP(T[E], t, f)
      t: T[S]
         JUMP loop
      f:
= SEQ(LABEL(loop), CJUMP(T[E], NAME(t), NAME(f)), LABEL(t), T[S], JUMP(NAME(loop)), LABEL(f))
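The `while` rule follows the same pattern with three fresh labels. The sketch below builds the statement list as a right-nested chain of binary SEQ nodes. The `Node` class, the `WhileSketch` name, the `seq` helper, and the label-naming scheme are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class WhileSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }

    /** Builds SEQ(s1, SEQ(s2, ... SEQ(sn-1, sn))) from at least one statement. */
    static Node seq(Node... stmts) {
        Node result = stmts[stmts.length - 1];
        for (int i = stmts.length - 2; i >= 0; i--) {
            result = n("SEQ", stmts[i], result);
        }
        return result;
    }

    private int next = 0;
    Node fresh(String prefix) { return n(prefix + (next++)); }

    /** T[while (E) S], given cond = T[E] and body = T[S]. */
    Node translateWhile(Node cond, Node body) {
        Node loop = fresh("loop");
        Node t = fresh("t");
        Node f = fresh("f");
        return seq(n("LABEL", loop),
                   n("CJUMP", cond, n("NAME", t), n("NAME", f)),
                   n("LABEL", t),
                   body,
                   n("JUMP", n("NAME", loop)),
                   n("LABEL", f));
    }

    public static void main(String[] args) {
        System.out.println(new WhileSketch().translateWhile(n("E"), n("S")));
    }
}
```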

Function calls, returns
• Translate to corresponding IR node
• If (label id : lid) ∈ A: T[id(E1, …, En), A] = CALL(lid, T[E1], …, T[En])
• T[return E, A] = RET(T[E, A])
  • alternatively, = SEQ(MOVE(T[E], RV), JUMP(NAME(end)))
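As a rough sketch of these two rules, the code below looks up the function's label in a map standing in for the symbol table A, and builds the alternative form of `return` that moves the value into RV and jumps to the function's `end` label. The `Node` class, the `CallReturnSketch` name, the map-based symbol table, and the label `_f` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CallReturnSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }

    /** T[id(E1, ..., En), A] = CALL(lid, T[E1], ..., T[En]); labels plays the role of A. */
    static Node translateCall(Map<String, String> labels, String id, Node... translatedArgs) {
        String lid = labels.get(id);
        if (lid == null) throw new IllegalStateException("unknown function " + id);
        Node[] kids = new Node[translatedArgs.length + 1];
        kids[0] = n(lid);
        System.arraycopy(translatedArgs, 0, kids, 1, translatedArgs.length);
        return n("CALL", kids);
    }

    /** T[return E, A] in its alternative form: SEQ(MOVE(T[E], RV), JUMP(NAME(end))). */
    static Node translateReturn(Node translatedExpr) {
        return n("SEQ", n("MOVE", translatedExpr, n("RV")),
                        n("JUMP", n("NAME", n("end"))));
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<>();
        labels.put("f", "_f");  // hypothetical label for function f
        System.out.println(translateCall(labels, "f", n("E1"), n("E2")));
        System.out.println(translateReturn(n("E")));
    }
}
```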

Progress
• Now have rules for transforming the AST into the intermediate representation
• Can apply these to the AST of each function definition to get the IR for that function
• Intermediate representation has many features not found in real assembly code
  • arbitrarily deep expression trees, vs. 1-2 levels deep in assembly
  • ability to perform statements with side effects as part of an expression (ESEQ, CALL), which can leave behavior undefined
  • CJUMP is a two-way jump rather than fall-through
• Why do we allow this in IR at all?

Canonical form
• Idea: rewrite trees to get rid of constructs incompatible with assembly
  • arbitrarily deep expression trees -- deal with this later as part of instruction tiling
  • ESEQ & CALL nodes -- push ESEQ nodes upward in the tree until they become SEQ nodes, push all CALL nodes up, make a top-level backbone of SEQ nodes
  • CJUMP is a two-way jump rather than fall-through -- rewrite so the jump on false is always to the very next instruction
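One simple way to obtain the fall-through property for CJUMP, sketched here under stated assumptions, works on a flat list of statements: whenever a CJUMP is not already followed by its false label, retarget the false branch to a new label placed immediately after it, followed by an unconditional JUMP to the original false label. This is not necessarily the scheme the course uses. The `Node` class, the `FallThroughSketch` name, and the `ff` label prefix (assumed not to clash with existing labels) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FallThroughSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }

    /** Rewrites every CJUMP(e, NAME(t), NAME(f)) so that LABEL of its false target comes next. */
    static List<Node> fixCjumps(List<Node> stmts) {
        List<Node> out = new ArrayList<>();
        int fresh = 0;
        for (int i = 0; i < stmts.size(); i++) {
            Node s = stmts.get(i);
            if (!s.label.equals("CJUMP")) {
                out.add(s);
                continue;
            }
            String falseLabel = s.kids.get(2).kids.get(0).label;
            Node following = i + 1 < stmts.size() ? stmts.get(i + 1) : null;
            boolean fallsThrough = following != null
                    && following.label.equals("LABEL")
                    && following.kids.get(0).label.equals(falseLabel);
            if (fallsThrough) {
                out.add(s);
                continue;
            }
            // Bridge: the false branch falls into a new label that jumps to the old target.
            Node bridge = n("ff" + fresh++);
            out.add(n("CJUMP", s.kids.get(0), s.kids.get(1), n("NAME", bridge)));
            out.add(n("LABEL", bridge));
            out.add(n("JUMP", n("NAME", n(falseLabel))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Node> body = Arrays.asList(
            n("CJUMP", n("E"), n("NAME", n("t")), n("NAME", n("f"))),
            n("LABEL", n("t")), n("S"), n("LABEL", n("f")));
        System.out.println(fixCjumps(body));
    }
}
```

When the true label is the one that follows, negating the condition and swapping the targets avoids the extra JUMP; that refinement is omitted here.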

Canonical form
• In canonical form, all SEQ nodes go down the right chain:
    SEQ(s1, SEQ(s2, SEQ(s3, SEQ(s4, SEQ(s5, ...)))))
• Function is just one big SEQ containing all statements: SEQ(s1,s2,s3,s4,s5,…)
• Can translate to assembly more directly
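Getting an arbitrary nest of SEQ nodes into this right-chain shape amounts to flattening the tree into a statement list in execution order and rebuilding it. The sketch below shows both steps; the `Node` class and the `SeqFlattenSketch` name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SeqFlattenSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }

    /** Appends the non-SEQ statements under s to out, in execution order. */
    static void flatten(Node s, List<Node> out) {
        if (s.label.equals("SEQ")) {
            for (Node kid : s.kids) flatten(kid, out);
        } else {
            out.add(s);
        }
    }

    /** Rebuilds a non-empty statement list as SEQ(s1, SEQ(s2, ...)). */
    static Node rightChain(List<Node> stmts) {
        Node result = stmts.get(stmts.size() - 1);
        for (int i = stmts.size() - 2; i >= 0; i--) {
            result = n("SEQ", stmts.get(i), result);
        }
        return result;
    }

    public static void main(String[] args) {
        Node tree = n("SEQ", n("SEQ", n("s1"), n("s2")), n("SEQ", n("s3"), n("s4")));
        List<Node> flat = new ArrayList<>();
        flatten(tree, flat);
        System.out.println(flat);
        System.out.println(rightChain(flat));
    }
}
```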

Non-canonical features
• ESEQ(S, E) nodes put a statement node underneath an expression:
    int x = 1 + { while (y > 0) { … } z; }
• CALL nodes have side effects; must move to top level as EXP(CALL(…)) or MOVE(CALL(…)) to define behavior

ESEQ rewriting
• Want to move ESEQ nodes up to the top of the tree, where they can become SEQ nodes
• Idea: define transformation rules that take an IR tree and move ESEQ nodes to the top
• Goal: move side-effecting statements to the top of the tree without ripping apart expressions more than necessary -- leads to better code because expression patterns can be recognized and mapped to the instruction set

ESEQ Transformations
• Example transformations:
    ESEQ(s1, ESEQ(s2, e)) ⇒ ESEQ(SEQ(s1, s2), e)
    MOVE(ESEQ(s1, e), dest) ⇒ SEQ(s1, MOVE(e, dest))
    OP(ESEQ(s1, e1), e2) ⇒ ESEQ(s1, OP(e1, e2))
    OP(e1, ESEQ(s1, e2)) ⇒ ?
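The three rules with known right-hand sides can be sketched as a single-step rewrite applied at the root of a tree. The `Node` class, the `EseqRulesSketch` name, and the `is` helper are hypothetical, and OP is written with two children as in the rules, leaving the operator implicit. The fourth rule is deliberately not handled, since it needs the commutation check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class EseqRulesSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }
    static boolean is(Node x, String label) { return x.label.equals(label); }

    /** Applies one of the three rules at the root, or returns x unchanged. */
    static Node rewrite(Node x) {
        // ESEQ(s1, ESEQ(s2, e)) => ESEQ(SEQ(s1, s2), e)
        if (is(x, "ESEQ") && is(x.kids.get(1), "ESEQ")) {
            Node inner = x.kids.get(1);
            return n("ESEQ", n("SEQ", x.kids.get(0), inner.kids.get(0)), inner.kids.get(1));
        }
        // MOVE(ESEQ(s1, e), dest) => SEQ(s1, MOVE(e, dest))
        if (is(x, "MOVE") && is(x.kids.get(0), "ESEQ")) {
            Node src = x.kids.get(0);
            return n("SEQ", src.kids.get(0), n("MOVE", src.kids.get(1), x.kids.get(1)));
        }
        // OP(ESEQ(s1, e1), e2) => ESEQ(s1, OP(e1, e2))
        if (is(x, "OP") && is(x.kids.get(0), "ESEQ")) {
            Node left = x.kids.get(0);
            return n("ESEQ", left.kids.get(0), n("OP", left.kids.get(1), x.kids.get(1)));
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(n("ESEQ", n("s1"), n("ESEQ", n("s2"), n("e")))));
        System.out.println(rewrite(n("MOVE", n("ESEQ", n("s1"), n("e")), n("dest"))));
        System.out.println(rewrite(n("OP", n("ESEQ", n("s1"), n("e1")), n("e2"))));
    }
}
```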

Rewriting expressions
• OP(e1, ESEQ(s1, e2)) ⇒ ESEQ(s1, OP(e1, e2)) ?
• In source terms: is e1 + { a=0; e2 } the same as { a=0; e1 + e2 } ?

Introducing temporaries
• If e1 does not commute with s1
  • i.e., {s1; e1; e2} ≠ {e1; s1; e2}
• Must save the value of e1 in a temporary:
    OP(e1, ESEQ(s1, e2)) ⇒ ESEQ(SEQ(MOVE(e1, TEMP(t)), s1), OP(TEMP(t), e2))

General case
• When we move all ESEQ nodes to the top, an arbitrary expression node looks like:
    ESEQ(SEQ(s1, s2, s3, ...), expr)
• ESEQ transformation takes an arbitrary expression node, returns a list of sub-statements to be executed plus a final expression
• ESEQ node built as shown

Interface
    class CanonicalExpr {
        IRStmt[] pre_stmts;
        IRExpr expr;
    }
    abstract class IRExpr {
        CanonicalExpr canonical( );
    }
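A possible shape for `canonical()` is sketched below, covering the temporary-introduction rule and the general case for a small subset of expression nodes (leaves, MEM, OP, ESEQ). Several things here are assumptions, not the course's implementation: the generic `Node` class stands in for `IRExpr` and `IRStmt`, a `List` replaces the `pre_stmts` array, only CONST and NAME are treated as commuting with every statement (a deliberately conservative test), CALL is not handled, and the statement inside an ESEQ is not itself canonicalized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CanonicalSketch {
    /** Generic tree node (hypothetical): a label plus children, printed as LABEL(kid, ...). */
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            return label + kids.stream().map(Node::toString)
                               .collect(Collectors.joining(", ", "(", ")"));
        }
    }
    static Node n(String label, Node... kids) { return new Node(label, kids); }

    /** Statements to run first, then a side-effect-free expression. */
    static final class CanonicalExpr {
        final List<Node> preStmts = new ArrayList<>();
        Node expr;
    }

    private int nextTemp = 0;

    /** Conservative: only constants and label addresses are known to commute. */
    static boolean commutesWithAnything(Node e) {
        return e.label.equals("CONST") || e.label.equals("NAME");
    }

    CanonicalExpr canonical(Node e) {
        CanonicalExpr result = new CanonicalExpr();
        switch (e.label) {
            case "ESEQ": {  // ESEQ(s, e1): s runs before everything e1 needs
                CanonicalExpr inner = canonical(e.kids.get(1));
                result.preStmts.add(e.kids.get(0));
                result.preStmts.addAll(inner.preStmts);
                result.expr = inner.expr;
                return result;
            }
            case "MEM": {
                CanonicalExpr inner = canonical(e.kids.get(0));
                result.preStmts.addAll(inner.preStmts);
                result.expr = n("MEM", inner.expr);
                return result;
            }
            case "OP": {  // OP(e1, e2), operator left implicit
                CanonicalExpr left = canonical(e.kids.get(0));
                CanonicalExpr right = canonical(e.kids.get(1));
                Node leftValue = left.expr;
                result.preStmts.addAll(left.preStmts);
                if (!right.preStmts.isEmpty() && !commutesWithAnything(leftValue)) {
                    // Save e1 in a temporary before the right operand's statements run.
                    Node t = n("TEMP", n("t" + nextTemp++));
                    result.preStmts.add(n("MOVE", leftValue, t));
                    leftValue = t;
                }
                result.preStmts.addAll(right.preStmts);
                result.expr = n("OP", leftValue, right.expr);
                return result;
            }
            default:  // CONST, TEMP, NAME and anything else: left as-is
                result.expr = e;
                return result;
        }
    }

    public static void main(String[] args) {
        Node e = n("OP", n("TEMP", n("a")), n("ESEQ", n("s1"), n("TEMP", n("b"))));
        CanonicalExpr c = new CanonicalSketch().canonical(e);
        System.out.println(c.preStmts + " then " + c.expr);
    }
}
```

Returning a statement list plus an expression, rather than an ESEQ tree, means the caller can splice the statements straight into the top-level SEQ backbone.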

Conclusions
• AST statements for structured control flow like “if” and “while” can be translated to unstructured IR using JUMP, CJUMP, and LABEL nodes.
• Simple code transformations can turn the IR into a canonical form that has many of the properties of assembly code.