Theory of Compilation 236360, Erez Petrank. Lecture 6: Intermediate Representation.
Compiler pipeline (with a “You are here” marker):
Source text (txt) → Compiler [Lexical Analysis → Syntax Analysis → Semantic Analysis → Inter. Rep. (IR) → Code Gen.] → Executable code (exe)
Passed between phases: characters → tokens → AST (Abstract Syntax Tree) → Annotated AST
Intermediate Representation • “Neutral” representation between the front-end and the back-end • Abstracts away details of the source language • Abstracts away details of the target language • Allows combination of various back-ends with various front-ends • [Figure: front-ends (C, Ada, Java, Objective C, …) → GENERIC → back-ends (x86, ia64, arm, …)]
Intermediate Representation(s) • Annotated abstract syntax tree • Three address code • Postfix notation • … • Sometimes we move between representations. E.g., syntax directed translation produces an annotated AST, and the AST is then translated to three address code.
Examples: Tree, DAG and Postfix • Input: a := b * -c + b * -c • Tree representation: [figure] • DAG representation: [figure] • Postfix representation: a b c uminus * b c uminus * + assign
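The three representations above can be related with a small sketch. It assumes a hypothetical tuple encoding of the tree (an operator tuple `(op, child, ...)` and plain strings for identifiers); the function names are illustrative and not from the lecture. A post-order walk yields the postfix form, and counting identical subtrees only once gives the size of the DAG.

```python
# Hypothetical encoding: ("op", child, ...) for operators, a string for an identifier.
def to_postfix(node):
    """Return the postfix form as a list of symbols (post-order walk)."""
    if isinstance(node, str):            # leaf: identifier
        return [node]
    op, *children = node
    out = []
    for child in children:               # operands first ...
        out.extend(to_postfix(child))
    out.append(op)                       # ... then the operator
    return out


def count_nodes(node, shared):
    """Count nodes; with shared=True identical subtrees count once (DAG)."""
    seen = set()
    total = 0

    def visit(n):
        nonlocal total
        if shared and n in seen:         # subtree already represented in the DAG
            return
        seen.add(n)
        total += 1
        if not isinstance(n, str):
            for child in n[1:]:
                visit(child)

    visit(node)
    return total


prod = ("*", "b", ("uminus", "c"))
tree = ("assign", "a", ("+", prod, prod))

print(" ".join(to_postfix(tree)))
print(count_nodes(tree, shared=False), count_nodes(tree, shared=True))
```

The tree needs a node for every occurrence of `b * -c`, while the DAG keeps a single copy of the repeated subexpression.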
Example: Annotated AST • makeNode – creates new node for unary/binary operator • makeLeaf – creates a leaf • id.place – pointer to symbol table
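Memory Representation of an Annotated Tree • a = b * -c + b * -c • [Figure: tree with root assign, whose children are id a and +; each child of + is a * node with children id b and uminus; each uminus has the child id c]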
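A minimal sketch of how makeNode and makeLeaf could lay this tree out in memory. It assumes node records stored in a list (a node "pointer" is its index) and a dictionary standing in for the symbol table; both choices, and the record field names, are illustrative assumptions rather than the lecture's implementation.

```python
symtab = {}   # name -> symbol-table entry (assumed here to be a small dict)
nodes = []    # node records; a node's "pointer" is its index in this list


def make_leaf(name):
    """Create an id leaf whose place field points at the symbol-table entry."""
    entry = symtab.setdefault(name, {"name": name})
    nodes.append({"op": "id", "place": entry})
    return len(nodes) - 1


def make_node(op, left, right=None):
    """Create a node for a unary (right is None) or binary operator."""
    nodes.append({"op": op, "left": left, "right": right})
    return len(nodes) - 1


def term():
    # Builds the subtree for b * -c.
    return make_node("*", make_leaf("b"), make_node("uminus", make_leaf("c")))


root = make_node("assign", make_leaf("a"), make_node("+", term(), term()))

for index, record in enumerate(nodes):
    print(index, record)
```

Both `b` leaves are separate tree nodes, but their place fields refer to the same symbol-table entry.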
Three Address Code (3AC) • Every instruction operates on three addresses • result = operand1 operator operand2 • Close to low-level operations in the machine language • Operator is a basic operation • Statements in the source language may be mapped to multiple instructions in three address code
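Three address code: example
    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5
[Figure: tree with root assign, whose children are a and +; each child of + is a * node with children b and uminus; each uminus has the child c]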
Array operations
• Are these 3AC operations?
x := y[i]
    t1 := &y        ; t1 = address-of y
    t2 := t1 + i    ; t2 = address of y[i]
    x := *t2        ; value stored at y[i]
x[i] := y
    t1 := &x        ; t1 = address-of x
    t2 := t1 + i    ; t2 = address of x[i]
    *t2 := y        ; store through pointer
Three address code: example
Source:
    int main(void) {
        int i;
        int b[10];
        for (i = 0; i < 10; ++i)
            b[i] = i*i;
    }
Three address code:
        i := 0                ; assignment
    L1: if i >= 10 goto L2    ; conditional jump
        t0 := i*i
        t1 := &b              ; address-of operation
        t2 := t1 + i          ; t2 holds the address of b[i]
        *t2 := t0             ; store through pointer
        i := i + 1
        goto L1
    L2:
(example source: wikipedia)
Three address code • Choice of instructions and operators affects code generation and optimization • Small set of instructions: easy to generate machine code, harder to optimize • Large set of instructions: harder to generate machine code • Typically prefer a small set and a smart optimizer
Creating 3AC • Assume a bottom-up parser: covers a wider range of grammars, and LALR is sufficient to cover most programming languages • Creating 3AC via syntax directed translation • Attribute examples: code (the code generated for a nonterminal), var (the name of the variable that stores the result of a nonterminal) • freshVar(): helper function that returns the name of a fresh variable
Creating 3AC: expressions (we use || to denote concatenation of intermediate code fragments)
Example: a = b * -c + b * -c
Annotated tree (each node lists its attributes):
• assign (root): E.code = ‘t1 = -c; t2 = b*t1; t3 = -c; t4 = b*t3; t5 = t2+t4; a = t5’
• a: E.var = a, E.code = ‘’
• +: E.var = t5, E.code = ‘t1 = -c; t2 = b*t1; t3 = -c; t4 = b*t3; t5 = t2+t4’
• * (left): E.var = t2, E.code = ‘t1 = -c; t2 = b*t1’
• * (right): E.var = t4, E.code = ‘t3 = -c; t4 = b*t3’
• uminus (left): E.var = t1, E.code = ‘t1 = -c’
• uminus (right): E.var = t3, E.code = ‘t3 = -c’
• b (both leaves): E.var = b, E.code = ‘’
• c (both leaves): E.var = c, E.code = ‘’
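The attribute computation in this example can be sketched as a recursive function that returns the pair (code, var) for each node. The tuple encoding of the tree and the function name `translate` are assumptions made for illustration; `fresh_var` plays the role of freshVar(), and list concatenation plays the role of ||.

```python
import itertools

_counter = itertools.count(1)


def fresh_var():
    """Return the name of a fresh temporary: t1, t2, ..."""
    return f"t{next(_counter)}"


def translate(node):
    """Return (code, var): a list of 3AC lines and the name holding the result."""
    if isinstance(node, str):                  # E -> id: no code, var is the id
        return [], node
    if node[0] == "uminus":                    # E -> - E1
        code1, var1 = translate(node[1])
        var = fresh_var()
        return code1 + [f"{var} = -{var1}"], var
    if node[0] == "assign":                    # S -> id = E
        code1, var1 = translate(node[2])
        return code1 + [f"{node[1]} = {var1}"], node[1]
    op, left, right = node                     # E -> E1 op E2
    code1, var1 = translate(left)
    code2, var2 = translate(right)
    var = fresh_var()
    return code1 + code2 + [f"{var} = {var1}{op}{var2}"], var


prod = ("*", "b", ("uminus", "c"))
code, var = translate(("assign", "a", ("+", prod, prod)))
print("\n".join(code))
```

Each node's code is the concatenation of its children's code followed by one new instruction, which is why the attributes can be computed bottom-up.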
Creating 3AC: control statements • 3AC only supports conditional/unconditional jumps • Add labels • Attributes • begin – label marks beginning of code • after – label marks end of code • Helper function freshLabel() allocates a new fresh label
Creating 3AC: control statements • S → while E do S1
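The semantic rules for this production are not preserved in the slide text, so the following is only a sketch of the usual textbook scheme, under stated assumptions: the condition E leaves its value in a variable, a value of 0 means false, and the labels begin and after come from a `fresh_label` helper standing in for freshLabel().

```python
import itertools

_labels = itertools.count(1)


def fresh_label():
    """Return a fresh symbolic label: L1, L2, ..."""
    return f"L{next(_labels)}"


def gen_while(cond_code, cond_var, body_code):
    """S -> while E do S1, with E's value in cond_var (assumed 0 means false)."""
    begin = fresh_label()                      # marks the start of the loop
    after = fresh_label()                      # marks the code following the loop
    return ([f"{begin}:"]
            + cond_code
            + [f"if {cond_var} == 0 goto {after}"]
            + body_code
            + [f"goto {begin}", f"{after}:"])


code = gen_while(["t1 = i < 10"], "t1", ["i = i + 1"])
print("\n".join(code))
```

The loop is expressed with one conditional jump out of the loop and one unconditional jump back to its beginning, which is all that 3AC offers.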
Representing 3AC
• Quadruple (op, arg1, arg2, result)
• Result of every instruction is written into a new temporary variable
• Generates many variable names
• Can move code fragments without complicated renaming
• Alternative representations may be more compact
Code:
    t1 = - c
    t2 = b * t1
    t3 = - c
    t4 = b * t3
    t5 = t2 + t4
    a = t5
Quadruples:
         op      arg1  arg2  result
    (0)  uminus  c           t1
    (1)  *       b     t1    t2
    (2)  uminus  c           t3
    (3)  *       b     t3    t4
    (4)  +       t2    t4    t5
    (5)  :=      t5          a
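A small sketch of the quadruple table as a data structure. The `Quad` type, `show`, and the tiny evaluator `run` are hypothetical helpers added for illustration; the evaluator is only there to check that moving a code fragment leaves the result unchanged.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


def show(q):
    """Render a quadruple as a line of three address code."""
    if q.op == ":=":
        return f"{q.result} = {q.arg1}"
    if q.arg2 is None:                          # unary operator
        sym = "-" if q.op == "uminus" else q.op
        return f"{q.result} = {sym} {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


def run(quads, env):
    """Evaluate the quadruples over a copy of env and return the final values."""
    env = dict(env)
    for q in quads:
        if q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
    return env


quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]
# Move the second product ahead of the first: no renaming is needed.
moved = quads[2:4] + quads[0:2] + quads[4:]
print(run(quads, {"b": 3, "c": 2})["a"], run(moved, {"b": 3, "c": 2})["a"])
```

Because every instruction names its result explicitly, the two fragments can be swapped by reordering the list alone.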
A More Compact Representation
• Triplets: op, arg1, arg2 (result implicit in line number)
• arg1 and arg2 are pointers to the symbol table or to a line number
• No need for result, but can’t move code.
• Bad for optimizations…
• A possible solution: use this representation plus a separate execution order list: (10, 11, 0, 1, 15, 12, 0, 1, …)
Triplets:
         op      arg1  arg2
    (0)  uminus  c
    (1)  *       b     (0)
    (2)  uminus  c
    (3)  *       b     (2)
    (4)  +       (1)   (3)
    (5)  assign  a     (4)
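A sketch of triplets together with a separate execution order list. The encoding is an assumption made for illustration: an integer argument refers to the result of the triplet on that line, and a string argument refers to a variable name. The evaluator `run` is a hypothetical helper used to show that reordering touches only the order list.

```python
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("assign", "a", 4),      # (5)
]


def run(triples, order, env):
    """Execute the triplets in the given order; return the final variable values."""
    env = dict(env)
    value = {}                                  # result of triplet i, keyed by line

    def val(arg):
        # int: result of another triplet; str: variable in the environment
        return value[arg] if isinstance(arg, int) else env[arg]

    for i in order:
        op, a1, a2 = triples[i]
        if op == "uminus":
            value[i] = -val(a1)
        elif op == "*":
            value[i] = val(a1) * val(a2)
        elif op == "+":
            value[i] = val(a1) + val(a2)
        elif op == "assign":
            env[a1] = val(a2)
    return env


original = run(triples, [0, 1, 2, 3, 4, 5], {"b": 3, "c": 2})
reordered = run(triples, [2, 3, 0, 1, 4, 5], {"b": 3, "c": 2})
print(original["a"], reordered["a"])
```

The triplet table itself never changes, so the line-number references stay valid while the order list is rearranged.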
Allocating Memory • Type checking helped us guarantee correctness • Also useful for determining the size of an allocation, field locations, and addresses inside arrays • [Figure: method activation record; the location of variable employee is given by its offset, which follows the other variables allocated prior to employee]
Declarations and Memory Allocation • Global variable “offset” with memory allocated so far
Allocating Memory
[Figure: parse tree rooted at P, with D and T nodes, for the declarations of count (int), money (float), and balances (an array with num = 42 and element type int)]
• Attributes shown: id.name = count, T1.type = int, T1.width = 4; id.name = money, T2.type = float, T2.width = 4
• Actions, in order: enter(count, int, 0); offset = offset + 4; enter(money, float, 4); offset = offset + 4
Adjusting to bottom-up • In top-down parsing, we can zero the offset initially. But in bottom-up parsing, we don’t know what the first instruction is… • A standard trick: add a marker and a rule that will always be reduced first: P → M D, M → ε
Allocating Memory
[Figure: the same parse tree with root P → M D; the marker M is reduced first]
• Attributes shown: id.name = count, T1.type = int, T1.width = 4
• Actions, in order: offset = 0 (at M); enter(count, int, 0); offset = offset + 4
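The offset bookkeeping can be sketched as follows. The widths of int and float are taken from the slide; the rule that an array's width is the element width times the element count, the string used for the array type, and the helper name `declare` are assumptions for illustration.

```python
symtab = {}
offset = 0        # the action of the marker rule M -> ε: zero the offset first

WIDTH = {"int": 4, "float": 4}   # T.width values from the slide


def enter(name, typ, off):
    """Record a declared name with its type and its offset."""
    symtab[name] = (typ, off)


def declare(name, typ, count=1):
    """D -> T id: enter the name at the current offset, then advance the offset."""
    global offset
    width = WIDTH[typ] * count                  # assumed: array width = count * element width
    enter(name, typ if count == 1 else f"array({count}, {typ})", offset)
    offset += width


declare("count", "int")
declare("money", "float")
declare("balances", "int", 42)

print(symtab, offset)
```

Each declaration receives the offset accumulated so far, so the global offset always equals the memory allocated by the declarations already processed.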
Generating IR code • Option 1: accumulate code in AST attributes • Option 2: emit IR code to a file during compilation • Possible if, for every production rule, the code of the left-hand side is constructed from a concatenation of the code of the right-hand-side symbols in some fixed order • Does not allow code manipulations after emitting.
Expressions and assignments • Lookup returns the variable location in memory. • Allocate space for a variable the first time it is encountered.
Boolean Expressions • Represent true as 1, false as 0 • Wasteful representation, creating variables for true/false
Boolean expressions via jumps This is useful for short circuit evaluation. Let us start with an example.
Boolean Expressions: an Example • [Figure: parse tree E → E or E; the left E is a < b, the right E → E and E with operands c < d and e < f] • T4 := T2 and T3 • T5 := T1 or T4
Short circuit evaluation • Second argument of a boolean operator is only evaluated if the first argument does not already determine the outcome • (x and y) is equivalent to: “if x then y else false”; • (x or y) is equivalent to: “if x then true else y”; • Is short circuit evaluation equivalent to standard evaluation? • We use it if the language definition dictates its use.
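The difference between the two evaluation strategies is observable whenever evaluating an operand has an effect. The sketch below models operands as zero-argument functions (thunks) so that it can record which ones are evaluated; the helper names are hypothetical.

```python
def short_and(x, y):
    """(x and y) as: if x then y else false."""
    return y() if x() else False


def short_or(x, y):
    """(x or y) as: if x then true else y."""
    return True if x() else y()


def full_and(x, y):
    """Standard evaluation: both operands are always evaluated."""
    a, b = x(), y()
    return a and b


def trace(evaluator, x_value, y_value):
    """Run an evaluator and report (result, names of the operands evaluated)."""
    calls = []

    def operand(name, value):
        def thunk():
            calls.append(name)
            return value
        return thunk

    return evaluator(operand("x", x_value), operand("y", y_value)), calls


print(trace(short_and, False, True))
print(trace(full_and, False, True))

nom, denom = 1, 0
# Safe only under short circuit evaluation: the division is never reached.
safe = short_and(lambda: denom != 0, lambda: nom / denom > 0)
```

The results agree, but the set of operands evaluated does not, so the two strategies are equivalent only for operands without side effects or errors.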
Note Difference
    int denom = 0;
    if (denom && nom/denom) { oops_i_just_divided_by_zero(); }
    if ( (file=open(“c:\grades”)) or (die) ) printfile(file,…);
Our previous example: a < b or (c < d and e < f)
Naive:
    100: if a < b goto 103
    101: T1 := 0
    102: goto 104
    103: T1 := 1
    104: if c < d goto 107
    105: T2 := 0
    106: goto 108
    107: T2 := 1
    108: if e < f goto 111
    109: T3 := 0
    110: goto 112
    111: T3 := 1
    112: T4 := T2 and T3
    113: T5 := T1 or T4
Short circuit evaluation:
    100: if a < b goto 105
    101: if !(c < d) goto 103
    102: if e < f goto 105
    103: T := 0
    104: goto 106
    105: T := 1
    106:
Control Structure: if, else, while. • Consider the conditional jumps: • One option is to create code for B, create code for S, and then create a jump to the beginning of S or the end of S according to B’s value. • But it would be more efficient to create code that, while evaluating B, jumps to the correct location as soon as the value of B is known.
Control Structure: if, else, while. • The problem: we do not know where to jump to… • While parsing B’s tree for “if B then S”, we don’t know S’s code or where it starts and ends. • Solution: for each expression B keep labels B.trueLabel and B.falseLabel, and jump to these labels according to B’s value. • Also, for each statement S, keep a label S.nextLabel with the address of the code that follows the statement S. • The semantic equation: B.falseLabel = S.next. • For B.true, we create a new label between B’s code and S’s code. • S → if B then S
The attribute next • While parsing statement S, generate the code with its next label. • The attribute S.next is inherited: the child gets it. • The attribute code is synthesized: the parent gets it. • The label S.next is symbolic. The address will only be known after we parse S.
Control Structures: conditional
• Code layout: code for B (with jumps to B.true and to B.false), then B.true: code for S, then B.false: . . .
• “gen” creates the instruction and returns it as a string (unlike emit).
• Are S1.next and B.falseLabel inherited or synthesized?
• Is S.code inherited or synthesized?
Control Structures: conditional • B.trueLabel and B.falseLabel are considered inherited • [Figure: B’s code with jumps to B.trueLabel and to B.falseLabel]
S → if B then S1 else S2
Code layout:
    B.code
    B.trueLabel:
    S1.code
    goto S.next
    B.falseLabel:
    S2.code
    S.next: …
Semantic rules:
    B.trueLabel = freshLabel();
    B.falseLabel = freshLabel();
    S1.next = S.next;
    S2.next = S.next;
    S.code = B.code || gen(B.trueLabel ‘:’) || S1.code || gen(‘goto’ S.next) || gen(B.falseLabel ‘:’) || S2.code
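These rules can be sketched in code. Since B.trueLabel and B.falseLabel are inherited, the sketch models B as a function that receives the two labels and returns its code. The `rel` helper for a relational condition follows the usual textbook rule and is an assumption, as are the Python names; the statement bodies are plain assignments, so S1.next and S2.next are not needed here.

```python
import itertools

_labels = itertools.count(1)


def fresh_label():
    """Return a fresh symbolic label: L1, L2, ..."""
    return f"L{next(_labels)}"


def rel(left, op, right):
    """B -> id relop id (assumed rule): jump to the true or the false label."""
    def code(true_label, false_label):
        return [f"if {left} {op} {right} goto {true_label}",
                f"goto {false_label}"]
    return code


def if_else(b, s1_code, s2_code, s_next):
    """S -> if B then S1 else S2, with S.next supplied by the parent."""
    b_true, b_false = fresh_label(), fresh_label()
    return (b(b_true, b_false)
            + [f"{b_true}:"] + s1_code + [f"goto {s_next}"]
            + [f"{b_false}:"] + s2_code)


code = if_else(rel("a", "<", "b"), ["x = 1"], ["x = 2"], "Lnext")
print("\n".join(code))
```

The label for S.next is passed in rather than created, which mirrors the fact that it is an inherited attribute.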
Boolean expressions Generate code that jumps to B.trueLabel if B’s value is true and to B.falseLabel if B’s value is false.
Boolean expressions • How can we determine the address of B1.falseLabel? • Only possible after we know the code of B1 and all the code preceding B1
Example
S → if B then S1:
    B.trueLabel = freshLabel();
    B.falseLabel = S.next;
    S1.next = S.next;
    S.code = B.code || gen (B.trueLabel ‘:’) || S1.code
B → B1 and B2:
    B1.trueLabel = freshLabel();
    B1.falseLabel = B.falseLabel;
    B2.trueLabel = B.trueLabel;
    B2.falseLabel = B.falseLabel;
    B.code = B1.code || gen (B1.trueLabel ‘:’) || B2.code
B1 → true: B1.code = gen(‘goto’ B1.trueLabel)
B2 → false: B2.code = gen(‘goto’ B2.falseLabel)
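The example tree (if B then S1, with B → B1 and B2, B1 → true, B2 → false) can be traced with a short sketch. As an illustrative assumption, each Boolean expression is modelled as a function from its inherited labels (true label, false label) to its code; the Python names are not from the lecture.

```python
import itertools

_labels = itertools.count(1)


def fresh_label():
    """Return a fresh symbolic label: L1, L2, ..."""
    return f"L{next(_labels)}"


def const(value):
    """B -> true | false: an unconditional jump to the matching label."""
    def code(true_label, false_label):
        return [f"goto {true_label if value else false_label}"]
    return code


def and_(b1, b2):
    """B -> B1 and B2: B1 true falls into B2; either one false leaves B."""
    def code(true_label, false_label):
        b1_true = fresh_label()
        return (b1(b1_true, false_label)
                + [f"{b1_true}:"]
                + b2(true_label, false_label))
    return code


def if_then(b, s1_code, s_next):
    """S -> if B then S1: B.falseLabel is S.next."""
    b_true = fresh_label()
    return b(b_true, s_next) + [f"{b_true}:"] + s1_code


code = if_then(and_(const(True), const(False)), ["x = 1"], "Lnext")
print("\n".join(code))
```

The generated code never stores a truth value: the outcome of B is encoded entirely in which label control reaches.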
Computing the labels • We can build the code while parsing the tree bottom-up, leaving the jump targets vacant. • We can compute the values for the labels in a second traversal of the AST • Can we do it in a single pass?
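A minimal sketch of the two-pass idea, with one simplification that should be stated plainly: instead of a second traversal of the AST, it makes a second pass over the already generated instruction list. The first pass records the address at which each symbolic label lands, and the second patches the jump targets. The label names and the base address 100 are assumptions chosen to match the earlier short circuit example.

```python
def resolve_labels(code, base=100):
    """Replace symbolic labels by addresses; label lines occupy no address."""
    address = {}
    instructions = []
    for line in code:                           # pass 1: locate every label
        if line.endswith(":"):
            address[line[:-1]] = base + len(instructions)
        else:
            instructions.append(line)
    resolved = []
    for i, line in enumerate(instructions):     # pass 2: patch jump targets
        words = line.split()
        if "goto" in words:
            k = words.index("goto") + 1
            words[k] = str(address[words[k]])
        resolved.append(f"{base + i}: {' '.join(words)}")
    return resolved


symbolic = [
    "if a < b goto Ltrue",
    "if !(c < d) goto Lfalse",
    "if e < f goto Ltrue",
    "Lfalse:",
    "T := 0",
    "goto Lend",
    "Ltrue:",
    "T := 1",
    "Lend:",
]
resolved = resolve_labels(symbolic)
print("\n".join(resolved))
```

The first pass is needed because a jump may refer to a label that appears later in the code, so its address is not known when the jump is generated.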
So far… • Three address code. • Intermediate code generation is executed with parsing (via semantic actions). • Creating code for Boolean expressions and for control statements is more involved. • We typically use short circuit evaluation; the value of the expression is implicit in the control location. • We need to compute the branching addresses. • Option 1: compute them in a second AST pass.