Intermediate Representations (Chapter 4)
Outline • Issues in IR design • High-Level IRs • Medium-Level IRs • Low-Level IRs • Multi-Level IRs • MIR, HIR, and LIR • ICAN (Informal Compiler Algorithm Notation) Representations • Other IRs • Conclusions
Issues in IR Design • Portability • Optimization level • Complexity of the compiler • Reuse of legacy compiler parts • Compilation cost • Multi vs. One IR levels • Compiler maintenance
Example: MIPS Compiler
    UCODE (stack-based IR)
      → Translator → Medium-level IR
      → Optimizer → Medium-level IR
      → Translator → UCODE (stack-based IR)
      → Code generator → Load/store-based architecture
Example: PA-RISC (HP-RISC)
    UCODE (stack-based IR)
      → Translator → Very low-level IR (SLLIC)
      → Optimizer → Very low-level IR (SLLIC)
      → Code generator → Load/store-based architecture
Why do we need multiple representations? • Lower representations expose more computations • more effective “standard” optimizations • examples: strength reduction, loop invariants, ... • Higher representations provide more “non-determinism” • more effective parallelization (reordering) • data cache optimizations
Example: Arrays
C code:
    float a[20][10];
    ...
    ... a[i][j+2] ...
Address computed: addr(a) + 4*(i*20 + j + 2)
HIR:
    t ← a[i, j+2]
MIR:
    t1 ← j+2
    t2 ← i*20
    t3 ← t1+t2
    t4 ← 4*t3
    t5 ← addr a
    t6 ← t5+t4
    t7 ← *t6
LIR:
    r1 ← [fp-4]
    r2 ← r1+2
    r3 ← [fp-8]
    r4 ← r3*20
    r5 ← r2+r4
    r6 ← 4*r5
    r7 ← fp-216
    f7 ← [r7+r6]
External Representation • Internal IR representation is used in the compiler • External representation is needed for: • Compiler debugging • Cross-module integration • Design issues • Representing pointers • Unique representation of temporaries • Compaction
Abstract Syntax Trees • Compact source representation • No punctuation symbols • Tree defines hierarchy • Used for front ends • Sometimes include symbol table pointers • Can be translated into HIR • Can also be used for compaction
Example AST
C code:
    int f(int a, int b)
    { int c;
      c = a + 2;
      print(c);
    }
AST:
    function
      ident f
      paramlist
        ident a
        paramlist
          ident b
          end
      body
        declist
          ident c
          end
        stmtList
          =
            ident c
            +
              ident a
              const 2
          stmtList
            call
              ident print
              arglist
                ident c
                end
            end
Other HIRs • Normal linear forms: • Preserve control flow structures and arrays • Simplified control flow structures • Eliminate GOTOs • Continuations
Medium Level IR • Source and target language independent • Machine independent representation for program variables and temporaries • Simplified control flow constructs • Portable • Sufficient in many optimizing compilers: MIR, Sun-IR
Low Level IR • One to one correspondence with machine • Deviations from the machine • Alternative code • Addressing modes • Side effects? • Instruction selection in the last phase • Appropriate compiler data structure can hide dependence
Side Effect Operations (PA-RISC)
MIR:
    L1: t2 ← *t1
        t1 ← t1+4
        ...
        t3 ← t3+1
        t5 ← t3 < t4
        if t5 goto L1
PA-RISC (Option 1):
        LDWM 4(0, r2), r3
        ...
        ADDI 1, r4, r4
        COMB,< r4, r5, L1
Multi-Level Intermediate Representations • Multiple representations in the same language • Compromise computation exposure and high level description • SUN-IR: Arrays can be represented with multiple subscripts
Example
C code:
    void make_node(p, n)
    struct node *p;
    int n;
    { struct node *q;
      q = malloc(sizeof(struct node));
      q->next = nil;
      q->value = n;
      p->next = q;
    }
MIR:
    make_node:
    begin
        receive p(val)
        receive n(val)
        q ← call malloc, (8, int)
        *q.next ← nil
        *q.value ← n
        *p.next ← q
        return
    end
C code:
    void insert_node(n, l)
    int n;
    struct node *l;
    { if (n > l->value)
        if (l->next == nil) make_node(l, n);
        else insert_node(n, l->next);
    }
MIR:
    insert_node:
    begin
        receive n(val)
        receive l(val)
        t1 ← *l.value
        if n <= t1 goto L1
        t2 ← *l.next
        if t2 != nil goto L2
        call make_node, (l, type1; n, int)
        return
    L2: t4 ← *l.next
        call insert_node, (n, int; t4, type1)
        return
    L1: return
    end
MIR Issues
• MIN does not usually exist
    MIR:
        t1 ← t2 min t3
    PA-RISC:
        MOVE r2, r1
        COM,>= r3, r2
        MOVE r3, r1
• Both value and “location” computation for Boolean conditions
        t3 ← t1 < t2
        if t3 goto L1

        if t1 < t2 goto L1
HIR • Obtained from MIR • Extra constructs • Array references • High level constructs
HIR:
    for v ← opd1 by opd2 to opd3
        instructions
    endfor
MIR:
        v ← opd1
        t2 ← opd2
        t3 ← opd3
        if t2 > 0 goto L2
    L1: if v < t3 goto L3
        instructions
        v ← v + t2
        goto L1
    L2: if v > t3 goto L3
        instructions
        v ← v + t2
        goto L2
    L3:
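The for-loop expansion above can be sketched in a few lines of Python. This is only an illustration: the function name `lower_for`, the string-based instruction format, and the fixed names `t2`, `t3`, `L1`, `L2`, `L3` are assumptions made here, and a real compiler would generate fresh temporaries and labels.

```python
def lower_for(var, opd1, opd2, opd3, body):
    """Expand `for var <- opd1 by opd2 to opd3 body endfor` into MIR text.

    Mirrors the slide's layout: the L1 loop handles a non-positive step,
    the L2 loop handles a positive step, and the body is emitted twice.
    """
    step = [f"{var} <- {var} + t2"]
    return (
        [f"{var} <- {opd1}", f"t2 <- {opd2}", f"t3 <- {opd3}",
         "if t2 > 0 goto L2"]
        + [f"L1: if {var} < t3 goto L3"] + body + step + ["goto L1"]
        + [f"L2: if {var} > t3 goto L3"] + body + step + ["goto L2"]
        + ["L3:"]
    )


mir = lower_for("i", "1", "2", "n", ["s <- s + i"])
print("\n".join(mir))
```

Duplicating the body keeps each loop's exit test a single comparison, at the cost of code size.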
C code:
    void insert_node(n, l)
    int n;
    struct node *l;
    { if (n > l->value)
        if (l->next == nil) make_node(l, n);
        else insert_node(n, l->next);
    }
HIR:
    insert_node:
    begin
        receive n(val)
        receive l(val)
        t1 ← *l.value
        if n > t1 then
            t2 ← *l.next
            if t2 = nil then
                call make_node, (l, type1; n, int)
                return
            else
                t4 ← *l.next
                call insert_node, (n, int; t4, type1)
                return
            fi
        fi
    end
LIR • Obtained from MIR • Extra features: • Low level addressing • Load/Store • Eliminated constructs • Variables • Selectors • Parameters
C code:
    void insert_node(n, l)
    int n;
    struct node *l;
    { if (n > l->value)
        if (l->next == nil) make_node(l, n);
        else insert_node(n, l->next);
    }
LIR:
    insert_node:
    begin
        s800 ← s1
        s801 ← s2
        s802 ← [s801+0]
        if s800 <= s802 goto L1
        s803 ← [s801+4]
        if s803 != nil goto L2
        s1 ← s801
        s2 ← s800
        call make_node, ra
        return
    L2: s1 ← s800
        s2 ← [s801+4]
        call insert_node, ra
        return
    L1: return
    end
Representing MIR in ICAN • An MIR program can be (internally) represented as an abstract syntax tree • The general construction • A (union) type for every non-terminal • An enumerated type “kind” for every production • A tuple for every production • Other ideas • Flatten the hierarchy in some cases • Use functions to abstract MIR properties (simplifies semantic manipulations)
ICAN Tuples for MIR Instructions (Table 4.7)
    Label:
        <kind:label, lbl:Label>
    receive VarName(ParamType)
        <kind:receive, left:VarName, ptype:ParamType>
    VarName ← Operand1 Binop Operand2
        <kind:binasgn, left:VarName, opr:Binop, opd1:Operand1, opd2:Operand2>
    VarName ← Unop Operand
        <kind:unasgn, left:VarName, opr:Unop, opd:Operand>
    VarName ← Operand
        <kind:valasgn, left:VarName, opd:Operand>
    ...
IRoper = enum {
    add,        || +
    sub,        || - (binary)
    mul,        || * (binary)
    div,        || /
    mod, min, max,
    eql, neql, less, lseq, grtr, gteq,    || =, !=, <, <=, >, >=
    shl, shr, shra, and, or, xor,
    ind,        || * (pointer dereference)
    indelt,     || *. (dereference to a field)
    neg,        || - (unary)
    not,        || !
    addr, val,
    cast,       || (type cast)
    ...
(Table 4.6)
MIRKind = enum {label, receive, binasgn, unasgn, ..., sequence}
Opkind = enum {var, const, type}
ExpKind = enum {binexp, unexp, noexp, listexp}
Exp_Kind: MIRKind → ExpKind
Has_Left: MIRKind → boolean
Exp_Kind := {<label, noexp>, <receive, noexp>, <binasgn, binexp>, <unasgn, unexp>, ..., <callexp, listexp>, ..., <sequence, noexp>}
Has_Left := {<label, false>, <receive, true>, <binasgn, true>, <unasgn, true>, <valasgn, true>, <condasgn, true>, <castasgn, true>, ..., <unif, false>, ...}
MIR:
    L1: b ← a
        c ← b + 1
ICAN:
    Inst: array [1..n] of Instruction
    Inst[1] = <kind:label, lbl:"L1">
    Inst[2] = <kind:valasgn, left:"b", opd:<kind:var, val:"a">>
    Inst[3] = <kind:binasgn, left:"c", opr:add, opd1:<kind:var, val:"b">, opd2:<kind:const, val:"1">>
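As a rough illustration of the tuple encoding, the three instructions can be written as Python dictionaries and printed back as MIR text. The dictionaries stand in for ICAN tuples; the helper `show` and the table `OPR_TEXT` are names invented for this sketch and cover only the kinds used in the example.

```python
# Python dicts stand in for ICAN tuples; field names follow the slide.
inst = [
    {"kind": "label", "lbl": "L1"},
    {"kind": "valasgn", "left": "b", "opd": {"kind": "var", "val": "a"}},
    {"kind": "binasgn", "left": "c", "opr": "add",
     "opd1": {"kind": "var", "val": "b"},
     "opd2": {"kind": "const", "val": "1"}},
]

OPR_TEXT = {"add": "+"}  # only the operator this example needs


def show(i):
    """Render one instruction tuple as MIR text (hypothetical helper)."""
    kind = i["kind"]
    if kind == "label":
        return f"{i['lbl']}:"
    if kind == "valasgn":
        return f"{i['left']} <- {i['opd']['val']}"
    if kind == "binasgn":
        opr = OPR_TEXT[i["opr"]]
        return f"{i['left']} <- {i['opd1']['val']} {opr} {i['opd2']['val']}"
    raise ValueError(f"unhandled kind: {kind}")


for i in inst:
    print(show(i))
```

Dispatching on the `kind` field is the same pattern that functions such as Exp_Kind and Has_Left rely on.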
Fig 4.9
    insert_node:
    begin
        receive n(val)
        receive l(val)
        t1 ← *l.value
        if n <= t1 goto L1
        t2 ← *l.next
        if t2 != nil goto L2
        call make_node, (l, type1; n, int)
        return
    L2: t4 ← *l.next
        call insert_node, (n, int; t4, type1)
        return
    L1: return
    end
Representing HIR in ICAN • Similar to MIR (Table 4.8) • For statement has three expressions (Figure 4.10) • Break “if” and “for”
Representing LIR in ICAN • Similar to MIR (Table 4.9, 4.10) • No list expressions (Figure 4.11)
Example (4.12, 4.13)
LIR:
    L1: r1 ← [r7+4]
        r2 ← [r7+r8]
        r3 ← r1 + r2
        r4 ← -r3
        if r3 > 0 goto L2
        r5 ← (r9) r1
        [r7-8](2) ← r5
    L2: return r4
ICAN:
    Inst[1] = <kind:label, lbl:"L1">
    Inst[2] = <kind:loadmem, left:"r1", addr:<kind:addrrc, reg:"r7", disp:4, len:4>>
    Inst[3] = <kind:loadmem, left:"r2", addr:<kind:addr2r, reg:"r7", reg2:"r8", len:4>>
HIR, MIR, LIR as an ADT • View IR as an abstract data type • Example fields: • ProcName: the procedure name • nblocks: the number of basic blocks • ninsts: array [1..nblocks] of integer • Block: array [1..nblocks] of array [..] of Instruction • Succ, Pred: integer → set of integer • Example methods • insert_before(i, j, ninsts, Block, inst)
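A minimal sketch of the `insert_before` method, assuming Python lists in place of the ICAN arrays. The slide gives only the signature; the 1-based indexing and the behavior shown here (insert `inst` before instruction `j` of block `i` and update the count) are assumptions for illustration.

```python
def insert_before(i, j, ninsts, block, inst):
    """Insert `inst` before instruction j of basic block i.

    Indices are 1-based, as in the slide's array declarations; the
    per-block instruction count in `ninsts` is kept in sync.
    """
    block[i - 1].insert(j - 1, inst)
    ninsts[i - 1] += 1


block = [["a <- 1", "b <- a"], ["return"]]
ninsts = [2, 1]
insert_before(1, 2, ninsts, block, "t1 <- a")
print(block[0], ninsts)
```

Hiding the array manipulation behind such a method lets optimization passes edit the IR without depending on how blocks are stored.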
Triples • Three address instructions • Implicit names for results (instruction index) • No need for temporary names • Usually represented via pointers • Program transformations may be tricky • Can be translated from/into MIR
MIR:
    L1: i ← i + 1
        t1 ← i + 1
        t2 ← p + 4
        t3 ← *t2
        p ← t2
        t4 ← t1 < 10
        *r ← t3
        if t4 goto L1
Triples:
    (1) i + 1
    (2) i sto (1)
    (3) i + 1
    (4) p + 4
    (5) *(4)
    (6) p sto (4)
    (7) (3) < 10
    (8) r *sto (5)
    (9) if (7), (1)
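The MIR-to-triples translation can be sketched as follows. This is an illustration under stated assumptions, not the book's algorithm: the tuple encoding of MIR instructions, the function name `to_triples`, and the rule that names starting with `t` are temporaries are all invented here, and only backward branches are handled.

```python
def to_triples(mir, is_temp=lambda name: name.startswith("t")):
    """Translate MIR instructions (as tuples) into triple strings."""
    triples = []   # triples[k] holds the text of triple (k+1)
    where = {}     # temporary name -> "(n)" of the triple computing it
    labels = {}    # label -> "(n)" of the first triple after the label

    def ref(operand):
        # Temporaries become references to triples; variables stay names.
        return where.get(operand, operand)

    def emit(text):
        triples.append(text)
        return f"({len(triples)})"

    def define(dst, value):
        if is_temp(dst):
            where[dst] = value          # implicit name: no extra triple
        else:
            emit(f"{dst} sto {value}")  # program variable: explicit store

    for ins in mir:
        kind = ins[0]
        if kind == "label":
            labels[ins[1]] = f"({len(triples) + 1})"
        elif kind == "bin":
            _, dst, a, opr, b = ins
            define(dst, emit(f"{ref(a)} {opr} {ref(b)}"))
        elif kind == "un":
            _, dst, opr, a = ins
            define(dst, emit(f"{opr}{ref(a)}"))
        elif kind == "copy":
            _, dst, a = ins
            define(dst, ref(a))
        elif kind == "store":
            _, ptr, a = ins
            emit(f"{ptr} *sto {ref(a)}")
        elif kind == "if":
            _, cond, label = ins
            emit(f"if {ref(cond)}, {labels[label]}")
    return triples


example = [
    ("label", "L1"),
    ("bin", "i", "i", "+", "1"),
    ("bin", "t1", "i", "+", "1"),
    ("bin", "t2", "p", "+", "4"),
    ("un", "t3", "*", "t2"),
    ("copy", "p", "t2"),
    ("bin", "t4", "t1", "<", "10"),
    ("store", "r", "t3"),
    ("if", "t4", "L1"),
]
for n, t in enumerate(to_triples(example), start=1):
    print(f"({n}) {t}")
```

Because results are named by instruction index, inserting or deleting a triple renumbers everything after it, which is why transformations on triples are tricky.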
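Trees • Compact representation for expressions • A basic block is a sequence of trees • Assignments can be implicit or explicit
[Figure: i ← i + 1 drawn two ways: with the result name on the root (i: add, children i and 1), and as an explicit assignment node with children i and add(i, 1)]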
MIR and Trees
MIR:
    L1: i ← i + 1
        t1 ← i + 1
        t2 ← p + 4
        t3 ← *t2
        p ← t2
        t4 ← t1 < 10
        *r ← t3
        if t4 goto L1
[Figure: tree form of this basic block]
Combining trees may lead to incorrect computation
MIR:
    a ← a+1
    b ← a+a
Incorrectly combined tree:
    b: add(a: add(a, 1), a: add(a, 1))
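A small Python sketch makes the problem concrete. The tree encoding (`(label, op, left, right)` tuples) and the evaluation rule (a labelled node stores its value into its variable) are assumptions chosen for this illustration. Evaluating the two trees in sequence gives the intended result, while the combined tree updates `a` twice.

```python
def evaluate(node, env):
    """Evaluate a tree; a labelled node stores its value into env[label]."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    label, op, left, right = node
    assert op == "add"  # the only operator this sketch needs
    value = evaluate(left, env) + evaluate(right, env)
    if label is not None:
        env[label] = value
    return value


a_tree = ("a", "add", "a", 1)    # a <- a + 1
b_tree = ("b", "add", "a", "a")  # b <- a + a

# Correct: a sequence of two trees.
seq = {"a": 1}
for tree in (a_tree, b_tree):
    evaluate(tree, seq)

# Incorrect: the tree for a substituted for both uses of a.
merged = {"a": 1}
evaluate(("b", "add", a_tree, a_tree), merged)

print(seq, merged)
```

The sequence leaves `a = 2, b = 4`; the combined tree leaves `a = 3, b = 5`, because the increment is performed once per copy of the subtree.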
Preorder Translation into MIR
Tree:
    t4: less(add(i, 1), 10)
Name the inner node:
    t4: less(t5: add(i, 1), 10)
MIR:
    t5 ← i+1
    t4 ← t5<10
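The translation can be sketched as a short recursive walk. The tuple encoding of trees, the function name `tree_to_mir`, and starting the temporary counter at 5 (to match the slide's `t5`) are assumptions for this example. Each unnamed interior node receives its temporary when it is first visited, and its instruction is emitted once its operands are available.

```python
import itertools


def tree_to_mir(tree, first_temp=5):
    """Translate (label, (op, left, right)) into a list of MIR strings."""
    temps = (f"t{n}" for n in itertools.count(first_temp))
    code = []

    def operand(node):
        if not isinstance(node, tuple):
            return str(node)      # leaf: variable name or constant
        name = next(temps)        # interior node is named on entry
        gen(name, node)
        return name

    def gen(result, node):
        op, left, right = node
        lhs = operand(left)
        rhs = operand(right)
        code.append(f"{result} <- {lhs} {op} {rhs}")

    label, root = tree
    gen(label, root)
    return code


print(tree_to_mir(("t4", ("<", ("+", "i", 1), 10))))
```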
Advantages of Trees • Minimize temporaries • Amenable to many optimizations • Locally optimized code with register allocation can be used • Easy to translate into Polish-prefix code (used for automatic instruction selection)
Directed Acyclic Graphs (DAGs) • A combination of trees • Operands which are reused are linked • Nodes may be annotated with variable names
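One common way to build such a DAG for a basic block is to share nodes with identical operator and operands. The sketch below assumes that approach; the function name `build_dag`, the `(dst, op, a, b)` instruction format, and the returned data layout are invented for illustration.

```python
def build_dag(block):
    """Build a DAG from a list of (dst, op, a, b) meaning `dst <- a op b`."""
    nodes = []    # node id -> (op, left id, right id) or ("leaf", name)
    index = {}    # node key -> node id, so equal subexpressions are shared
    current = {}  # variable -> id of the node holding its current value
    labels = {}   # node id -> variable names attached to that node

    def intern(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def value_of(x):
        # A variable assigned in this block refers to its defining node;
        # anything else is a leaf for its incoming value.
        if x in current:
            return current[x]
        return intern(("leaf", x))

    for dst, op, a, b in block:
        n = intern((op, value_of(a), value_of(b)))
        if dst in current:  # dst is reassigned: detach the old label
            labels[current[dst]].remove(dst)
        current[dst] = n
        labels.setdefault(n, []).append(dst)
    return nodes, labels


nodes, labels = build_dag([
    ("t1", "+", "i", 1),
    ("t2", "+", "i", 1),   # same expression: reuses the node for t1
    ("t3", "*", "t1", "t2"),
])
print(nodes)
print(labels)
```

Here `i + 1` is represented once and annotated with both `t1` and `t2`, which is the reuse a tree representation cannot express.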