Understanding Intermediate Representation in Compiler Design

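Intermediate representation • Goals: • encode knowledge about the program • facilitate analysis • facilitate retargeting • facilitate optimization
[Figure: compiler pipeline. Stages: scanning, parsing, semantic analysis, intermediate code gen., optim., code gen. Labels: HIR around semantic analysis and intermediate code gen.; LIR around optim. and code gen.]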

Intermediate representation • Components • code representation • symbol table • analysis information • string table • Issues • Use an existing IR or design a new one? • How close should it be to the source/target?

IR selection • Using an existing IR • cost savings due to reuse • it must be expressive and appropriate for the compiler operations • Designing an IR • decide how close to machine code it should be • decide how expressive it should be • decide its structure • consider combining different kinds of IRs

IR classification: Level • High-level • closer to source language • used early in the process • usually converted to lower form later on • Example: AST

IR classification: Level • Medium-level • try to reflect the range of features in the source language in a language-independent way • most optimizations are performed at this level • algebraic simplification • copy propagation • dead-code elimination • common subexpression elimination • loop-invariant code motion • etc.

IR classification: Level • Low-level • very close to target-machine instructions • architecture dependent • useful for several optimizations • loop unrolling • branch scheduling • instruction/data prefetching • register allocation • etc.

IR classification: Level • Example: the same loop at two levels
High-level:
    for i := op1 to op2 step op3
        instructions
    endfor
Medium-level:
        i := op1
        if step < 0 goto L2
    L1: if i > op2 goto L3
        instructions
        i := i + step
        goto L1
    L2: if i < op2 goto L3
        instructions
        i := i + step
        goto L2
    L3:

IR classification: Structure • Graphical • trees, graphs • not easy to rearrange • large structures • Linear • looks like pseudocode • easy to rearrange • Hybrid • combine graphical and linear IRs • Example: • low-level linear IR for basic blocks, and • graph to represent flow of control

(Basic blocks) • Basic block = a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.

(Basic blocks) • Partitioning a sequence of statements into BBs • Determine leaders (first statements of BBs) • the first statement is a leader • the target of a conditional or unconditional branch is a leader • a statement immediately following a branch is a leader • For each leader, its basic block consists of the leader and all the statements up to but not including the next leader (or up to the end of the sequence).
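The leader rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production partitioner: it assumes a toy textual IR in which a label is written as a `Lk:` prefix and every branch contains `goto Lk`. The function names and the `L3: nop` placeholder statement are assumptions made for this example.

```python
import re

LABEL = re.compile(r"\s*(\w+):(?!=)")   # matches 'L1:' but not 't1:='
GOTO = re.compile(r"\bgoto\s+(\w+)")    # matches conditional and unconditional jumps


def find_leaders(instrs):
    """Return the sorted indices of the leaders in a list of instruction strings."""
    label_at = {}
    for idx, ins in enumerate(instrs):
        m = LABEL.match(ins)
        if m:
            label_at[m.group(1)] = idx

    leaders = {0} if instrs else set()          # rule 1: first statement
    for idx, ins in enumerate(instrs):
        m = GOTO.search(ins)
        if m:
            leaders.add(label_at[m.group(1)])   # rule 2: branch target
            if idx + 1 < len(instrs):
                leaders.add(idx + 1)            # rule 3: statement after a branch
    return sorted(leaders)


def basic_blocks(instrs):
    """Split instrs so that each block runs from a leader up to the next leader."""
    bounds = find_leaders(instrs) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]


# The medium-level loop from the slides; 'L3: nop' is a hypothetical placeholder
# so that the label L3 has a statement to attach to.
loop_code = [
    "i := op1",
    "if step < 0 goto L2",
    "L1: if i > op2 goto L3",
    "instructions",
    "i := i + step",
    "goto L1",
    "L2: if i < op2 goto L3",
    "instructions",
    "i := i + step",
    "goto L2",
    "L3: nop",
]

for block in basic_blocks(loop_code):
    print(block)
```

On this input the sketch finds six blocks; each loop test (`L1:` and `L2:`) ends up in a block of its own because it is both a branch target and a branch.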

Linear IRs • Sequence of instructions that execute in order of appearance • Control flow is represented by conditional branches and jumps • Common representations • stack machine code • three-address code

Linear IRs • stack machine code • assumes presence of operand stack • useful for stack architectures, JVM • operations typically pop operands and push results. • advantages • easy code generation • compact form • disadvantages • difficult to rearrange • difficult to reuse expressions
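To illustrate why code generation for a stack machine is easy, here is a small Python sketch: a postorder walk of an expression tree emits the code, and a tiny interpreter pops operands and pushes results. The instruction names (`push`, `add`, `sub`, `mul`) and the tuple encoding of trees are assumptions chosen for this example, not a real instruction set.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def gen_stack_code(node):
    """Emit stack code for a tree whose nodes are leaves or (op, left, right)."""
    if isinstance(node, tuple):
        op, left, right = node
        # Postorder: code for both operands first, then the operator itself.
        return gen_stack_code(left) + gen_stack_code(right) + [(op,)]
    return [("push", node)]


def run(code, env):
    """Execute stack code; env maps variable names to values."""
    stack = []
    for ins in code:
        if ins[0] == "push":
            value = ins[1]
            stack.append(env[value] if isinstance(value, str) else value)
        else:
            right = stack.pop()   # operands come off in reverse order
            left = stack.pop()
            stack.append(OPS[ins[0]](left, right))
    return stack.pop()


# 2 * y - z, written as a tree
expr = ("sub", ("mul", 2, "y"), "z")
code = gen_stack_code(expr)
print(code)
print(run(code, {"y": 3, "z": 4}))
```

The generator never names a temporary, which is what keeps the form compact; it is also why a computed value is hard to reuse, since it disappears as soon as it is popped.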

Linear IRs • three-address code • compact • generates temp variables • level of abstraction may vary • loses syntactic structure • quadruples • operator • up to two operands • destination • triples • similar to quadruples but the results are not named explicitly (index of operation is implicit name) • Implement as table, array of pointers, or list

Linear IRs • Example
Quadruples:
    L1: i := 2
        t1 := i+1
        t2 := t1>0
        if t2 goto L1
Triples:
    (1) 2
    (2) i st (1)
    (3) i + 1
    (4) (3) > 0
    (5) if (4), (1)
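The difference between the two forms can be sketched in Python as follows. Both functions translate the same expression tree; the quadruple version names every result with a fresh temporary, while the triple version refers to an earlier result by its position in the table. The tuple layouts and the `("ref", i)` operand encoding are assumptions made for illustration.

```python
from itertools import count


def to_quads(node, quads, temps):
    """Append (op, arg1, arg2, dest) quadruples; return the name holding the value."""
    if not isinstance(node, tuple):
        return node                      # leaf: variable or constant
    op, left, right = node
    a = to_quads(left, quads, temps)
    b = to_quads(right, quads, temps)
    dest = f"t{next(temps)}"             # the result gets an explicit name
    quads.append((op, a, b, dest))
    return dest


def to_triples(node, triples):
    """Append (op, arg1, arg2) triples; return a leaf or a ('ref', index) operand."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    a = to_triples(left, triples)
    b = to_triples(right, triples)
    triples.append((op, a, b))
    return ("ref", len(triples) - 1)     # the index is the implicit name


# (i + 1) > 0, the expression computed into t2 in the example above
expr = (">", ("+", "i", 1), 0)

quads = []
to_quads(expr, quads, count(1))
triples = []
to_triples(expr, triples)

print(quads)
print(triples)
```

Because triples refer to each other by position, reordering them means rewriting every reference, whereas quadruples can be moved freely as long as the named temporaries are defined before use.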

Graphical IRs • Parse tree • Abstract syntax tree • high-level • useful for source-level information • retains syntactic structure • Common uses • source-to-source translation • semantic analysis • syntax-directed editors

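Graphical IRs • Tree, for basic block • root: operator • up to two children: operands • can be combined • Uses: • algebraic simplifications • may generate locally optimal code.
Example code:
    L1: i := 2
        t1 := i+1
        t2 := t1>0
        if t2 goto L1
[Figure: one tree per statement: (assgn, i) with child 2; (add, t1) with children i and 1; (gt, t2) with children t1 and 0. Combined tree: (gt, t2) with children (add, t1) and 0, where (add, t1) has children (assgn, i) and 1, and (assgn, i) has child 2.]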

Graphical IRs • Directed acyclic graphs (DAGs) • Like compressed trees • leaves: variables, constants available on entry • internal nodes: operators • annotated with variable names? • distinct left/right children • Used for basic blocks (doesn't show control flow) • Can generate efficient code. • Note: DAGs encode common expressions • But difficult to transform • Better for analysis

Graphical IRs • Generating DAGs • check whether an operand is already present • if not, create a leaf for it • check whether there is a parent of the operand that represents the same operation • if not create one, then label the node representing the result with the name of the destination variable, and remove that label from all other nodes in the DAG.

Graphical IRs • Directed acyclic graphs (DAGs) • Example:
    m := 2 * y * z
    n := 3 * y * z
    p := 2 * y - z
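A minimal Python sketch of the DAG-generation steps, applied to the example above, might look like this. It assumes the three statements have first been broken into binary operations with hypothetical temporaries `t1`, `t2`, `t3`, and that `*` associates to the left. The class and method names are illustrative only.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []     # node id -> ("leaf", value) or (op, left_id, right_id)
        self.index = {}     # node key -> node id, used to find existing nodes
        self.current = {}   # variable name -> id of the node holding its value

    def _intern(self, key):
        """Return the id of the node with this key, creating it only if absent."""
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def operand(self, x):
        """Node for an operand: its latest definition, else a leaf."""
        if x in self.current:
            return self.current[x]
        return self._intern(("leaf", x))

    def assign(self, dest, op, left, right):
        """Process dest := left op right."""
        node = self._intern((op, self.operand(left), self.operand(right)))
        # A variable labels exactly one node, so rebinding dest here also
        # removes the label from whichever node carried it before.
        self.current[dest] = node
        return node


dag = DagBuilder()
dag.assign("t1", "*", 2, "y")
dag.assign("m", "*", "t1", "z")    # m := 2 * y * z
dag.assign("t2", "*", 3, "y")
dag.assign("n", "*", "t2", "z")    # n := 3 * y * z
dag.assign("t3", "*", 2, "y")
dag.assign("p", "-", "t3", "z")    # p := 2 * y - z

print(len(dag.nodes))
print(dag.current["t1"] == dag.current["t3"])
```

Under these assumptions the common subexpression `2 * y` is represented by a single node shared by `m` and `p`, while `y * z` is not shared because it never appears as a subtree under left-to-right grouping.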

Graphical IRs • Control flow graphs (CFGs) • Each node corresponds to either • a basic block: fewer nodes, but facts may need to be determined at specific points within the BB, or • a single statement: more nodes, so more space and time • Each edge represents flow of control

Graphical IRs • Dependence graphs • Encode flow of values from definition to use • Nodes represent operations • Edges connect definitions to uses • Graph represents constraints on the sequencing of operations • Built for specific optimizations, then discarded

SSA form • Static Single Assignment Form • Encodes information about data and control flow • Two constraints: • each definition has a unique name • each use refers to a single definition • Consequence: all uses reached by a definition are renamed • Example:
    x := 5                     x0 := 5
    x := x+1       becomes     x1 := x0 + 1
    y := x * 2                 y0 := x1 * 2
• What if we have a loop?
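For straight-line code such as the example above, renaming needs only a version counter per variable. The Python sketch below shows that restricted case; it does not handle branches or loops, which is where the join functions introduced next come in. The statement encoding (a destination plus a list of right-hand-side tokens) is an assumption made for this example.

```python
def rename_straight_line(stmts):
    """Rename a branch-free list of (dest, rhs_tokens) statements into SSA names."""
    version = {}   # variable -> number of its most recent definition
    renamed = []
    for dest, rhs in stmts:
        # Uses refer to the latest definition seen so far; names with no
        # definition yet (inputs, constants, operators) are left unchanged.
        new_rhs = [f"{tok}{version[tok]}" if tok in version else tok for tok in rhs]
        # Each definition then gets a fresh, unique name.
        version[dest] = version.get(dest, -1) + 1
        renamed.append((f"{dest}{version[dest]}", new_rhs))
    return renamed


program = [
    ("x", ["5"]),
    ("x", ["x", "+", "1"]),
    ("y", ["x", "*", "2"]),
]

for dest, rhs in rename_straight_line(program):
    print(dest, ":=", " ".join(rhs))
```

The right-hand side is renamed before the destination's counter is bumped, so `x := x+1` reads the old version and defines the new one.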

SSA form • The compiler inserts special join functions (called φ-functions) at points where different control flow paths meet. • Example:
    read(x)                    read(x0)
    if (x>0)                   if (x0>0)
        y := 5                     y0 := 5
    else           becomes     else
        y := 10                    y1 := 10
    x := y                     y2 := φ(y0, y1)
                               x1 := y2

SSA form • Example 2:
Original:
    x := 0
    i := 1
    while (i<10)
        x := x+i
        i := i+1
SSA form:
        x0 := 0
        i0 := 1
        if (i0>=10) goto L2
    L1: x1 := φ(x0, x2)
        i1 := φ(i0, i2)
        x2 := x1+i1
        i2 := i1+1
        if (i2<10) goto L1
    L2: x3 := φ(x0, x2)
        i3 := φ(i0, i2)

SSA form • Note: φ is not an executable function • A program is in SSA form if • each variable is assigned a value in exactly one statement • each use of a variable is dominated by its definition. • point x dominates point y if every path from the start to y goes through x
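The dominance relation defined above can be computed with a simple iterative scheme, sketched here in Python. This is the straightforward fixed-point formulation rather than a tuned algorithm, and it assumes every node appears as a key of the successor map. The block names `B0`, `B1`, `B2` are hypothetical labels for the entry code, the `L1` loop body, and the `L2` exit of the loop example.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} for a CFG given as successors."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            # n is dominated by itself plus whatever dominates all its predecessors.
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# CFG of the loop example: B0 branches to the body or the exit,
# the body B1 loops back to itself or falls through to the exit B2.
cfg = {"B0": ["B1", "B2"], "B1": ["B1", "B2"], "B2": []}
dom = dominators(cfg, "B0")
for block in sorted(dom):
    print(block, sorted(dom[block]))
```

In this small graph the loop body does not dominate the exit block, because the exit can also be reached directly from the entry; that is consistent with the φ-functions placed at `L2` in the example.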

SSA form • Why use SSA? • explicit def-use pairs • no write-after-read and write-after-write dependences • speeds up many dataflow optimizations • But • too many temp variables, φ-functions • limited to scalars • how to handle arrays?
