1 / 92

Fall 2016-2017 Compiler Principles Lecture 5: Intermediate Representation

Fall 2016-2017 Compiler Principles Lecture 5: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev. Tentative syllabus. mid-term. exam. Previously. Becoming parsing ninjas Going from text to an A bstract S yntax T ree.

sburney
Download Presentation

Fall 2016-2017 Compiler Principles Lecture 5: Intermediate Representation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2016-2017 Compiler PrinciplesLecture 5: Intermediate Representation Roman Manevich Ben-Gurion University of the Negev

  2. Tentative syllabus mid-term exam

  3. Previously • Becoming parsing ninjas • Going from text to an Abstract Syntax Tree By Admiral Ham [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

  4. + num * num x From scanning to parsing 59 + (1257 * xPosition) program text Lexical Analyzer token stream Grammar:E id E num E E+EE  E*EE  ( E ) Parser valid syntaxerror Abstract Syntax Tree

  5. Agenda • The role of intermediate representations • Two example languages • A high-level language • An intermediate language • Lowering • Correctness • Formal meaning of programs

  6. Role of intermediate representation High-levelLanguage(scheme) LexicalAnalysis Syntax Analysis Parsing AST SymbolTableetc. Inter.Rep.(IR) CodeGeneration Executable Code Bridge between front-end and back-end Allow implementing optimizations independent of source language and executable (target) language

  7. Motivation for intermediate representation

  8. Intermediate representation Java Pentium C++ IR Java bytecode Pyhton Sparc • A language that is between the source language and the target language • Not specific to any source language or machine language • Goal 1: retargeting compiler components for different source languages/target machines

  9. Intermediate representation Lowering Code Gen. Java Pentium optimize C++ IR Java bytecode Pyhton Sparc • A language that is between the source language and the target language • Not specific to any source language or machine language • Goal 1: retargeting compiler components for different source languages/target machines • Goal 2: machine-independent optimizer • Narrow interface: small number of node types (instructions)

  10. Multiple IRs • Some optimizations require high-level constructs while others more appropriate on low-level code • Solution: use multiple IR stages Pentium optimize optimize Java bytecode AST LIR HIR Sparc

  11. Multiple IRs example Elixir – a language for parallel graph algorithms ElixirProgram+ delta ElixirProgram DeltaInferencer HIRLowering ILSynthesizer HIR LIR C++backend Query Answer PlanningProblem Plan C++ code AutomatedReasoning(Boogie+Z3) Automated Planner GaloisLibrary

  12. AST vs. LIR for imperative languages AST LIR An abstract machine language Very limited type system Only computation-related code Labels and conditional/ unconditional jumps, no looping Data movements, generic memory access statements No sub-expressions, logical as numeric, temporaries, constants, function calls – explicit argument passing • Rich set of language constructs • Rich type system • Declarations: types (classes, interfaces), functions, variables • Control flow statements: if-then-else, while-do, break-continue, switch, exceptions • Data statements: assignments, array access, field access • Expressions: variables, constants, arithmetic operators, logical operators, function calls

  13. three address code

  14. Three-Address Code IR Chapter 8 • A popular form of IR • High-level assembly where instructions have at most three operands • There exist other types of IR • For example, IR based on acyclic graphs – more amenable for analysis and optimizations

  15. Base language: While

  16. Syntax n Num numerals x Var program variables An | x | AArithOpA | (A) ArithOp - | + | * | / Btrue|false|A=A|AA|B|BB | (B) Sx:=A | skip | S;S | {S }| ifBthenSelseS | whileBS

  17. Example program while x < y { x := x + 1 { y := x;

  18. Intermediate language:IL

  19. Syntax n Num Numerals l Num Labels x Temp Var Temporaries and variables Vn | x RVOpV Op - | + | * | / |= | | << | >> | … Cl: skip | l: x:=R | l: Goto l’| l: IfZxGoto l’| l: IfNZxGoto l’ IR  C+

  20. Intermediate language programs 1: t0 := 137 2: y := t0 + 3 3: IfZ x Goto 7 4: t1 := y 5: z := t1 6: Goto 9 7: t2 := y 8: x := t2 9: skip An intermediate program P has the form1:c1…n:cn We can view it as a map from labels to individual commands and write P(j) = cj

  21. Lowering

  22. TAC generation • At this stage in compilation, we have • an AST • annotated with scope information • and annotated with type information • We generate TAC recursively (bottom-up) • First for expressions • Then for statements

  23. TAC generation for expressions • Define a function cgen(expr) that generates TAC that: • Computes an expression, • Stores it in a temporary variable, • Returns the name of that temporary • Define cgen directly for atomic expressions (constants, this, identifiers, etc.) • Define cgen recursively for compound expressions (binary operators, function calls, etc.)

  24. Translation rules for expressions cgen(n) = (l: t:=n, t) where l and t are fresh cgen(x) = (l: t:=x, t) where l and t are fresh cgen(e1) = (P1, t1) cgen(e2) = (P2, t2)cgen(e1ope2) = (P1· P2· l: t:=t1op t2, t) where l and t are fresh

  25. cgen for basic expressions cgen(k) = { // k is a constant c = c + 1, l = l +1emit(l: tc:= k) returntc { cgen(id) = { // id is an identifier c = c + 1, l = l +1emit(l: tc:= id) returntc { Maintain a counter for temporaries in c, and a counter for labels in l Initially: c = 0, l = 0

  26. Naive cgen for binary expressions The translation emits code to evaluate e1 before e2. Why is that? cgen(e1op e2) = { let A = cgen(e1) let B = cgen(e2)c = c + 1, l = l +1emit( l: tc := A op B; )returntc}

  27. Example: cgen for binary expressions cgen( (a*b)-d)

  28. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d)

  29. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = cgen(a*b)let B = cgen(d)c = c + 1 , l = l +1emit(l: tc := A - B; )returntc}

  30. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = cgen(a)let B = cgen(b)c = c + 1, l = l +1emit(l: tc := A * B; )returntc }let B = cgen(d)c = c + 1, l = l +1emit(l: tc := A - B; )returntc}

  31. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc } let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} Code here A=t1

  32. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc }let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} Code1: t1:=a; here A=t1

  33. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc } let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} Code1: t1:=a;2: t2:=b; here A=t1

  34. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc } let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} Code1: t1:=a;2: t2:=b;3: t3:=t1*t2 here A=t1

  35. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc } let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} here A=t3 Code1: t1:=a;2: t2:=b;3: t3:=t1*t2 here A=t1

  36. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc } let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} here A=t3 Code1: t1:=a;2: t2:=b;3: t3:=t1*t24: t4:=d here A=t1

  37. Example: cgen for binary expressions c = 0, l = 0 cgen( (a*b)-d) = {let A = {let A = {c=c+1, l=l+1, emit(l: tc := a;), returntc } let B = {c=c+1, l=l+1, emit(l: tc := b;), returntc }c = c + 1, l = l +1emit(l: tc := A * B; )returntc } let B = {c=c+1, l=l+1, emit(l: tc := d;), returntc }c = c + 1, l = l +1emit(l: tc := A - B; )returntc} here A=t3 Code1: t1:=a;2: t2:=b;3: t3:=t1*t24: t4:=d 5: t5:=t3-t4 here A=t1

  38. cgen for statements • We can extend the cgen function to operate over statements as well • Unlike cgen for expressions, cgen for statements does not return the name of a temporary holding a value • (Why?)

  39. Syntax n Num numerals x Var program variables An | x | AArithOpA | (A) ArithOp - | + | * | / Btrue|false|A=A|AA|B|BB | (B) Sx:=A | skip | S;S | {S }| ifBthenSelseS | whileBS

  40. Translation rules for statements cgen(skip) = l: skip where l is fresh cgen(e) = (P, t)cgen(x:=e) = P · l: x :=t where l is fresh cgen(S1) = P1, cgen(S2) = P2cgen(S1;S2) = P1· P2

  41. Translation rules for conditions cgen(b) = (Pb, t), cgen(S1) = P1, cgen(S2) = P2cgen(ifbthenS1elseS2) = PbIfZ t GotolfalseP1lfinish: GotoLafterlfalse: skipP2lafter: skip where lfinish, lfalse, lafter are fresh

  42. Translation rule for loops cgen(b) = (Pb, t), cgen(S) = Pcgen(whilebS) = lbefore: skipPbIfZtGotolafterPlloop: GotoLbeforelafter: skip where lafter, lbefore, lloop are fresh

  43. Translation example y := 137+3;if x=0 z := y;else x := y; 1: t1 := 1372: t2 := 33: t3 := t1 + t2 4: y := t35: t4 := x 6: t5 := 0 7: t6 := t4=t5 8: IfZ t6 Goto 12 9: t7 := y 10: z := t7 11: Goto 14 12: t8 := y 13: x := t8 14: skip

  44. Translation Correctness

  45. Compiler correctness • Compilers translate programs in one language (usually high) to another language (usually lower) such that they are both equivalent • Our goal is to formally define the meaning of this equivalence • First, we must formally define the meaning of programs

  46. Formal semantics

  47. http://www.daimi.au.dk/~bra8130/Wiley_book/wiley.html

  48. What is formal semantics? “Formal semantics is concerned with rigorously specifying the meaning, or behavior,of programs, pieces of hardware, etc.”

  49. Why formal semantics? • Implementation-independent definitionof a programming language • Automatically generating interpreters(and some day maybe full fledged compilers) • Optimization, verification, and debugging • If you don’t know what it does, how do you know its correct/incorrect? • How do you know whether a given optimization is correct?

  50. Operational semantics Elements of the semantics States/configurations: the (aggregate) values that a program computes during execution Transition rules: how the program advances from one configuration to another

More Related