The Tiger compiler Ivan Pribela
Contents • The Tiger language • Course structure • Course sequence • Object-oriented Tiger language • Functional Tiger language • Existing course adaption to Tiger
Tiger compiler The Tiger language
The Tiger language • Tiger is a simple but nontrivial programming language • it belongs to the Algol family, with nested scopes • heap-allocated records with implicit pointers • arrays, integer and string variables • a few simple structured control constructs • Easily modified to become • a functional programming language • an object-oriented language
Sample Tiger programs

let
  var a := 0
in
  for i := 0 to 100 do (a := a + 1; ())
end

let
  function do_nothing1(a: int, b: string): int =
    (do_nothing2(a + 1); 0)
  function do_nothing2(d: int): string =
    (do_nothing1(d, "str"); " ")
in
  do_nothing1(0, "str2")
end
Lexical issues • Identifiers • a sequence of letters, digits and underscores, starting with a letter • Comments • start with /* and end with */ • can appear between any two tokens • can be nested
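Because comments can nest, a single regular expression cannot recognize them; a lexer has to count the nesting depth. The following is a minimal sketch of that idea, not code from the course: the class name `NestedComments` is hypothetical, and for brevity it ignores string literals (a `/*` inside a string would be misread).

```java
// Hypothetical helper: removes Tiger comments, which may nest, by tracking depth.
final class NestedComments {
    /** Returns src with each outermost comment replaced by one space. */
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;                               // entering a (possibly nested) comment
                i += 2;
            } else if (depth > 0 && src.startsWith("*/", i)) {
                depth--;                               // leaving one nesting level
                i += 2;
                if (depth == 0) out.append(' ');       // a comment still separates tokens
            } else {
                if (depth == 0) out.append(src.charAt(i));
                i++;
            }
        }
        if (depth != 0) throw new IllegalArgumentException("unterminated comment");
        return out.toString();
    }
}
```

A JLex specification typically achieves the same effect with a separate lexer state for comments plus a depth counter kept in user code.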
Declarations
decs → { dec }
dec → tydec | vardec | fundec
• Declaration sequence • a sequence of type, value and function declarations • no punctuation separates or terminates individual declarations
Data types
tydec → type id = ty
ty → id | { tyfld } | array of id
tyfld → λ | id : type-id {, id : type-id }
• Built-in types • int and string • can be redefined • Type equality • by name • Mutually recursive types • declared in a consecutive sequence, e.g. type list = {hd: int, tl: list} • Field name reusability • different record types may use the same field names
Variables
vardec → var id := exp | var id : type-id := exp
• Variable type • in the short form, the type of the expression is used • in the long form, the given type and the type of the expression must match • if the expression is nil, the long form must be used • A variable lasts until the end of its scope
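A type checker can get name equivalence almost for free by representing each declared record type as a distinct object and comparing types by identity. The sketch below applies that to the two forms of variable declaration, including the rule that `nil` needs the long form. All class and method names are hypothetical, and record fields are reduced to names.

```java
import java.util.List;

// Hypothetical model: types are objects compared by identity (name equivalence).
final class VarDeclCheck {
    interface Type {}
    static final Type INT = new Type() { public String toString() { return "int"; } };
    static final Type STRING = new Type() { public String toString() { return "string"; } };
    /** Type of the expression nil: compatible with every record type. */
    static final Type NIL = new Type() { public String toString() { return "nil"; } };

    static final class RecordType implements Type {
        final String name;
        final List<String> fields;
        RecordType(String name, List<String> fields) { this.name = name; this.fields = fields; }
        public String toString() { return name; }
    }

    /** May a value of type actual be stored where expected is required? */
    static boolean assignable(Type expected, Type actual) {
        if (expected == actual) return true;              // same declaration, same type
        return actual == NIL && expected instanceof RecordType;
    }

    /** declared == null means the short form "var id := exp". */
    static Type checkVarDec(Type declared, Type init) {
        if (declared == null) {
            if (init == NIL) throw new IllegalStateException("nil requires var id : type-id := nil");
            return init;                                  // short form: type of the expression
        }
        if (!assignable(declared, init))
            throw new IllegalStateException("expected " + declared + ", found " + init);
        return declared;                                  // long form: the declared type
    }
}
```

Two record types with identical fields, declared separately, are different objects, so `assignable` rejects mixing them, which is exactly what "type equality by name" requires.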
Functions
fundec → function id ( tyfld ) = exp | function id ( tyfld ) : type-id = exp
• Parameters • all parameters are passed by value • Mutually recursive functions • declared in a consecutive sequence
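The "consecutive sequence" rule suggests a two-pass check per batch of adjacent function declarations: first enter every header, then check the bodies, so any body may call any function of its own batch. This sketch assumes bodies are abstracted to the list of names they call; `FunctionBatches` and `FunDec` are hypothetical names.

```java
import java.util.*;

// Hypothetical model: a program is a list of batches of adjacent function declarations.
final class FunctionBatches {
    record FunDec(String name, List<String> calls) {}

    /** Checks batches in order; returns unresolved calls and duplicate names. */
    static List<String> check(List<List<FunDec>> batches) {
        Set<String> env = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (List<FunDec> batch : batches) {
            Set<String> seen = new HashSet<>();
            for (FunDec f : batch) {                       // pass 1: headers only
                if (!seen.add(f.name())) errors.add("duplicate " + f.name());
                env.add(f.name());
            }
            for (FunDec f : batch)                         // pass 2: bodies
                for (String callee : f.calls())
                    if (!env.contains(callee)) errors.add(f.name() + " -> " + callee);
        }
        return errors;
    }
}
```

Splitting `do_nothing1` and `do_nothing2` into separate batches (for example by a `var` declaration between them) makes the first call unresolved, while one batch accepts the mutual recursion.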
Scope rules • Variables • let ... vardec ... in exp end • Parameters • function id ( ... id1 : id2 ... ) = exp • Nested scopes • access to variables in outer scopes is permitted • Types • let ... tydec ... in exp end
Scope rules • Functions • let ... fundec ... in exp end • Name spaces • two name spaces: one for types, one for variables and functions • Local redeclarations • an object can be hidden by a redeclaration in a smaller scope • mutually recursive objects must have different names
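These scope rules map onto a symbol table with nested scopes: a redeclaration hides the outer binding, and leaving the scope restores it. Below is a sketch under the assumption of an imperative table with `beginScope`/`endScope` operations; the class `ScopedTable` is hypothetical. Two instances model the two name spaces.

```java
import java.util.*;

// Hypothetical imperative symbol table with nested scopes.
final class ScopedTable<V> {
    private final Map<String, Deque<V>> bindings = new HashMap<>();   // name -> stack of bindings
    private final Deque<List<String>> scopes = new ArrayDeque<>();     // names bound per scope

    ScopedTable() { beginScope(); }

    void beginScope() { scopes.push(new ArrayList<>()); }

    /** Undoes every binding made since the matching beginScope. */
    void endScope() {
        for (String name : scopes.pop()) {
            Deque<V> stack = bindings.get(name);
            stack.pop();
            if (stack.isEmpty()) bindings.remove(name);
        }
    }

    /** Binds name in the innermost scope, hiding any outer binding. */
    void put(String name, V value) {
        bindings.computeIfAbsent(name, k -> new ArrayDeque<>()).push(value);
        scopes.peek().add(name);
    }

    /** Innermost visible binding, or null; bindings of outer scopes stay visible. */
    V get(String name) {
        Deque<V> stack = bindings.get(name);
        return stack == null ? null : stack.peek();
    }
}
```

Usage: one `ScopedTable` holds types and another holds variables and functions, so `type a = int` and `var a := "s"` can coexist; entering a `let` calls `beginScope` on both, and its `end` calls `endScope`.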
Values
lvalue → id | lvalue . id | lvalue [ exp ]
• L-values • a location whose value may be read or assigned • variables, procedure parameters, fields of records and elements of arrays
Expressions • L-value • evaluates to the contents of the location • Valueless expressions • procedure calls, assignment, if-then, while, break, and sometimes if-then-else • Nil • the expression nil denotes the value nil • it may only be used where its record type can be determined • Sequencing • a sequence of expressions (exp1; exp2; ... expn) is evaluated in order • its result (if any) is that of the last expression
Expressions • No value • an empty sequence () • a let expression with nothing between in and end • Integer literal • a sequence of digits • String literal • a sequence of 0 or more printable characters between quotes • escapes: \\ \n \t \ddd \" \f...f\ • Negation • an integer-valued expression prefixed by a minus sign • Function call • has the value of the function result, or produces no value
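The escape sequences listed for string literals can be decoded in a single left-to-right pass. In this sketch `\f...f\` is read as a run of formatting characters (spaces, tabs, newlines) between two backslashes that is dropped, and `\ddd` as three decimal digits; the 0..255 limit is an assumption, and the class name is hypothetical.

```java
// Hypothetical helper: decodes the text between the quotes of a Tiger string literal.
final class TigerStrings {
    static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') { out.append(c); continue; }
            if (i >= body.length()) throw new IllegalArgumentException("dangling backslash");
            char e = body.charAt(i++);
            if (e == 'n') out.append('\n');
            else if (e == 't') out.append('\t');
            else if (e == '"') out.append('"');
            else if (e == '\\') out.append('\\');
            else if (isDigit(e)) {
                // \ddd: exactly three decimal digits giving a character code
                if (i + 2 > body.length() || !isDigit(body.charAt(i)) || !isDigit(body.charAt(i + 1)))
                    throw new IllegalArgumentException("\\ddd needs three digits");
                int code = Integer.parseInt(body.substring(i - 1, i + 2));
                if (code > 255) throw new IllegalArgumentException("code " + code + " out of range");
                out.append((char) code);
                i += 2;
            } else if (Character.isWhitespace(e)) {
                // \f...f\: skip formatting characters up to the closing backslash
                while (i < body.length() && Character.isWhitespace(body.charAt(i))) i++;
                if (i >= body.length() || body.charAt(i) != '\\')
                    throw new IllegalArgumentException("unterminated \\f...f\\");
                i++;
            } else {
                throw new IllegalArgumentException("illegal escape \\" + e);
            }
        }
        return out.toString();
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
}
```

The `\f...f\` form lets a long literal be split across source lines without putting the line break into the string.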
Operations • Arithmetic operators • + - * / • Comparison • = <> < > <= >= • produce 1 for true, 0 for false • Boolean operators • & | • 0 is considered false, any nonzero value true
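In the Tiger reference manual `&` and `|` are lazy: the right operand is not evaluated when the left one already decides the result. One common way to get this, assumed here, is to rewrite them as conditionals (`a & b` as `if a then b else 0`, `a | b` as `if a then 1 else b`). The sketch models the right operand as an `IntSupplier`; `LazyLogic` is a hypothetical name.

```java
import java.util.function.IntSupplier;

// Hypothetical evaluator helpers for Tiger comparisons and lazy boolean operators.
final class LazyLogic {
    /** Comparisons yield 1 for true and 0 for false. */
    static int lt(int a, int b) { return a < b ? 1 : 0; }

    /** a & b  ==>  if a then b else 0 (b evaluated only when a is nonzero) */
    static int and(int left, IntSupplier right) { return left != 0 ? right.getAsInt() : 0; }

    /** a | b  ==>  if a then 1 else b (b evaluated only when a is zero) */
    static int or(int left, IntSupplier right) { return left != 0 ? 1 : right.getAsInt(); }
}
```

With this rewriting, a guard such as `i < n & a[i] <> 0` never indexes the array once `i < n` is false.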
Records and arrays • Record creation • type-id { id = exp {, id = exp } } • Array creation • type-id [ exp1 ] of exp2 • Assignment and extent • records and arrays are assigned by reference • records and arrays have infinite extent: each value lasts even after control leaves the scope that created it
Statements • If-then-else • if exp1 then exp2 [ else exp3 ] • exp2 and exp3 must have the same type • While loop • while exp1 do exp2 • For loop • for id := exp1 to exp2 do exp3
Statements • Break • terminates evaluation of the nearest enclosing while or for • Let • let decs in expseq end • evaluates decs • binds types, variables and functions • the result (if any) is the result of the last expression • Parentheses • (exp) simply groups exp
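Since `break` leaves the nearest enclosing `while` or `for`, the semantic checker can track how many loops surround the current expression and reject a `break` at depth zero. This sketch uses a tiny hypothetical AST with only the nodes the check needs, and it assumes that a function body resets the depth, because a `break` cannot jump out of a function.

```java
import java.util.List;

// Hypothetical mini-AST for the break check.
final class BreakCheck {
    interface Node {}
    record Loop(Node body) implements Node {}           // while ... do body, or for ... do body
    record Break() implements Node {}
    record FunctionBody(Node body) implements Node {}
    record Seq(List<Node> items) implements Node {}

    /** Counts break expressions that have no enclosing loop. */
    static int misplacedBreaks(Node n, int loopDepth) {
        if (n instanceof Loop l) return misplacedBreaks(l.body(), loopDepth + 1);
        if (n instanceof Break) return loopDepth == 0 ? 1 : 0;
        if (n instanceof FunctionBody f) return misplacedBreaks(f.body(), 0);
        int count = 0;
        for (Node item : ((Seq) n).items()) count += misplacedBreaks(item, loopDepth);
        return count;
    }
}
```

In a full compiler the same traversal usually also records the exit label of the innermost loop, which the translation of `break` jumps to.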
Standard library
function print(s: string)
function flush()
function getchar(): string
function ord(s: string): int
function chr(i: int): string
function not(i: int): int
function exit(i: int)
function size(s: string): int
function substring(s: string, first: int, n: int): string
function concat(s1: string, s2: string): string
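Several of these functions are easy to sketch in Java as part of a runtime or an interpreter. The behaviors below follow the Tiger reference manual as far as this sketch assumes them: `ord` yields -1 for the empty string, `chr` rejects codes outside 0..255, `substring` numbers characters from 0, and `not(i)` is 1 exactly when `i` is 0. Errors are signaled with exceptions here, whereas a real runtime would more likely print a message and exit. `TigerRuntime` is a hypothetical name.

```java
// Hypothetical Java versions of some Tiger standard-library functions.
final class TigerRuntime {
    static int ord(String s) { return s.isEmpty() ? -1 : s.charAt(0); }

    static String chr(int i) {
        if (i < 0 || i > 255) throw new IllegalArgumentException("chr(" + i + ") out of range");
        return String.valueOf((char) i);
    }

    static int size(String s) { return s.length(); }

    /** n characters of s starting at index first (0-based). */
    static String substring(String s, int first, int n) {
        if (first < 0 || n < 0 || first + n > s.length())
            throw new IndexOutOfBoundsException("substring(" + first + ", " + n + ")");
        return s.substring(first, first + n);
    }

    static String concat(String s1, String s2) { return s1 + s2; }

    static int not(int i) { return i == 0 ? 1 : 0; }
}
```

Note that Tiger's `substring` takes a length, while Java's `String.substring` takes an end index, hence `first + n`.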
Tiger compiler Course structure
Lectures • Lectures • Students will see the theory behind the different components of a compiler • the programming techniques used to put the theory into practice • and the interfaces used to modularize the compiler • the compiler is written in the Java programming language
Practical exercises • Practical exercises • The “student project compiler” is reasonably simple • it is organized to demonstrate some important techniques • it uses abstract syntax trees to avoid tangling syntax and semantics • it separates instruction selection from register allocation
Paper exercises • Each chapter has pencil-and-paper exercises • those marked with a star are more challenging • two-star problems are difficult but solvable • the occasional three-star exercises are not known to have a solution
One or two semester course • A one-semester course could cover all of Part I • Chapters 1-12 • students implement the project compiler • working in groups • in addition, selected topics from Part II • An advanced or graduate course could cover Part II • as well as additional topics from other literature • many of the Part II chapters can stand independently • In a two-quarter sequence • the first quarter could cover Chapters 1-8 • and the second quarter Chapters 9-12 • and some chapters from Part II
Course material • Chapters in Part I • are accompanied by skeleton code • which is easily built into a full working compiler module • and can be used in practical exercises • Chapters in Part II • give only theoretical knowledge • and general instructions on how to add the discussed feature to the Tiger compiler • there is no skeleton code
Target language • RISC • 32 registers • only one class of integer/pointer registers • arithmetic operations only between registers • three-address instructions of the form r1 ← r2 ⊕ r3 • load and store only with M[reg + const] addressing • every instruction is 32 bits long • one result or effect per instruction • CISC • few registers (16, 8, or 6) • registers divided into classes • some operations available only on certain registers • arithmetic operations on registers and memory • two-address instructions of the form r1 ← r1 ⊕ r2 • various addressing modes • variable-length instructions • instructions with side effects
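A back end that produces RISC-style three-address operations must lower them when the target only has two-address forms. The sketch below shows the case analysis, including the one subtle case where the destination is also the second source of a non-commutative operation. The textual instruction syntax and the scratch register `t` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lowering of r_dst <- r_src1 op r_src2 to two-address instructions.
final class TwoAddress {
    static List<String> lower(String dst, String src1, String op, String src2, boolean commutative) {
        List<String> out = new ArrayList<>();
        if (dst.equals(src1)) {
            out.add(dst + " " + op + "= " + src2);            // already two-address
        } else if (dst.equals(src2)) {
            if (commutative) {
                out.add(dst + " " + op + "= " + src1);        // swap operands
            } else {
                // copying src1 into dst first would destroy src2; use a scratch register
                out.add("t <- " + src1);
                out.add("t " + op + "= " + src2);
                out.add(dst + " <- t");
            }
        } else {
            out.add(dst + " <- " + src1);                     // copy, then operate in place
            out.add(dst + " " + op + "= " + src2);
        }
        return out;
    }
}
```

The extra copies this introduces are often removed later by register allocation with coalescing.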
Tiger compiler Course sequence
Course sequence • Phases • each phase is described in one chapter • some compilers combine parsing and semantic analysis • others put instruction selection much later • simple compilers omit control and dataflow analysis • Chapters: 1. Introduction, 2. Lexical analysis, 3. Parsing, 4. Abstract syntax, 5. Semantic analysis, 6. Activation records, 7. Intermediate code, 8. Blocks and traces, 9. Instruction selection, 10. Liveness analysis, 11. Register allocation, 12. Putting it all together
1. Introduction • Modules and interfaces • large software is much easier to understand • and to implement • Tools and software • context-free grammars • regular expressions • Data structures • intermediate representations • tables, trees
2. Lexical analysis • Transforms program text • reads program text • outputs a sequence of tokens • Algorithm • generated from a lexical specification • JLex lexer generator
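To see what a lexer produces, here is a hand-written scanner for a small subset of Tiger tokens. It is not JLex output and not the course's lexer; string literals and comments are omitted, and the token spelling (`ID(a)`, `INT(10)`, upper-case keywords) is an assumption of this sketch. Two-character operators are tried first, which implements the longest-match rule a generated lexer applies automatically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical hand-written scanner for a subset of Tiger.
final class MiniLexer {
    static final Set<String> KEYWORDS = Set.of("let", "in", "end", "var", "function", "type",
            "if", "then", "else", "while", "for", "to", "do", "break", "nil", "of", "array");
    // Longer operators first, so ":=" is not split into ":" and "=".
    static final List<String> OPERATORS = List.of(":=", "<=", ">=", "<>",
            "+", "-", "*", "/", "=", "<", ">", "&", "|", "(", ")", "[", "]", "{", "}", ",", ";", ":", ".");

    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        scan:
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (isLetter(c)) {                         // identifier or keyword
                int start = i;
                while (i < src.length() && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                String word = src.substring(start, i);
                out.add(KEYWORDS.contains(word) ? word.toUpperCase() : "ID(" + word + ")");
                continue;
            }
            if (isDigit(c)) {                          // integer literal
                int start = i;
                while (i < src.length() && isDigit(src.charAt(i))) i++;
                out.add("INT(" + src.substring(start, i) + ")");
                continue;
            }
            for (String op : OPERATORS)
                if (src.startsWith(op, i)) { out.add(op); i += op.length(); continue scan; }
            throw new IllegalArgumentException("illegal character '" + c + "' at " + i);
        }
        return out;
    }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
}
```

For example, `let var a := 10 in a + 1 end` becomes `LET VAR ID(a) := INT(10) IN ID(a) + INT(1) END`.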
3. Parsing • Checks program syntax • detects errors in the order of tokens • Parsing algorithm • LALR(1) parsing • CUP parser generator
4. Abstract syntax • Improves modularity • syntax analysis is separated from semantic analysis • Semantic actions • executed during parsing • produce an abstract parse tree
5. Semantic analysis • Checks program semantics • reports scope and type errors • Actions • builds symbol tables • performs scope analysis • checks types
6. Activation records • Functions, local variables • several invocations of the same function may coexist • each invocation has its own instances of local variables • Stack frames • local variables, parameters • return address, temporaries • static and dynamic links
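Static links are what let a nested function reach variables of the functions that lexically enclose it. In the sketch below, frames are plain Java objects rather than stack memory (an assumption for readability): each frame points to the frame of its enclosing function, and a variable declared d nesting levels outward is found by following the link d times, where d is known at compile time. Names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of activation records linked by static links.
final class StaticLinks {
    static final class Frame {
        final Frame staticLink;                          // frame of the lexically enclosing function
        final Map<String, Integer> locals = new HashMap<>();
        Frame(Frame staticLink) { this.staticLink = staticLink; }
    }

    /** Reads a variable declared depthDifference nesting levels outside current. */
    static int lookup(Frame current, int depthDifference, String name) {
        Frame f = current;
        for (int i = 0; i < depthDifference; i++) f = f.staticLink;
        Integer value = f.locals.get(name);
        if (value == null) throw new IllegalStateException("no variable " + name);
        return value;
    }
}
```

The dynamic link, by contrast, points to the caller's frame; for a recursive inner function the two differ, because every activation of the inner function shares the same static link.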
7. Intermediate code • Allows portability • only N front ends and M back ends are needed, instead of N × M complete compilers • Abstract machine language • can express target machine operations • independent of details of the source language • represented by simple expression trees
Intermediate code (figure: IR trees for a + b * 4 and for if a = b then break else x := 5)
8. Blocks and traces • Basic blocks • begin with a label • end with a jump • contain no other labels or jumps • Traces • blocks can be arranged in any order • arrange them so that most jumps are followed by their target label
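Splitting a linear instruction list into basic blocks follows directly from the definition: start a block at every label, end it after every jump, invent a label when a block lacks one, and add an explicit jump when control would fall into the next label. In this sketch instructions are strings whose first word gives the kind (`LABEL`, `JUMP`, `CJUMP`, anything else); that encoding and the invented labels `B0`, `B1`, ... are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical basic-block builder over string-encoded instructions.
final class BasicBlocks {
    static boolean isJump(String s) { return s.startsWith("JUMP ") || s.startsWith("CJUMP "); }
    static boolean isLabel(String s) { return s.startsWith("LABEL "); }

    static List<List<String>> split(List<String> code, String exitLabel) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null;
        int fresh = 0;
        for (String s : code) {
            if (isLabel(s)) {
                if (current != null) {                        // falls through into this label
                    current.add("JUMP " + s.substring("LABEL ".length()));
                    blocks.add(current);
                }
                current = new ArrayList<>(List.of(s));
            } else {
                if (current == null) current = new ArrayList<>(List.of("LABEL B" + fresh++));
                current.add(s);
                if (isJump(s)) { blocks.add(current); current = null; }
            }
        }
        if (current != null) { current.add("JUMP " + exitLabel); blocks.add(current); }
        return blocks;
    }
}
```

Because every block now ends in an explicit jump, the blocks can be reordered freely when building traces.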
9. Instruction selection • Allows portability • finds appropriate machine instructions to implement the IR • Tree patterns • one pattern represents one instruction • instruction selection is a tiling of the IR tree
Instruction selection (two tilings of the same IR tree, for a[i] := x)

LOAD r1 ← M[fp + a]
ADDI r2 ← r0 + 4
MUL r2 ← ri × r2
ADD r1 ← r1 + r2
LOAD r2 ← M[fp + x]
STORE M[r1 + 0] ← r2

LOAD r1 ← M[fp + a]
ADDI r2 ← r0 + 4
MUL r2 ← ri × r2
ADD r1 ← r1 + r2
ADDI r2 ← fp + x
MOVE M[r1] ← M[r2]
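One simple tiling algorithm is maximal munch: at each node, try the largest tile that matches, emit its instruction, and recurse on the subtrees the tile leaves uncovered. The sketch covers a tiny expression language (constants, temporaries, `+`, `*`, memory fetch) with tiles loosely modeled on the instruction sequences above; the node names, instruction syntax and fresh-register scheme are assumptions, and statements such as stores are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical maximal-munch instruction selector for a tiny IR.
final class MaximalMunch {
    interface Exp {}
    record Const(int value) implements Exp {}
    record Temp(String name) implements Exp {}
    record Plus(Exp left, Exp right) implements Exp {}
    record Mul(Exp left, Exp right) implements Exp {}
    record Mem(Exp address) implements Exp {}

    final List<String> code = new ArrayList<>();
    private int next = 1;

    private String newReg() { return "r" + next++; }

    /** Emits instructions computing e; returns the register holding the result. */
    String munch(Exp e) {
        if (e instanceof Temp t) return t.name();                        // already in a register
        if (e instanceof Mem m && m.address() instanceof Plus p && p.right() instanceof Const c) {
            String base = munch(p.left()), r = newReg();                 // largest tile: MEM(+(e, CONST))
            code.add("LOAD " + r + " <- M[" + base + " + " + c.value() + "]");
            return r;
        }
        if (e instanceof Mem m) {
            String base = munch(m.address()), r = newReg();
            code.add("LOAD " + r + " <- M[" + base + " + 0]");
            return r;
        }
        if (e instanceof Plus p && p.right() instanceof Const c) {
            String src = munch(p.left()), r = newReg();
            code.add("ADDI " + r + " <- " + src + " + " + c.value());
            return r;
        }
        if (e instanceof Const c) {
            String r = newReg();
            code.add("ADDI " + r + " <- r0 + " + c.value());             // r0 is assumed to hold 0
            return r;
        }
        if (e instanceof Plus p) {
            String a = munch(p.left()), b = munch(p.right()), r = newReg();
            code.add("ADD " + r + " <- " + a + " + " + b);
            return r;
        }
        Mul mul = (Mul) e;
        String a = munch(mul.left()), b = munch(mul.right()), r = newReg();
        code.add("MUL " + r + " <- " + a + " * " + b);
        return r;
    }
}
```

For the address `M[fp + 8] + i * 4` this emits `LOAD r1 <- M[fp + 8]`, `ADDI r2 <- r0 + 4`, `MUL r3 <- i * r2`, `ADD r4 <- r1 + r3`, which mirrors the first four instructions of the tilings above.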
10. Liveness analysis • Detects needed values • determines which variables will be needed in the future • Problem • the IR has an unbounded number of temporaries • the target machine has a limited number of registers • Solution • control and dataflow graph
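Liveness is computed on the control-flow graph by iterating two dataflow equations until nothing changes: a variable is live into a node if the node uses it, or if it is live out and the node does not define it; it is live out if it is live into some successor. The sketch below assumes the graph is given as per-node `use`/`def` sets and successor lists, with nodes numbered from 0; the class name is hypothetical.

```java
import java.util.*;

// Hypothetical iterative liveness solver:
//   in[n]  = use[n] union (out[n] minus def[n])
//   out[n] = union of in[s] over all successors s of n
final class Liveness {
    final List<Set<String>> in = new ArrayList<>(), out = new ArrayList<>();

    Liveness(List<Set<String>> use, List<Set<String>> def, List<List<Integer>> succ) {
        int n = use.size();
        for (int i = 0; i < n; i++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = n - 1; i >= 0; i--) {               // backward order converges faster
                Set<String> newOut = new HashSet<>();
                for (int s : succ.get(i)) newOut.addAll(in.get(s));
                Set<String> newIn = new HashSet<>(newOut);
                newIn.removeAll(def.get(i));
                newIn.addAll(use.get(i));
                if (!newIn.equals(in.get(i)) || !newOut.equals(out.get(i))) {
                    in.set(i, newIn);
                    out.set(i, newOut);
                    changed = true;
                }
            }
        }
    }
}
```

For the loop `a := 0; L1: b := a + 1; c := c + b; a := b * 2; if a < N goto L1; return c`, the solver finds that only `c` is live on entry, since `c` is read before it is ever assigned.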
11. Register allocation • Assigns registers • links temporaries with registers • Interference graph • is created by examining the control and dataflow graph • is colored to assign registers
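Coloring the interference graph with K colors (one per register) can be sketched with the simplify/select idea: repeatedly remove a node with fewer than K neighbors, then put nodes back in reverse order, giving each a color no neighbor uses. When no low-degree node is left, this sketch removes one anyway as a potential spill and hopes it still gets a color (optimistic coloring). Coalescing and spill-code generation are omitted, and the input is assumed to be a symmetric adjacency map; the class name is hypothetical.

```java
import java.util.*;

// Hypothetical simplify/select graph coloring for register allocation.
final class GraphColoring {
    /** Returns temp -> register number; temps that could not be colored (spills) are absent. */
    static Map<String, Integer> color(Map<String, Set<String>> interference, int k) {
        Map<String, Set<String>> graph = new HashMap<>();
        interference.forEach((t, ns) -> graph.put(t, new HashSet<>(ns)));
        Deque<String> stack = new ArrayDeque<>();
        while (!graph.isEmpty()) {                           // simplify
            String pick = null;
            for (var e : graph.entrySet())
                if (e.getValue().size() < k) { pick = e.getKey(); break; }
            if (pick == null) pick = graph.keySet().iterator().next();   // potential spill
            for (String n : graph.remove(pick)) graph.get(n).remove(pick);
            stack.push(pick);
        }
        Map<String, Integer> colors = new HashMap<>();
        while (!stack.isEmpty()) {                           // select
            String t = stack.pop();
            Set<Integer> used = new HashSet<>();
            for (String n : interference.get(t))
                if (colors.containsKey(n)) used.add(colors.get(n));
            for (int c = 0; c < k; c++)
                if (!used.contains(c)) { colors.put(t, c); break; }
        }
        return colors;
    }
}
```

A node removed because it had fewer than K neighbors is guaranteed a color in the select phase, since at most K - 1 of its neighbors can already be colored.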
12. Putting it all together • Properties • nested functions • missing structured values • tree intermediate representations • register allocation • Remains • list all registers • procedure entry / exit • implement strings
Optimizations (Chapters 17. Dataflow analysis, 18. Loop optimizations, 19. Static single assignment form, 20. Pipelining and scheduling, 21. Memory hierarchies) • Optimizing compiler • transforms programs to improve efficiency • uses dataflow analysis • Algorithms • static single assignment form • use pipelining if available • utilize the cache
13. Garbage collection • Algorithms • mark and sweep • reference counts • copying collection • generational collection • incremental collection
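Mark and sweep, the first algorithm listed, works in two phases: mark every record reachable from the roots, then free everything unmarked. The sketch models heap records as Java objects and "freeing" as removal from a heap list, which is an assumption for readability; a real collector walks raw memory and rebuilds a free list. It marks with an explicit stack to avoid deep recursion.

```java
import java.util.*;

// Hypothetical toy heap with a mark-and-sweep collector.
final class MarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();   // pointers to other records
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();

    Obj allocate(String name) { Obj o = new Obj(name); heap.add(o); return o; }

    /** Marks everything reachable from roots, sweeps the rest; returns the freed names. */
    List<String> collect(List<Obj> roots) {
        Deque<Obj> work = new ArrayDeque<>(roots);    // explicit mark stack
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            for (Obj f : o.fields) work.push(f);
        }
        List<String> freed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) o.marked = false;           // reset for the next collection
            else { freed.add(o.name); it.remove(); }
        }
        return freed;
    }
}
```

Unlike reference counting, this reclaims unreachable cycles: two records that point only at each other are never marked.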
Language modifications (Chapters 14. Object-oriented languages, 15. Functional languages, 16. Polymorphic types)
Tiger compiler Object-oriented Tiger language
Object-oriented principles • Information hiding • a useful software principle • a module provides values of a given type • only that module knows their representation • Extension • inheritance • Object-Tiger • Tiger can easily become object-oriented
Program example

let
  var start := 10
  class Vehicle extends Object {
    var position := start
    method move(x: int) = (position := position + x)
  }
  class Truck extends Vehicle {
    method move(x: int) = if x <= 80 then position := position + x
  }
  var t := new Truck
  var v: Vehicle := t
in
  t.move(50);
  v.move(100)
end
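The interesting call in the Object-Tiger example is `v.move(100)`: `v` is declared as a `Vehicle` but refers to a `Truck`, so Truck's `move` must run. A common implementation, sketched here under simplifying assumptions, gives each class a method table that starts as a copy of its parent's and then overrides entries; every object points to the table of the class it was created from. Fields live in a map and all names are hypothetical.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical method tables for the Vehicle/Truck example.
final class Dispatch {
    static final class ClassDesc {
        final Map<String, BiConsumer<Instance, Integer>> methods = new HashMap<>();
        ClassDesc(ClassDesc parent) { if (parent != null) methods.putAll(parent.methods); }  // inherit
    }

    static final class Instance {
        final ClassDesc cls;                           // table of the creating class
        final Map<String, Integer> fields = new HashMap<>();
        Instance(ClassDesc cls) { this.cls = cls; }
        void call(String method, int arg) { cls.methods.get(method).accept(this, arg); }
    }

    static final ClassDesc VEHICLE = new ClassDesc(null);
    static final ClassDesc TRUCK;
    static {
        VEHICLE.methods.put("move", (self, x) -> self.fields.merge("position", x, Integer::sum));
        TRUCK = new ClassDesc(VEHICLE);                // copy Vehicle's table, then override move
        TRUCK.methods.put("move", (self, x) -> { if (x <= 80) self.fields.merge("position", x, Integer::sum); });
    }

    /** new C, with the field initializer var position := start. */
    static Instance newObject(ClassDesc cls, int start) {
        Instance o = new Instance(cls);
        o.fields.put("position", start);
        return o;
    }
}
```

Tracing the example with `start = 10`: `t.move(50)` sets `position` to 60, and `v.move(100)` dispatches to Truck's `move`, which ignores the request because 100 > 80, so `position` stays 60.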