Compiler construction in4020 – lecture 4

Compiler construction in4020 – lecture 4 • Koen Langendoen • Delft University of Technology • The Netherlands

Summary of lecture 3 • syntax analysis: tokens → AST • top-down parsing • recursive descent • push-down automaton • making grammars LL(1) • (figure: compiler pipeline, program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator turns the language grammar into the syntax analyser)

Quiz 2.31 • Can you create a top-down parser for the following grammars? (a) S → ‘(’ S ‘)’ | ‘)’ (b) S → ‘(’ S ‘)’ | ε (c) S → ‘(’ S ‘)’ | ‘)’ | ε

Overview • syntax analysis: tokens → AST • bottom-up parsing • push-down automaton • ACTION/GOTO tables • LR(0), SLR(1), LR(1), LALR(1) • (figure: compiler pipeline, program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator turns the language grammar into the syntax analyser)

Bottom-up (LR) parsing • Left-to-right parse, Rightmost derivation (constructed in reverse) • create a node when all children are present • handle: nodes representing the right-hand side of a production • (figure: partially built parse tree over the input aap + ( noot + mies ), with nodes IDENT, term, expression, rest_expr and rest_expression)

LR(0) parsing • running example: expression grammar: input → expression EOF; expression → expression ‘+’ term | term; term → IDENTIFIER | ‘(’ expression ‘)’ • short-hand notation: Z → E $; E → E ‘+’ T | T; T → i | ‘(’ E ‘)’

LR(0) parsing • running example: expression grammar: input → expression EOF; expression → expression ‘+’ term | term; term → IDENTIFIER | ‘(’ expression ‘)’ • short-hand notation, one rule per alternative: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’

LR(0) parsing • keep track of progress inside potential handles when consuming input tokens • LR items: N → α · β • initial set S0: Z → · E $ • ε-closure: expand dots in front of non-terminals • grammar: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’
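The ε-closure step can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: an item is encoded as a pair (rule index, dot position), and the names GRAMMAR, closure and show are hypothetical, not taken from the lecture or the book.

```python
# Grammar of the running example; each rule is (left-hand side, right-hand side).
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the epsilon-closure of a set of LR(0) items (rule index, dot)."""
    result = set(items)
    todo = list(items)
    while todo:
        rule, dot = todo.pop()
        rhs = GRAMMAR[rule][1]
        # A dot in front of a non-terminal pulls in all rules for that non-terminal.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    todo.append((index, 0))
    return result


def show(item):
    """Render an item with '.' marking the dot position."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


S0 = closure({(0, 0)})  # start from the single item Z -> . E $
for item in sorted(S0):
    print(show(item))
```

Starting from Z → · E $ alone, the closure pulls in every rule for E and then every rule for T, which gives the five items of S0.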

LR(0) parsing • grammar: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’ • stack: S0 • input: i + i $ • shift input token (i) onto the stack • compute new state

LR(0) parsing • stack: S0 i S1 • input: + i $ • reduce handle on top of the stack (T → i) • compute new state

LR(0) parsing • stack: S0 T S2 • input: + i $ • reduce handle on top of the stack (E → T) • compute new state

LR(0) parsing • stack: S0 E S3 • input: + i $ • shift input token (‘+’) onto the stack • compute new state

LR(0) parsing • stack: S0 E S3 + S4 • input: i $ • shift input token (i) onto the stack • compute new state

LR(0) parsing • stack: S0 E S3 + S4 i S1 • input: $ • reduce handle on top of the stack (T → i) • compute new state

LR(0) parsing • stack: S0 E S3 + S4 T S5 • input: $ • reduce handle on top of the stack (E → E ‘+’ T) • compute new state

LR(0) parsing • stack: S0 E S3 • input: $ • shift input token ($) onto the stack • compute new state

LR(0) parsing • stack: S0 E S3 $ S6 • input: (empty) • reduce handle on top of the stack (Z → E $) • compute new state

LR(0) parsing • stack: S0 Z • input: (empty) • accept!

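Transition diagram (partial)
S0: Z → · E $; E → · E ‘+’ T; E → · T; T → · i; T → · ‘(’ E ‘)’
S1: T → i ·
S2: E → T ·
S3: Z → E · $; E → E · ‘+’ T
S4: E → E ‘+’ · T; T → · i; T → · ‘(’ E ‘)’
S5: E → E ‘+’ T ·
S6: Z → E $ ·
transitions (from, symbol, to): (S0, i, S1); (S0, T, S2); (S0, E, S3); (S3, ‘+’, S4); (S3, $, S6); (S4, i, S1); (S4, T, S5)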

Exercise (8 min.) • complete the transition diagram for the LR(0) automaton • can you think of a single input expression that causes all states to be used? If yes, give an example. If no, explain.

Answers

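Answers (fig 2.89)
S0: Z → · E $; E → · E ‘+’ T; E → · T; T → · i; T → · ‘(’ E ‘)’
S1: T → i ·
S2: E → T ·
S3: Z → E · $; E → E · ‘+’ T
S4: E → E ‘+’ · T; T → · i; T → · ‘(’ E ‘)’
S5: E → E ‘+’ T ·
S6: Z → E $ ·
S7: T → ‘(’ · E ‘)’; E → · E ‘+’ T; E → · T; T → · i; T → · ‘(’ E ‘)’
S8: T → ‘(’ E · ‘)’; E → E · ‘+’ T
S9: T → ‘(’ E ‘)’ ·
transitions (from, symbol, to): (S0, i, S1); (S0, T, S2); (S0, E, S3); (S0, ‘(’, S7); (S3, ‘+’, S4); (S3, $, S6); (S4, i, S1); (S4, T, S5); (S4, ‘(’, S7); (S7, i, S1); (S7, T, S2); (S7, ‘(’, S7); (S7, E, S8); (S8, ‘+’, S4); (S8, ‘)’, S9)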
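The automaton can also be derived mechanically, by repeatedly moving the dot over a symbol and closing the result. The sketch below does this under assumptions of our own: items are (rule index, dot position) pairs, and the helper names closure, goto and build_automaton are hypothetical. States are numbered in order of discovery, so the numbers need not match those of the figure.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Expand dots in front of non-terminals until nothing changes."""
    result = set(items)
    todo = list(items)
    while todo:
        rule, dot = todo.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    todo.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(rule, dot + 1) for rule, dot in state
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)


def build_automaton():
    """Return (list of item sets, {(state number, symbol): state number})."""
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:  # the list grows while new states are discovered
        symbols = {GRAMMAR[rule][1][dot] for rule, dot in state
                   if dot < len(GRAMMAR[rule][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

For the expression grammar this finds ten item sets and fifteen transitions, the same counts as in the completed diagram.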

Answers • the following expression exercises all states: ( i ) + i

The LR tables

LR(0) parsing: concise notation

The LR push-down automaton
SWITCH action_table[top of stack]:
    CASE “shift”: see book;
    CASE (“reduce”, N → α):
        POP the symbols of α FROM the stack;
        SET state TO top of stack;
        PUSH N ON the stack;
        SET new state TO goto_table[state, N];
        PUSH new state ON the stack;
    CASE empty: ERROR;
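A runnable sketch of this push-down automaton for the expression grammar follows. The tables are transcribed by hand from the LR(0) automaton of the running example (states S0 to S9). The shift case, which the slide leaves to the book, is filled in here in the obvious way as an assumption: look up the next state for the input token, and report an error if there is none. The names RULES, ACTION, GOTO and parse are hypothetical.

```python
# rule number -> (left-hand side, length of right-hand side)
RULES = {1: ("Z", 2), 2: ("E", 3), 3: ("E", 1), 4: ("T", 1), 5: ("T", 3)}
# LR(0): the action depends on the state only; an int means "reduce by that rule".
ACTION = {0: "shift", 1: 4, 2: 3, 3: "shift", 4: "shift",
          5: 2, 6: 1, 7: "shift", 8: "shift", 9: 5}
# Transitions of the automaton, used for both shifts and gotos.
GOTO = {(0, "i"): 1, (0, "T"): 2, (0, "E"): 3, (0, "("): 7,
        (3, "+"): 4, (3, "$"): 6,
        (4, "i"): 1, (4, "T"): 5, (4, "("): 7,
        (7, "i"): 1, (7, "T"): 2, (7, "("): 7, (7, "E"): 8,
        (8, "+"): 4, (8, ")"): 9}


def parse(tokens):
    """Return (accepted, list of states entered) for a list of tokens."""
    stack = [0]  # alternates states and symbols: state, symbol, state, ...
    visited = [0]
    pos = 0
    while True:
        state = stack[-1]
        action = ACTION[state]
        if action == "shift":
            if pos == len(tokens) or (state, tokens[pos]) not in GOTO:
                return False, visited  # no transition: syntax error
            new_state = GOTO[(state, tokens[pos])]
            stack += [tokens[pos], new_state]
            pos += 1
        else:
            lhs, length = RULES[action]
            del stack[-2 * length:]  # pop the handle (symbols and states)
            if lhs == "Z":
                return pos == len(tokens), visited  # accept
            new_state = GOTO[(stack[-1], lhs)]
            stack += [lhs, new_state]
        visited.append(new_state)


accepted, visited = parse("( i ) + i $".split())
print(accepted, sorted(set(visited)))
```

Running it on ( i ) + i $ enters all ten states, and i + i $ reproduces the stack sequence S0, S1, S2, S3, S4, S1, S5, S3, S6 of the walk-through.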

Break

LR(0) conflicts • shift-reduce conflict • array indexing: T → i ‘[’ E ‘]’ gives the items T → i · ‘[’ E ‘]’ (shift) and T → i · (reduce) • ε-rule: RestExpr → ε gives the items Expr → Term · RestExpr (shift) and RestExpr → · (reduce)

LR(0) conflicts • reduce-reduce conflict • assignment statement: Z → V := E $ gives the items V → i · (reduce) and T → i · (reduce) • typical LR(0) table contains many conflicts

Handling LR(0) conflicts • solution: use a one-token look-ahead • two-dimensional ACTION table [state, token] • different construction of ACTION table • SLR(1) – Simple LR • LR(1) • LALR(1) – Look-Ahead LR

SLR(1) parsing • solves (some) shift-reduce conflicts • reduce N → α iff the look-ahead token ∈ FOLLOW(N) • FOLLOW(T) = { ‘+’, ‘)’, $ } • FOLLOW(E) = { ‘+’, ‘)’, $ } • FOLLOW(Z) = { $ }
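The FOLLOW sets listed above can be recomputed with a small fixed-point iteration. The sketch below makes two simplifying assumptions that are ours: the grammar has no ε-productions (true for the running example), and FOLLOW(Z) is seeded with $ to match the slide. The function names are hypothetical.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST set per non-terminal; assumes there are no epsilon-productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="Z"):
    """FOLLOW set per non-terminal, iterated until nothing changes."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")  # assumption: seed the start symbol with end-of-input
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for pos, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if pos + 1 < len(rhs):  # something follows inside the rule
                    nxt = rhs[pos + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:  # at the end of the rule: inherit FOLLOW of the left-hand side
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


follow = follow_sets()
for name in ("T", "E", "Z"):
    print(f"FOLLOW({name}) =", sorted(follow[name]))
```

An SLR(1) parser in state S1 (T → i ·) would then reduce only when the look-ahead is ‘+’, ‘)’ or $.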

SLR(1) ACTION table

SLR(1) ACTION/GOTO table • 1: Z → E $; 2: E → E ‘+’ T; 3: E → T; 4: T → i; 5: T → ‘(’ E ‘)’ • sn – shift to state n • rn – reduce rule n

SLR(1) ACTION/GOTO table • 1: Z → E $; 2: E → E ‘+’ T; 3: E → T; 4: T → i; 5: T → ‘(’ E ‘)’; 6: T → i ‘[’ E ‘]’ • sn – shift to state n • rn – reduce rule n

Unfortunately ... • SLR(1) leaves many shift-reduce conflicts unsolved • problem: the FOLLOW(N) set is the union over all contexts in which N may occur • example: S → A | x b; A → a A b | x

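LR(0) automaton • 1: S → A; 2: S → x b; 3: A → a A b; 4: A → x
S0: S → · A; S → · x b; A → · a A b; A → · x
S1: S → x · b; A → x ·
S2: S → x b ·
S3: S → A ·
S4: A → a · A b; A → · a A b; A → · x
S5: A → x ·
S6: A → a A · b
S7: A → a A b ·
transitions (from, symbol, to): (S0, A, S3); (S0, x, S1); (S0, a, S4); (S1, b, S2); (S4, a, S4); (S4, x, S5); (S4, A, S6); (S6, b, S7)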

Exercise (6 min.) • derive the SLR(1) ACTION/GOTO table (with shift-reduce conflict) for the following grammar: S → A | x b; A → a A b | x

Answers

Answers • 1: S → A; 2: S → x b; 3: A → a A b; 4: A → x • FOLLOW(S) = {$} • FOLLOW(A) = {$, b}
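To see where the conflict sits, the SLR(1) ACTION entries can be filled in from the LR(0) item sets and the FOLLOW sets given above. The sketch below hard-codes the eight LR(0) states of this grammar as (rule number, dot position) pairs; that encoding and the names STATES, TRANSITIONS and slr_action_table are assumptions made for illustration.

```python
RULES = {1: ("S", ("A",)), 2: ("S", ("x", "b")),
         3: ("A", ("a", "A", "b")), 4: ("A", ("x",))}
FOLLOW = {"S": {"$"}, "A": {"$", "b"}}
TERMINALS = ["a", "b", "x", "$"]
# LR(0) item sets, items written as (rule number, dot position).
STATES = {
    0: {(1, 0), (2, 0), (3, 0), (4, 0)},
    1: {(2, 1), (4, 1)},
    2: {(2, 2)},
    3: {(1, 1)},
    4: {(3, 1), (3, 0), (4, 0)},
    5: {(4, 1)},
    6: {(3, 2)},
    7: {(3, 3)},
}
TRANSITIONS = {(0, "A"): 3, (0, "x"): 1, (0, "a"): 4, (1, "b"): 2,
               (4, "a"): 4, (4, "x"): 5, (4, "A"): 6, (6, "b"): 7}


def slr_action_table():
    """Map (state, token) to the set of possible actions ('sN' or 'rN')."""
    table = {}
    for state, items in STATES.items():
        for token in TERMINALS:
            entry = set()
            if (state, token) in TRANSITIONS:
                entry.add(f"s{TRANSITIONS[(state, token)]}")
            for rule, dot in items:
                lhs, rhs = RULES[rule]
                # SLR(1): reduce a completed item only on tokens in FOLLOW(lhs).
                if dot == len(rhs) and token in FOLLOW[lhs]:
                    entry.add(f"r{rule}")
            if entry:
                table[(state, token)] = entry
    return table


conflicts = {key: entry for key, entry in slr_action_table().items()
             if len(entry) > 1}
for key, entry in conflicts.items():
    print(key, sorted(entry))
```

The only entry with two actions is state S1 on look-ahead b: shifting b (towards S → x b) competes with reducing A → x, because b is in FOLLOW(A).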

LR(1) parsing • maintain follow set per item • LR(1) item: N → α · β {σ} • ε-closure for LR(1) item sets: if set S contains an item P → α · N β {σ}, then for each production rule N → γ, S must contain the item N → · γ {τ}, where τ = FIRST(β {σ})
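The LR(1) closure rule can be tried out on the grammar S → A | x b; A → a A b | x. In the sketch below an item carries one look-ahead token, so an item with look-ahead set {b, $} is represented as two items; this encoding, the assumption that the grammar has no ε-productions, and the function names are ours.

```python
GRAMMAR = [
    ("S", ("A",)),
    ("S", ("x", "b")),
    ("A", ("a", "A", "b")),
    ("A", ("x",)),
]
NONTERMINALS = {"S", "A"}


def first(symbol, seen=()):
    """FIRST set of a single symbol; assumes there are no epsilon-productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for lhs, rhs in GRAMMAR:
        # Skip symbols already on the current path to survive left recursion.
        if lhs == symbol and rhs[0] not in seen + (symbol,):
            result |= first(rhs[0], seen + (symbol,))
    return result


def closure(items):
    """Closure of LR(1) items written as (rule index, dot, look-ahead token)."""
    result = set(items)
    todo = list(items)
    while todo:
        rule, dot, lookahead = todo.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # tau = FIRST(beta {sigma}): without epsilon-rules this is FIRST of the
        # first symbol of beta, or sigma itself when beta is empty.
        rest = rhs[dot + 1:]
        tau = first(rest[0]) if rest else {lookahead}
        for index, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for token in tau:
                    item = (index, 0, token)
                    if item not in result:
                        result.add(item)
                        todo.append(item)
    return result


S0 = closure({(0, 0, "$"), (1, 0, "$")})  # S -> . A {$} and S -> . x b {$}
after_a = closure({(2, 1, "$")})          # A -> a . A b {$}
print(sorted(S0))
print(sorted(after_a))
```

In S0 the A-rules inherit the look-ahead $ because nothing follows A in S → · A, whereas after shifting a the A-rules get look-ahead b, the token that follows A in A → a · A b.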

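LR(1) automaton
S0: S → · A {$}; S → · x b {$}; A → · a A b {$}; A → · x {$}
S1: S → x · b {$}; A → x · {$}
S2: S → x b · {$}
S3: S → A · {$}
S4: A → a · A b {$}; A → · a A b {b}; A → · x {b}
S5: A → x · {b}
S6: A → a A · b {$}
S7: A → a A b · {$}
S8: A → a · A b {b}; A → · a A b {b}; A → · x {b}
S9: A → a A · b {b}
S10: A → a A b · {b}
transitions (from, symbol, to): (S0, A, S3); (S0, x, S1); (S0, a, S4); (S1, b, S2); (S4, x, S5); (S4, a, S8); (S4, A, S6); (S6, b, S7); (S8, a, S8); (S8, x, S5); (S8, A, S9); (S9, b, S10)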

LALR(1) parsing • LR(1) tables are big • combine “equal” item sets (equal apart from their look-ahead sets) by merging the look-ahead sets

LALR(1) automaton • starting point: the LR(1) automaton with states S0 to S10 • states S4 and S8, S6 and S9, and S7 and S10 contain the same items and differ only in their look-ahead sets

LALR(1) automaton (figure) • the LR(1) states with equal items are merged and their look-ahead sets united: S4 with S8, S6 with S9, S7 with S10 • merged items: A → a · A b {b,$}; A → a A · b {b,$}; A → a A b · {b,$} • eight states remain: S0 to S7
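The merging step can be sketched as grouping LR(1) states by their core, that is, their items with the look-ahead stripped. The item sets below are transcribed from the LR(1) automaton of this grammar, with one tuple (rule number, dot position, look-ahead token) per item; the encoding and the name merge_equal_cores are assumptions made for illustration.

```python
from collections import defaultdict

# LR(1) states for 1: S -> A, 2: S -> x b, 3: A -> a A b, 4: A -> x.
LR1_STATES = {
    0: {(1, 0, "$"), (2, 0, "$"), (3, 0, "$"), (4, 0, "$")},
    1: {(2, 1, "$"), (4, 1, "$")},
    2: {(2, 2, "$")},
    3: {(1, 1, "$")},
    4: {(3, 1, "$"), (3, 0, "b"), (4, 0, "b")},
    5: {(4, 1, "b")},
    6: {(3, 2, "$")},
    7: {(3, 3, "$")},
    8: {(3, 1, "b"), (3, 0, "b"), (4, 0, "b")},
    9: {(3, 2, "b")},
    10: {(3, 3, "b")},
}


def merge_equal_cores(states):
    """Group states whose items agree up to look-ahead; unite their items."""
    groups = defaultdict(list)
    for number, items in sorted(states.items()):
        core = frozenset((rule, dot) for rule, dot, _ in items)
        groups[core].append(number)
    merged = {}
    for members in groups.values():
        merged[tuple(members)] = set().union(*(states[m] for m in members))
    return merged


lalr_states = merge_equal_cores(LR1_STATES)
for members, items in lalr_states.items():
    print(members, sorted(items))
```

States S1 and S5 are not merged: both contain A → x ·, but S1 also contains S → x · b, so their cores differ.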

LALR(1) ACTION/GOTO table • 1: S → A; 2: S → x b; 3: A → a A b; 4: A → x

Making grammars LR(1) – or not? • grammars are often ambiguous: E → E ‘+’ E | E ‘*’ E | i • handle shift-reduce conflicts • (default) longest match: shift • precedence directives • input: i * i + i • conflicting items after i * i: E → E · ‘+’ E (shift) and E → E ‘*’ E · (reduce) • (figure: the two possible parse trees for i * i + i)
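How a precedence directive settles such a conflict can be sketched as follows. The rule used here is the convention of yacc-style parser generators: compare the precedence of the operator in the completed rule with that of the look-ahead token, and let associativity decide ties. The table PRECEDENCE and the function resolve are hypothetical names chosen for this illustration.

```python
# Hypothetical precedence table: higher level binds tighter.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_operator, lookahead):
    """Choose between reducing a completed rule and shifting the look-ahead."""
    rule_level, _ = PRECEDENCE[rule_operator]
    token_level, associativity = PRECEDENCE[lookahead]
    if rule_level > token_level:
        return "reduce"  # the rule on the stack binds tighter
    if rule_level < token_level:
        return "shift"   # the incoming operator binds tighter
    # Equal precedence: left-associative operators reduce, others shift.
    return "reduce" if associativity == "left" else "shift"


# Input i * i + i: E '*' E is on the stack and '+' is the look-ahead.
print(resolve("*", "+"))
# Input i + i * i: E '+' E is on the stack and '*' is the look-ahead.
print(resolve("+", "*"))
```

For i * i + i the conflict is resolved as a reduce, so the multiplication is grouped first; the default longest-match rule would have shifted instead.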

Summary • syntax analysis: tokens → AST • bottom-up parsing • push-down automaton • ACTION/GOTO tables • LR(0): no look-ahead • SLR(1): one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts • LR(1): like SLR(1), but with a FOLLOW (look-ahead) set per item • LALR(1): like LR(1), but “equal” states are merged

Homework • study sections: • 1.10 closure algorithm • 2.2.5.8 error handling in LR(1) parsers • print handout for next week [blackboard] • find a partner for the “practicum” • register your group • send e-mail to koen@pds.twi.tudelft.nl
