Compiler construction in4020 – lecture 3

Koen Langendoen, Delft University of Technology, The Netherlands

Summary of lecture 2
[Figure: compiler pipeline: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator turns the token description into the lexical analyzer]
• lexical analyzer generator
• description → FSA
• FSA construction
• dotted items
• character moves
• ε moves

Quiz
2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number?
2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show Mr. X wrong.

Overview
[Figure: compiler pipeline: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator turns the language grammar into the syntax analyzer]
• syntax analysis: tokens → AST
• AST construction
• by hand: recursive descent
• automatic: top-down (LLgen), bottom-up (yacc)

Syntax analysis
“syn-tax: the way in which words are put together to form phrases, clauses, or sentences.” (Webster’s Dictionary)
• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.
• result: parse tree
[Figure: parse tree for b * b - 4 * a * c, with nodes expression, term, factor, identifier and constant]


Context free grammar
• G = (VN, VT, S, P)
• VN : set of Non-terminal symbols
• VT : set of Terminal symbols
• S : Start symbol (S ∈ VN)
• P : set of Production rules {N → α}
• VN ∩ VT = ∅
• P = {N → α | N ∈ VN ∧ α ∈ (VN ∪ VT)*}

Top-down parsing
• process tokens left to right
• expression grammar
    input → expression EOF
    expression → term rest_expression
    term → IDENTIFIER | ‘(’ expression ‘)’
    rest_expression → ‘+’ expression | ε
• example input
    aap + ( noot + mies )
[Figure: partial parse tree grown from the root downward: input → expression EOF, expression → term rest_expression, term → IDENTIFIER]

Bottom-up parsing
• process tokens left to right
• expression grammar
    input → expression EOF
    expression → term rest_expression
    term → IDENTIFIER | ‘(’ expression ‘)’
    rest_expression → ‘+’ expression | ε
• example input
    aap + ( noot + mies )
[Figure: partial parse tree built from the leaves upward: IDENT nodes are combined into term, rest_expr and expression nodes]

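Comparison
[Figure: two parse trees with nodes numbered 1 to 5, comparing the order in which top-down and bottom-up parsing construct the nodes]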

Recursive descent parsing
• each rule N translates to a boolean function
  • return true if a terminal production of N was matched
  • return false otherwise (without consuming any token)
• try alternatives of N in turn
  • a terminal symbol must match the current token
  • a non-terminal is matched by calling its routine
input → expression EOF
    int input(void) {
        return expression() && require(token(EOF));
    }

Recursive descent parsing
expression → term rest_expression
    int expression(void) {
        return term() && require(rest_expression());
    }
term → IDENTIFIER | ‘(’ expression ‘)’
    int term(void) {
        return token(IDENTIFIER)
            || (token('(') && require(expression()) && require(token(')')));
    }

Recursive descent parsing
rest_expression → ‘+’ expression | ε
    int rest_expression(void) {
        return (token('+') && require(expression())) || 1;
    }
auxiliary functions
• consume matched tokens
• report syntax errors
    int token(int tk) {
        if (Token.class != tk) return 0;
        get_next_token();
        return 1;
    }
    int require(int found) {
        if (!found) error();
        return 1;
    }
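The fragments on the three recursive descent slides can be assembled into one compilable unit. What follows is a minimal sketch, not the course's reference code. The lexer (get_next_token() reading from a string, identifiers as runs of letters), the constants TK_EOF and TK_IDENTIFIER, the error counter and the parse() driver are assumptions added for illustration. error() is assumed to count errors rather than abort.

```c
#include <ctype.h>

/* Token classes: single-character tokens use their own character code. */
enum { TK_EOF = 256, TK_IDENTIFIER };

static struct { int class; } Token;  /* current token, named as on the slides */
static const char *src;              /* remaining input (assumed lexer state) */
static int error_count;              /* number of syntax errors reported */

/* Hypothetical minimal lexer: identifiers are runs of letters. */
static void get_next_token(void) {
    while (*src == ' ') src++;
    if (*src == '\0') { Token.class = TK_EOF; return; }
    if (isalpha((unsigned char)*src)) {
        while (isalpha((unsigned char)*src)) src++;
        Token.class = TK_IDENTIFIER;
        return;
    }
    Token.class = (unsigned char)*src++;
}

static void error(void) { error_count++; }

/* Consume the current token if it has class tk. */
static int token(int tk) {
    if (Token.class != tk) return 0;
    get_next_token();
    return 1;
}

/* Report a syntax error if a mandatory part was not found. */
static int require(int found) {
    if (!found) error();
    return 1;
}

static int expression(void);

/* term -> IDENTIFIER | '(' expression ')' */
static int term(void) {
    return token(TK_IDENTIFIER)
        || (token('(') && require(expression()) && require(token(')')));
}

/* rest_expression -> '+' expression | empty */
static int rest_expression(void) {
    return (token('+') && require(expression())) || 1;
}

/* expression -> term rest_expression */
static int expression(void) {
    return term() && require(rest_expression());
}

/* input -> expression EOF */
static int input(void) {
    return expression() && require(token(TK_EOF));
}

/* Returns 1 if text is a syntactically correct input, 0 otherwise. */
int parse(const char *text) {
    src = text;
    error_count = 0;
    get_next_token();
    return input() && error_count == 0;
}
```

Under these assumptions parse("aap + ( noot + mies )") returns 1, while parse("aap +") returns 0 because require() reports the missing expression after ‘+’.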

Automatic top-down parsing
• follow recursive descent scheme, but avoid interpretation overhead
• for each rule and alternative determine the tokens it can start with: FIRST set
• parsing scheme for rule N → A1 | A2 | …
  • if token ∉ FIRST(N) then ERROR
  • if token ∈ FIRST(A1) then parse A1
  • if token ∈ FIRST(A2) then parse A2
  • …

Exercise (7 min.)
• design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar.
• hint: consider the types of rules
  • alternatives
  • composition
  • empty productions
    input → expression EOF
    expression → term rest_expression
    term → IDENTIFIER | ‘(’ expression ‘)’
    rest_expression → ‘+’ expression | ε


Answers (Fig 2.58, page 122)
• N → w (w ∈ VT): FIRST(N) = {w}
• N → A1 | A2 | …: FIRST(N) = ∪ FIRST(Ai)
• N → ε: FIRST(N) = {ε}
• N → A α:
    FIRST(N) = FIRST(A), if ε ∉ FIRST(A)
    FIRST(N) = (FIRST(A) \ {ε}) ∪ FIRST(α), otherwise
• closure algorithm
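The rules above can be turned into a closure algorithm that keeps applying them until no FIRST set changes. The sketch below is one possible encoding, not the book's Fig 2.58. The symbol numbering, the bit-set representation, the rule table and the names first[], first_of_sequence() and compute_first_sets() are all assumptions made for illustration.

```c
#include <stddef.h>

enum Symbol {
    IDENT, PLUS, LPAR, RPAR, END,   /* terminals (END stands for EOF) */
    EPS,                            /* the empty string */
    INPUT, EXPRESSION, TERM, REST,  /* non-terminals */
    N_SYMBOLS
};
#define IS_TERMINAL(s) ((s) < EPS)
#define BIT(s) (1u << (s))

struct rule { int lhs; int len; int rhs[3]; };

/* The expression grammar of the exercise, one alternative per entry. */
static const struct rule grammar[] = {
    { INPUT,      2, { EXPRESSION, END } },
    { EXPRESSION, 2, { TERM, REST } },
    { TERM,       1, { IDENT } },
    { TERM,       3, { LPAR, EXPRESSION, RPAR } },
    { REST,       2, { PLUS, EXPRESSION } },
    { REST,       0, { 0 } },              /* empty production */
};
#define N_RULES (sizeof grammar / sizeof grammar[0])

unsigned first[N_SYMBOLS];   /* FIRST sets as bit sets over Symbol */

/* FIRST of the sequence rhs[0..len-1], using the sets computed so far. */
static unsigned first_of_sequence(const int *rhs, int len) {
    unsigned set = 0;
    for (int i = 0; i < len; i++) {
        set |= first[rhs[i]] & ~BIT(EPS);
        if (!(first[rhs[i]] & BIT(EPS)))
            return set;                    /* rhs[i] cannot be empty: stop */
    }
    return set | BIT(EPS);                 /* the whole sequence can be empty */
}

void compute_first_sets(void) {
    /* Initialisation: FIRST(w) = {w} for terminals, empty otherwise. */
    for (int s = 0; s < N_SYMBOLS; s++)
        first[s] = IS_TERMINAL(s) ? BIT(s) : 0;

    /* Closure: re-apply all rules until nothing changes. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t r = 0; r < N_RULES; r++) {
            unsigned updated = first[grammar[r].lhs]
                | first_of_sequence(grammar[r].rhs, grammar[r].len);
            if (updated != first[grammar[r].lhs]) {
                first[grammar[r].lhs] = updated;
                changed = 1;
            }
        }
    }
}
```

After compute_first_sets(), first[EXPRESSION] holds the bits for IDENT and LPAR, and first[REST] holds the bits for PLUS and EPS.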

Break

Predictive parsing
• similar to recursive descent, but no back-tracking
• functions “know” what they are doing
input → expression EOF
FIRST(expression) = {IDENT, ‘(’}
    void input(void) {
        switch (Token.class) {
        case IDENT: case '(':
            expression(); token(EOF); break;
        default: error();
        }
    }
    void token(int tk) {
        if (Token.class != tk) error();
        get_next_token();
    }

Predictive parsing
expression → term rest_expression
FIRST(term) = {IDENT, ‘(’}
    void expression(void) {
        switch (Token.class) {
        case IDENT: case '(':
            term(); rest_expression(); break;
        default: error();
        }
    }
term → IDENTIFIER | ‘(’ expression ‘)’
    void term(void) {
        switch (Token.class) {
        case IDENT: token(IDENT); break;
        case '(': token('('); expression(); token(')'); break;
        default: error();
        }
    }

Predictive parsing
rest_expression → ‘+’ expression | ε
FIRST(rest_expr) = {‘+’, ε}
• FIRST(ε) = {ε}
• check nothing?
• NO: token ∈ FOLLOW(rest_expr)
FOLLOW(rest_expr) = {EOF, ‘)’}
    void rest_expression(void) {
        switch (Token.class) {
        case '+': token('+'); expression(); break;
        case EOF: case ')': break;
        default: error();
        }
    }
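FOLLOW sets can be computed with the same kind of closure algorithm as FIRST sets. The sketch below is an illustration under stated assumptions, not the course's algorithm. The symbol numbering, the bit-set encoding and the names first_of_tail(), add() and compute_follow_sets() are hypothetical, and the FIRST sets are hard-coded from the values given on the predictive parsing slides.

```c
#include <stddef.h>

enum Symbol { IDENT, PLUS, LPAR, RPAR, END, EPS,      /* terminals and ε */
              INPUT, EXPRESSION, TERM, REST,          /* non-terminals */
              N_SYMBOLS };
#define BIT(s) (1u << (s))

struct rule { int lhs; int len; int rhs[3]; };
static const struct rule grammar[] = {
    { INPUT,      2, { EXPRESSION, END } },
    { EXPRESSION, 2, { TERM, REST } },
    { TERM,       1, { IDENT } },
    { TERM,       3, { LPAR, EXPRESSION, RPAR } },
    { REST,       2, { PLUS, EXPRESSION } },
    { REST,       0, { 0 } },                         /* empty production */
};
#define N_RULES (sizeof grammar / sizeof grammar[0])

/* FIRST sets as stated on the slides (assumed given, not computed here). */
static const unsigned first[N_SYMBOLS] = {
    [IDENT] = BIT(IDENT), [PLUS] = BIT(PLUS), [LPAR] = BIT(LPAR),
    [RPAR] = BIT(RPAR),   [END] = BIT(END),
    [INPUT]      = BIT(IDENT) | BIT(LPAR),
    [EXPRESSION] = BIT(IDENT) | BIT(LPAR),
    [TERM]       = BIT(IDENT) | BIT(LPAR),
    [REST]       = BIT(PLUS) | BIT(EPS),
};

unsigned follow[N_SYMBOLS];

/* FIRST of rhs[from..len-1]; contains ε if that tail can be empty. */
static unsigned first_of_tail(const struct rule *r, int from) {
    unsigned set = 0;
    for (int i = from; i < r->len; i++) {
        set |= first[r->rhs[i]] & ~BIT(EPS);
        if (!(first[r->rhs[i]] & BIT(EPS))) return set;
    }
    return set | BIT(EPS);
}

/* Add bits to *set and report whether the set grew. */
static int add(unsigned *set, unsigned bits) {
    unsigned old = *set;
    *set |= bits;
    return *set != old;
}

void compute_follow_sets(void) {
    int changed;
    for (int s = 0; s < N_SYMBOLS; s++) follow[s] = 0;
    do {
        changed = 0;
        for (size_t r = 0; r < N_RULES; r++)
            for (int i = 0; i < grammar[r].len; i++) {
                int sym = grammar[r].rhs[i];
                if (sym < INPUT) continue;            /* non-terminals only */
                unsigned tail = first_of_tail(&grammar[r], i + 1);
                /* whatever can start the tail can follow sym */
                changed |= add(&follow[sym], tail & ~BIT(EPS));
                /* if the tail can vanish, FOLLOW(lhs) can follow sym too */
                if (tail & BIT(EPS))
                    changed |= add(&follow[sym], follow[grammar[r].lhs]);
            }
    } while (changed);
}
```

For this grammar the result agrees with the slide: follow[REST] contains exactly END and RPAR, that is FOLLOW(rest_expr) = {EOF, ‘)’}.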

Limitations of LL(1) parsers
• FIRST/FIRST conflict
    term → IDENTIFIER | IDENTIFIER ‘[’ expression ‘]’ | ‘(’ expression ‘)’
• FIRST/FOLLOW conflict
    S → A ‘a’ ‘b’
    A → ‘a’ | ε
    FIRST(A) = {‘a’, ε} and FOLLOW(A) = {‘a’}: both contain ‘a’
• left recursion
    expression → expression ‘-’ term | term

Making grammars LL(1)
• manual labour
  • rewrite grammar
  • adjust semantic actions
• three rewrite methods
  • left factoring
  • substitution
  • left-recursion removal

Left factoring
    term → IDENTIFIER | IDENTIFIER ‘[’ expression ‘]’
• factor out common prefix
    term → IDENTIFIER after_identifier
    after_identifier → ε | ‘[’ expression ‘]’
    ‘[’ ∉ FOLLOW(after_identifier)

Substitution
    S → A ‘a’ ‘b’
    A → ‘a’ | ε
• replace non-terminal by its alternatives
    S → ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’

Left-recursion removal
    N → N α | β
[Figure: N derives β followed by a sequence of α: β α α α ...]
• replace by
    N → β M
    M → α M | ε
• example
    expression → expression ‘-’ term | term
    expression → term tail
    tail → ‘-’ term tail | ε
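Removing left recursion changes the shape of the parse tree, which is why the semantic actions have to be adjusted: ‘-’ must still associate to the left. The sketch below shows one way to do that for the rewritten example grammar, by passing the value parsed so far into tail(). It rests on assumptions made for illustration only: a term is a single decimal digit, the input is a plain string without spaces, and there is no error handling.

```c
#include <ctype.h>

static const char *p;   /* next unread character (assumed input representation) */

/* term -> DIGIT   (assumption: a term is one decimal digit) */
static int term(void) {
    return isdigit((unsigned char)*p) ? *p++ - '0' : 0;
}

/* tail -> '-' term tail | ε
 * 'left' carries the value of everything parsed so far, so that
 * a - b - c is evaluated as (a - b) - c and not as a - (b - c). */
static int tail(int left) {
    if (*p != '-')
        return left;              /* ε alternative: nothing more to subtract */
    p++;                          /* consume '-' */
    return tail(left - term());
}

/* expression -> term tail */
int expression(const char *text) {
    p = text;
    return tail(term());
}
```

With this scheme expression("9-3-2") yields 4, the left-associative result, whereas naively evaluating the right-recursive tree would give 9 - (3 - 2) = 8.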

Exercise (7 min.)
• make the following grammar LL(1)
    expression → expression ‘+’ term | expression ‘-’ term | term
    term → term ‘*’ factor | term ‘/’ factor | factor
    factor → ‘(’ expression ‘)’ | func-call | identifier | constant
    func-call → identifier ‘(’ expr-list? ‘)’
    expr-list → expression (‘,’ expression)*
• and what about
    S → if E then S (else S)?


Answers
• substitution
    F → ‘(’ E ‘)’ | ID ‘(’ expr-list? ‘)’ | ID | constant
• left factoring
    E → E ( ‘+’ | ‘-’ ) T | T
    T → T ( ‘*’ | ‘/’ ) F | F
    F → ‘(’ E ‘)’ | ID ( ‘(’ expr-list? ‘)’ )? | constant
• left recursion removal
    E → T ( ( ‘+’ | ‘-’ ) T )*
    T → F ( ( ‘*’ | ‘/’ ) F )*
• if-then-else grammar is ambiguous

Automatic generation
[Figure: compiler pipeline: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; the parser generator turns the language grammar into an LL(1) push-down automaton]

LL(1) push-down automaton
• stack right-hand side of production
• replace non-terminal by transition entry
• pop matching token
Trace for the input aap + ( noot + mies ) EOF:
    prediction stack        input                        action
    input                   aap + ( noot + mies ) EOF    replace non-terminal by transition entry
    expression EOF          aap + ( noot + mies ) EOF    replace non-terminal by transition entry
    term rest-expr EOF      aap + ( noot + mies ) EOF    replace non-terminal by transition entry
    IDENT rest-expr EOF     aap + ( noot + mies ) EOF    pop matching token
    rest-expr EOF           + ( noot + mies ) EOF        replace non-terminal by transition entry
    + expression EOF        + ( noot + mies ) EOF        pop matching token
    expression EOF          ( noot + mies ) EOF
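The steps of the push-down automaton can be sketched as a small table-driven parser. This is an illustration under assumptions, not the output of LLgen. The transition table is reconstructed from the expression grammar and the FIRST and FOLLOW sets given earlier, and the symbol numbering, the struct entry layout and the function ll1_parse() are hypothetical. Tokens are passed in as an array of terminal codes ending in END (the EOF token).

```c
enum Symbol { IDENT, PLUS, LPAR, RPAR, END, N_TERMINALS,
              INPUT = N_TERMINALS, EXPRESSION, TERM, REST, N_SYMBOLS };

#define MAX_RHS 3
#define STACK_SIZE 64
struct entry { int len; int rhs[MAX_RHS]; };   /* len < 0 marks an error entry */
#define ERR { -1, { 0 } }

/* Transition table: [non-terminal][look-ahead token] gives the right-hand side. */
static const struct entry table[N_SYMBOLS - N_TERMINALS][N_TERMINALS] = {
    /*                IDENT                   PLUS                     LPAR                        RPAR        END      */
    /* INPUT      */ { {2,{EXPRESSION,END}},  ERR,                     {2,{EXPRESSION,END}},       ERR,        ERR      },
    /* EXPRESSION */ { {2,{TERM,REST}},       ERR,                     {2,{TERM,REST}},            ERR,        ERR      },
    /* TERM       */ { {1,{IDENT}},           ERR,                     {3,{LPAR,EXPRESSION,RPAR}}, ERR,        ERR      },
    /* REST       */ { ERR,                   {2,{PLUS,EXPRESSION}},   ERR,                        {0,{0}},    {0,{0}}  },
};

/* Returns 1 if the END-terminated token array is accepted, 0 otherwise. */
int ll1_parse(const int *tokens) {
    int stack[STACK_SIZE];
    int top = 0;
    stack[top++] = INPUT;                      /* initial prediction */
    while (top > 0) {
        int sym = stack[--top];
        if (*tokens < 0 || *tokens >= N_TERMINALS)
            return 0;                          /* not a valid token code */
        if (sym < N_TERMINALS) {
            if (sym != *tokens) return 0;      /* prediction and input differ */
            tokens++;                          /* pop matching token */
        } else {
            /* replace non-terminal by transition entry */
            const struct entry *e = &table[sym - N_TERMINALS][*tokens];
            if (e->len < 0) return 0;          /* no entry: syntax error */
            if (top + e->len > STACK_SIZE) return 0;
            for (int i = e->len - 1; i >= 0; i--)
                stack[top++] = e->rhs[i];      /* push right-hand side reversed */
        }
    }
    return 1;                                  /* stack empty: input accepted */
}
```

Pushing the right-hand side in reverse order keeps its leftmost symbol on top of the stack, which matches the order in which the trace above consumes the input.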

LLgen
• top-down parser generator
• to be used in assignment #1
• discussed in lecture 5

Summary
• syntax analysis: tokens → AST
• top-down parsing
  • recursive descent
  • push-down automaton
  • making grammars LL(1)

Homework
• study sections:
  • 1.10 closure algorithm
  • 2.2.4.6 error handling in LL(1) parsers
• print handout for next week [blackboard]
• find a partner for the “practicum”
  • register your group
  • send e-mail to koen@pds.twi.tudelft.nl
