Lesson 4: CDT301 – Compiler Theory, Spring 2011. Teacher: Linus Källberg
Outline • Recursive descent parsers • Left recursion • Left factoring
Writing a recursive descent parser
• Straightforward once the grammar is written in an appropriate form:
• For each nonterminal, create a function
  • It represents the expectation of that nonterminal in the input
  • It chooses a grammar production, i.e., a right-hand side (RHS), based on the lookahead token
  • It then processes the chosen RHS
    • Terminals are “matched”: match(IF); match(LEFT_PARENTHESIS); … match(RIGHT_PARENTHESIS); …
    • For nonterminals, the corresponding “expectation functions” are called
The function match()
• Helper function to consume terminals (assumes tokens are represented as ints):
    void match(int expected_lookahead) {
        if (lookahead == expected_lookahead)
            lookahead = nextToken();   /* consume the token and advance */
        else
            error();                   /* unexpected token: syntax error */
    }
Recursive descent example
• Grammar for a subset of the language “types in Pascal”:
    type → ^ id | array [ simple ] of type | simple
    simple → integer | char | num dotdot num
• Examples of “programs”:
    ^ my_type
    array[ 1..10 ] of Integer
    array[ Char ] of 72..98
Recursive descent example
    void type() {
        switch (lookahead) {
        case '^':                      /* type → ^ id */
            match('^'); match(ID);
            break;
        case ARRAY:                    /* type → array [ simple ] of type */
            match(ARRAY); match('['); simple(); match(']'); match(OF); type();
            break;
        default:                       /* type → simple */
            simple();
        }
    }

    void simple() {
        switch (lookahead) {
        case INTEGER:                  /* simple → integer */
            match(INTEGER);
            break;
        case CHAR:                     /* simple → char */
            match(CHAR);
            break;
        case NUM:                      /* simple → num dotdot num */
            match(NUM); match(DOTDOT); match(NUM);
            break;
        default:
            error();
        }
    }
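To try the parser above, the pieces the slides leave open have to be filled in. The following is a minimal sketch, assuming the input has already been tokenized into an int array terminated by an END token. The token codes, the global variables, the error counter and the entry point parse_type() are illustrative assumptions, not part of the course material.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; single-character tokens use their char value. */
enum { END = 256, ID, ARRAY, OF, INTEGER, CHAR, NUM, DOTDOT };

static const int *input;   /* points at the current token; array ends with END */
static int lookahead;      /* the current token */
static int errors;         /* number of syntax errors reported so far */

/* Advance to the next token, but never move past END. */
static int nextToken(void) {
    if (*input != END) input++;
    return *input;
}

/* Assumption: errors are only counted, with no recovery or message. */
static void error(void) { errors++; }

static void match(int expected_lookahead) {
    if (lookahead == expected_lookahead) lookahead = nextToken();
    else error();
}

/* simple -> integer | char | num dotdot num */
static void simple(void) {
    switch (lookahead) {
    case INTEGER: match(INTEGER); break;
    case CHAR:    match(CHAR); break;
    case NUM:     match(NUM); match(DOTDOT); match(NUM); break;
    default:      error();
    }
}

/* type -> ^ id | array [ simple ] of type | simple */
static void type(void) {
    switch (lookahead) {
    case '^':
        match('^'); match(ID);
        break;
    case ARRAY:
        match(ARRAY); match('['); simple(); match(']'); match(OF); type();
        break;
    default:
        simple();
    }
}

/* Returns 1 if the token array is exactly one valid type, 0 otherwise. */
int parse_type(const int *tokens) {
    assert(tokens != NULL);
    input = tokens;
    lookahead = *input;
    errors = 0;
    type();
    match(END);            /* the whole input must have been consumed */
    return errors == 0;
}
```

For example, the token array { '^', ID, END } is accepted, while { ARRAY, CHAR, ']', OF, INTEGER, END } is rejected because match('[') fails.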
Exercise (1)
List the calls made by the previous recursive descent parser on the input string
    array [ num dotdot num ] of integer
To get you started:
    type()
    match(ARRAY)
    match('[')
    simple()
    ...
[Animation: the parse tree for array [ num dotdot num ] of integer is built step by step under the grammar above. The root type node gets the children array [ simple ] of type; the first simple covers num dotdot num, and the inner type expands to simple, which covers integer.]
The problem with left recursion • Left-recursive grammar: A → A α | β • Problematic for recursive descent parsing • Infinite recursion
The problem with left recursion
• The left-recursive expression grammar:
    expr → expr + num | expr – num | num
• Parser code:
    void expr() {
        if (lookahead != NUM)
            expr();
        match('+');
        …
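The following sketch shows the infinite recursion in isolation. It is not the code from the slide. It translates only the production expr → expr + num in the most direct way, so the function calls itself before any call to match(). The depth guard MAX_DEPTH, the counters and the function left_recursion_diverges() are assumptions added so that the demonstration terminates and can be checked.

```c
#include <assert.h>
#include <stddef.h>

enum { END = 256, NUM };       /* hypothetical token codes */
enum { MAX_DEPTH = 1000 };     /* arbitrary guard so the demo terminates */

static const int *input;       /* pre-tokenized input, terminated by END */
static int depth;              /* number of expr() calls made so far */
static int consumed;           /* number of tokens matched so far */
static int gave_up;            /* set once the depth guard trips */

/* Simplified match(): consume the current token if it is the expected one. */
static void match(int expected) {
    if (*input == expected) { input++; consumed++; }
}

/* Direct translation of expr -> expr + num. */
static void expr(void) {
    if (++depth > MAX_DEPTH) { gave_up = 1; return; }
    expr();                    /* leftmost symbol of the body is expr itself */
    if (gave_up) return;       /* unwind once the guard has tripped */
    match('+');
    match(NUM);
}

/* Returns 1 if the guard tripped before a single token was consumed. */
int left_recursion_diverges(const int *tokens) {
    assert(tokens != NULL);
    input = tokens;
    depth = 0;
    consumed = 0;
    gave_up = 0;
    expr();
    return gave_up && consumed == 0;
}
```

Because no token is consumed between one call of expr() and the next, the lookahead never changes, so nothing can ever stop the recursion. Without the guard the program would exhaust the stack.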
Eliminating left recursion
• Left-recursive grammar:
    A → A α | β
• Rewritten grammar:
    A → β M
    M → α M | ε
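Applied to the expression grammar expr → expr + num | expr – num | num, the rewrite gives expr → num rest and rest → + num rest | – num rest | ε, with β = num and the two α bodies + num and – num. A minimal sketch of the resulting parser follows. The nonterminal name rest, the token codes, the pre-tokenized input array and parse_expr() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { END = 256, NUM };       /* hypothetical token codes */

static const int *input;       /* pre-tokenized input, terminated by END */
static int lookahead;
static int errors;

static int nextToken(void) {
    if (*input != END) input++;    /* never move past END */
    return *input;
}

static void error(void) { errors++; }

static void match(int expected_lookahead) {
    if (lookahead == expected_lookahead) lookahead = nextToken();
    else error();
}

/* rest -> + num rest | - num rest | (empty) */
static void rest(void) {
    switch (lookahead) {
    case '+': match('+'); match(NUM); rest(); break;
    case '-': match('-'); match(NUM); rest(); break;
    default:  break;               /* empty production: consume nothing */
    }
}

/* expr -> num rest */
static void expr(void) {
    match(NUM);
    rest();
}

/* Returns 1 if the token array is exactly one valid expression. */
int parse_expr(const int *tokens) {
    assert(tokens != NULL);
    input = tokens;
    lookahead = *input;
    errors = 0;
    expr();
    match(END);
    return errors == 0;
}
```

Every recursive call to rest() is preceded by a successful or failed match on an operator that was already seen in the lookahead, so each level of recursion consumes at least one token and the parser terminates.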
Exercise (2)
Remove the left recursion from the following grammar for formal parameter lists in C:
    list → par | list , par
    par → int id
int and id are tokens that represent the keyword int and identifiers, respectively.
Hint: what is α and what is β in this case?
The problem • Recall: how does a predictive parser choose production body? • What if the lookahead token matches more than one such production body?
The problem • Problematic grammar: list → num | num , list • If lookahead = num, what to expect?
Left factoring
• The previous grammar,
    list → num | num , list
  becomes
    list → num list’
    list’ → ε | , list
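After left factoring, a single lookahead token is enough again: list always starts by matching num, and list’ checks for a comma to decide between its two bodies. A minimal sketch, assuming a pre-tokenized int array terminated by END; the token codes, the function name list_rest() for list’ and the entry point parse_list() are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { END = 256, NUM };       /* hypothetical token codes */

static const int *input;       /* pre-tokenized input, terminated by END */
static int lookahead;
static int errors;

static int nextToken(void) {
    if (*input != END) input++;    /* never move past END */
    return *input;
}

static void error(void) { errors++; }

static void match(int expected_lookahead) {
    if (lookahead == expected_lookahead) lookahead = nextToken();
    else error();
}

static void list(void);

/* list' -> (empty) | , list */
static void list_rest(void) {
    if (lookahead == ',') {
        match(',');
        list();
    }
    /* otherwise the empty production applies: consume nothing */
}

/* list -> num list' */
static void list(void) {
    match(NUM);
    list_rest();
}

/* Returns 1 if the token array is exactly one valid list. */
int parse_list(const int *tokens) {
    assert(tokens != NULL);
    input = tokens;
    lookahead = *input;
    errors = 0;
    list();
    match(END);
    return errors == 0;
}
```

With the original grammar, both bodies start with num, so the choice would have had to be made before seeing whether a comma follows. Here the decision is postponed until the common prefix num has been consumed.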
Exercise (3)
Perform left factoring on the following grammar for declarations of variables and functions in C:
    decl → int id ; | int id ( pars ) ;
    pars → ...
Conclusion • Recursive descent parsers • Left recursion • Left factoring
Next time • The sets first and follow • Defining LL(1) grammars • Non-recursive top-down parser • Handling syntax errors