Top Down Parsing
• Recursive Descent Parsing
• Top-down parsing:
  • Build the tree from the root symbol
  • Each non-terminal corresponds to one recursive procedure
  • Each procedure recognizes an instance of a non-terminal and returns the tree fragment for that non-terminal
Department of Software & Media Technology
General model
• The right-hand side of each production provides the body of a function
• Each non-terminal on the right-hand side is translated into a call to the function that recognizes that non-terminal
• Each terminal on the right-hand side is translated into a call to the lexical scanner; if the resulting token is not the expected terminal, a syntax error is reported
• Each recognizing function returns a tree fragment
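A minimal Python sketch of the general model, assuming a toy grammar that is not from the slides (PAIR ::= ( ITEM , ITEM ) and ITEM ::= id | PAIR) and tokens given as plain strings. The names Parser, expect, pair, item and parse_pair are hypothetical. Each non-terminal becomes a method, each terminal becomes a call to expect, and each method returns a tuple as its tree fragment.

```python
# Toy grammar (hypothetical, for illustration only):
#   PAIR ::= ( ITEM , ITEM )
#   ITEM ::= id | PAIR


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Next token without consuming it; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, terminal):
        # A terminal on the right-hand side becomes a call to the scanner:
        # consume the next token and report an error if it is not the expected one.
        tok = self.peek()
        if tok != terminal:
            raise ParseError(f"expected {terminal!r}, found {tok!r}")
        self.pos += 1
        return tok

    def pair(self):
        # PAIR ::= ( ITEM , ITEM )  -- one function per non-terminal
        self.expect("(")
        left = self.item()      # non-terminal -> call its recognizer
        self.expect(",")
        right = self.item()
        self.expect(")")
        return ("pair", left, right)   # tree fragment for PAIR

    def item(self):
        # ITEM ::= id | PAIR  -- the next token selects the alternative
        if self.peek() == "id":
            return self.expect("id")
        return self.pair()


def parse_pair(tokens):
    p = Parser(tokens)
    tree = p.pair()
    if p.peek() is not None:
        raise ParseError(f"unexpected trailing token {p.peek()!r}")
    return tree
```

For example, parse_pair(["(", "id", ",", "(", "id", ",", "id", ")", ")"]) returns ("pair", "id", ("pair", "id", "id")).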
Example: parsing a declaration
• FULL_TYPE_DECLARATION ::= type DEFINING_IDENTIFIER is TYPE_DEFINITION ;
• Translates into:
  • get token type
  • Find a defining_identifier -- function call
  • get token is
  • Recognize a type_definition -- function call
  • get token semicolon
• In practice, we already know that the first token is type; that is why this routine was called in the first place! Predictive parsing is guided by the next token.
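The translation above can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' implementation: tokens are plain strings, TYPE_DEFINITION is simplified to range LOW .. HIGH, and DeclParser, word, declaration and the other method names are hypothetical. The declaration method shows the predictive step: it looks at the next token and only then calls full_type_declaration.

```python
# Sketch: FULL_TYPE_DECLARATION ::= type DEFINING_IDENTIFIER is TYPE_DEFINITION ;
# Assumption (hypothetical): TYPE_DEFINITION is simplified to "range LOW .. HIGH".
RESERVED = {"type", "is", "range", "..", ";"}


class ParseError(Exception):
    pass


class DeclParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def get(self, expected):
        # "get token X": consume the next token, which must be X.
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def word(self):
        # An identifier or literal (simplification: any non-reserved token).
        tok = self.peek()
        if tok is None or tok in RESERVED:
            raise ParseError(f"expected a name or literal, found {tok!r}")
        self.pos += 1
        return tok

    def declaration(self):
        # Predictive dispatch: the next token selects the routine to call.
        if self.peek() == "type":
            return self.full_type_declaration()
        raise ParseError(f"no declaration starts with {self.peek()!r}")

    def full_type_declaration(self):
        self.get("type")                     # already known to be 'type'
        name = self.word()                   # DEFINING_IDENTIFIER
        self.get("is")
        definition = self.type_definition()  # TYPE_DEFINITION
        self.get(";")
        return ("type_decl", name, definition)

    def type_definition(self):
        self.get("range")
        low = self.word()
        self.get("..")
        high = self.word()
        return ("range", low, high)
```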
Example: parsing a loop
• FOR_STATEMENT ::= ITERATION_SCHEME loop STATEMENTS end loop ;
    Node1 := find_iteration_scheme;    -- call function
    get token loop
    List1 := sequence of statements;   -- call function
    get token end
    get token loop
    get token semicolon
    Result := build loop_node with Node1 and List1
    return Result
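A Python sketch of the loop routine, assuming (hypothetically) that ITERATION_SCHEME is simplified to while NAME and that each statement is simplified to NAME ;. LoopParser and its method names are illustrative only. The keyword end is what tells statements that the sequence is over.

```python
# Sketch of FOR_STATEMENT ::= ITERATION_SCHEME loop STATEMENTS end loop ;
# Assumptions (hypothetical): ITERATION_SCHEME is simplified to "while NAME"
# and each statement is simplified to "NAME ;".
RESERVED = {"while", "loop", "end", ";"}


class ParseError(Exception):
    pass


class LoopParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def get(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or tok in RESERVED:
            raise ParseError(f"expected a name, found {tok!r}")
        self.pos += 1
        return tok

    def iteration_scheme(self):
        self.get("while")
        return ("while", self.name())

    def statement(self):
        stmt = ("call", self.name())
        self.get(";")
        return stmt

    def statements(self):
        # One or more statements; the keyword 'end' closes the sequence.
        result = [self.statement()]
        while self.peek() != "end":
            result.append(self.statement())
        return result

    def loop_statement(self):
        node1 = self.iteration_scheme()   # Node1 := find_iteration_scheme
        self.get("loop")
        list1 = self.statements()         # List1 := sequence of statements
        self.get("end")
        self.get("loop")
        self.get(";")
        return ("loop", node1, list1)     # build loop_node with Node1 and List1
```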
Problem:
• If a non-terminal has several productions, a mechanism is required to determine which production to use:
    IF_STAT ::= if COND then STATS end if;
    IF_STAT ::= if COND then STATS ELSIF_PART end if;
• When the next token is if, which production should be used? Both alternatives begin with the same tokens.
One solution: factorize the grammar
• If several productions have the same prefix, rewrite them as a single production:
    IF_STAT ::= if COND then STATS [ELSIF_PART] end if;
• The problem now reduces to recognizing whether an optional component (ELSIF_PART) is present
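After factoring, the optional component can be recognized with one token of lookahead: ELSIF_PART is present exactly when the next token is elsif. A Python sketch under hypothetical assumptions (COND is a single name, STATS is one or more NAME ; statements, and ELSIF_PART ::= elsif COND then STATS [ELSIF_PART]); IfParser and its methods are illustrative names.

```python
# Sketch of IF_STAT ::= if COND then STATS [ELSIF_PART] end if ;
# Assumptions (hypothetical): COND is a single name, STATS is one or more
# "NAME ;" statements, and ELSIF_PART ::= elsif COND then STATS [ELSIF_PART].
RESERVED = {"if", "then", "elsif", "end", ";"}


class ParseError(Exception):
    pass


class IfParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def get(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or tok in RESERVED:
            raise ParseError(f"expected a name, found {tok!r}")
        self.pos += 1
        return tok

    def stats(self):
        # One or more "NAME ;" statements, ended by 'elsif' or 'end'.
        result = []
        while True:
            result.append(self.name())
            self.get(";")
            if self.peek() in ("elsif", "end"):
                return result

    def elsif_part(self):
        self.get("elsif")
        cond = self.name()
        self.get("then")
        body = self.stats()
        rest = self.elsif_part() if self.peek() == "elsif" else None
        return ("elsif", cond, body, rest)

    def if_stat(self):
        self.get("if")
        cond = self.name()
        self.get("then")
        body = self.stats()
        # The optional component is present exactly when the next token is 'elsif'.
        elsif = self.elsif_part() if self.peek() == "elsif" else None
        self.get("end")
        self.get("if")
        self.get(";")
        return ("if", cond, body, elsif)
```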
Second problem: left recursion
• The grammar must not be left-recursive:
    E ::= E + T | T
• Problem: to find an E, the parser starts by finding an E…
• The original scheme leads to infinite recursion
• Such a grammar is inappropriate for recursive descent
Solution to left recursion
• E ::= E + T | T means that E eventually expands into T + T + T …
• Rewrite as:
    E ::= T E'
    E' ::= + T E' | epsilon
• Informally: E' is a possibly empty sequence of terms, each preceded by the operator
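The rewrite above is mechanical: A ::= A α | β becomes A ::= β A' and A' ::= α A' | epsilon. A Python sketch, assuming a grammar representation chosen for illustration (right-hand sides as tuples of symbols, () for epsilon); the function name is hypothetical.

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite  A ::= A a1 | ... | A am | b1 | ... | bn  as
         A  ::= b1 A' | ... | bn A'
         A' ::= a1 A' | ... | am A' | epsilon
    Right-hand sides are tuples of symbols; () stands for epsilon."""
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: list(productions)}   # nothing to do
    new_nt = nt + "'"
    return {
        nt: [tuple(beta) + (new_nt,) for beta in others],
        new_nt: [tuple(alpha) + (new_nt,) for alpha in recursive] + [()],
    }
```

Applied to E ::= E + T | T, it yields exactly the grammar on the slide: E ::= T E' and E' ::= + T E' | epsilon.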
Recursion can involve multiple productions
• A ::= B C | D
• B ::= A E | F
• Substituting the productions of B into A gives: A ::= A E C | F C | D
• Now apply the previous method
• There is a general algorithm to detect and remove left recursion, including this indirect form
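A Python sketch of the usual ordering-based algorithm for removing both direct and indirect left recursion. It assumes the grammar has no epsilon-productions and no cycles (A deriving A in one or more steps), which the algorithm requires; the representation and function names are hypothetical. With the order B, A it performs exactly the substitution on the slide before removing the direct recursion from A.

```python
# Assumptions: no epsilon-productions and no cycles in the input grammar;
# right-hand sides are tuples of symbols; () denotes epsilon in the output.


def remove_immediate(nt, productions):
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],
    }


def remove_left_recursion(grammar, order):
    """order lists the non-terminals as A1, A2, ..., An."""
    g = {nt: [tuple(rhs) for rhs in rhss] for nt, rhss in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace each Ai ::= Aj gamma by Ai ::= delta gamma for every Aj ::= delta.
            expanded = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    expanded.extend(delta + rhs[1:] for delta in g[aj])
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Ai can now only be directly left-recursive; remove that.
        g.update(remove_immediate(ai, g[ai]))
    return g
```

For the slide's grammar, remove_left_recursion({"A": [("B", "C"), ("D",)], "B": [("A", "E"), ("F",)]}, ["B", "A"]) gives A ::= F C A' | D A', A' ::= E C A' | epsilon, and leaves B ::= A E | F, which is no longer left-recursive.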
Further problem
• The transformation does not preserve associativity:
  • E ::= E + T | T parses a + b + c as (a + b) + c
  • E ::= T E', E' ::= + T E' | epsilon parses a + b + c as a + (b + c)
• This is incorrect for a - b - c, since (a - b) - c differs from a - (b - c): the tree must be rewritten
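A small Python sketch of the problem, assuming terms are single number tokens and only + and - occur. parse_naive (hypothetical) builds the tree the way a direct recursive translation of E ::= T E' does, with each operator's right operand being everything that follows it. to_left_assoc (hypothetical) is one way to rewrite the tree; it is valid here only because every left child is a single term.

```python
def parse_naive(tokens):
    # E ::= T E', E' ::= op T E' | epsilon, building a op (rest): right-leaning.
    pos = 0

    def expr():
        nonlocal pos
        left = tokens[pos]          # T: a single number token
        pos += 1
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            return (op, left, expr())
        return left

    return expr()


def evaluate(tree):
    if isinstance(tree, tuple):
        op, left, right = tree
        lv, rv = evaluate(left), evaluate(right)
        return lv + rv if op == "+" else lv - rv
    return tree


def to_left_assoc(tree):
    # Flatten the chain a op1 (b op2 (c ...)) into operands and operators,
    # then rebuild it left-leaning: ((a op1 b) op2 c) ...
    operands, ops = [], []
    while isinstance(tree, tuple):
        op, left, tree = tree
        operands.append(left)
        ops.append(op)
    operands.append(tree)
    result = operands[0]
    for op, operand in zip(ops, operands[1:]):
        result = (op, result, operand)
    return result
```

For 8 - 3 - 2, parse_naive builds ("-", 8, ("-", 3, 2)), which evaluates to 7; after to_left_assoc the tree is ("-", ("-", 8, 3), 2), which evaluates to the expected 3.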
In practice: use a loop to find the sequence of terms
    Node1 := P_Term;                  -- call function that recognizes a term
    loop
       exit when Token not in Token_Class_Binary_Addop;
       Node2 := New_Node (P_Binary_Adding_Operator);
       Scan;                          -- past operator
       Set_Left_Opnd (Node2, Node1);
       Set_Right_Opnd (Node2, P_Term);   -- find next term
       Set_Op_Name (Node2);
       Node1 := Node2;                -- operand for next operation
    end loop;
• Each new node takes the tree built so far as its left operand, so a - b - c is built as (a - b) - c
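The same loop in Python, as a sketch assuming a term is a single operand token and the adding operators are + and -. parse_simple_expression and ADDING_OPERATORS are hypothetical names, not the identifiers of the Ada code above.

```python
ADDING_OPERATORS = {"+", "-"}


def parse_simple_expression(tokens):
    pos = 0

    def term():
        # Simplification: a term is a single operand token.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ADDING_OPERATORS:
            raise ValueError(f"expected a term at position {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    node1 = term()
    while pos < len(tokens) and tokens[pos] in ADDING_OPERATORS:
        op = tokens[pos]
        pos += 1                      # scan past the operator
        node1 = (op, node1, term())   # tree so far becomes the left operand
    return node1
```

parse_simple_expression([8, "-", 3, "-", 2]) returns ("-", ("-", 8, 3), 2): the loop produces the left-associative tree directly, without the rewrite step.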
LL(1) Parsing: LL(1) grammars
• If table construction is successful, the grammar is LL(1): left-to-right scan, leftmost derivation, one token of lookahead
• If construction fails, one can consider LL(2), etc.
• Ambiguous grammars are never LL(k)
• If a terminal is in the First set of two different productions of A, the grammar cannot be LL(1)
• Grammars with left recursion are never LL(k)
• Some useful constructs are not LL(k)
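Once an LL(1) table exists, parsing needs no recursive procedures: a stack of grammar symbols and the table suffice. A Python sketch of such a driver, assuming a hand-written table for the hypothetical grammar E ::= T E', E' ::= + T E' | epsilon, T ::= id | ( E ), with "$" as the end-of-input marker; ll1_parse and TABLE are illustrative names.

```python
# Hand-written LL(1) table for a hypothetical example grammar:
#   E ::= T E'     E' ::= + T E' | epsilon     T ::= id | ( E )
# () is the epsilon right-hand side; "$" marks end of input.
TABLE = {
    ("E", "id"): ("T", "E'"),
    ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", ")"): (),
    ("E'", "$"): (),
    ("T", "id"): ("id",),
    ("T", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T"}


class ParseError(Exception):
    pass


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:                  # empty table entry = error
                raise ParseError(f"no production for {top} on {lookahead!r}")
            derivation.append((top, rhs))
            stack.extend(reversed(rhs))      # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                         # terminal (or $) matches the input
        else:
            raise ParseError(f"expected {top!r}, found {lookahead!r}")
    return derivation
```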
Building LL(1) parse tables
• The table is indexed by non-terminal and token; each entry is a production:
    for each production P: A ::= α loop
       for each terminal a in First (α) loop
          T (A, a) := P;
       end loop;
       if ε in First (α) then
          for each terminal b in Follow (A) loop
             T (A, b) := P;
          end loop;
       end if;
    end loop;
• All other entries are errors.
• If two assignments conflict, the parse table cannot be built: the grammar is not LL(1).
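A Python sketch of the construction, including the First and Follow computations it relies on. Assumptions of this sketch: a grammar is a dict from non-terminal to a list of right-hand-side tuples, () is epsilon, any symbol that is not a key is a terminal, terminals are strings, and "$" is the end-of-input marker in Follow of the start symbol. All names are hypothetical. Conflicting assignments are collected instead of overwriting the entry.

```python
EPS = "epsilon"
END = "$"


def first_of(seq, first, nonterminals):
    """First set of a symbol sequence; contains EPS if seq can derive epsilon."""
    result = set()
    for sym in seq:
        if sym not in nonterminals:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                new = first_of(rhs, first, grammar)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    rest = first_of(rhs[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:         # sym can end A's expansion
                        new |= follow[a]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_ll1_table(grammar, start):
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    table, conflicts = {}, []
    for a, rhss in grammar.items():
        for rhs in rhss:
            f = first_of(rhs, first, grammar)
            targets = f - {EPS}
            if EPS in f:
                targets |= follow[a]         # epsilon case: use Follow (A)
            for t in sorted(targets):
                if (a, t) in table:
                    conflicts.append((a, t))  # two productions claim T (A, t)
                else:
                    table[(a, t)] = rhs
    return table, conflicts
```

For the left-recursive grammar E ::= E + T | T, T ::= id, both E-productions have id in their First sets, so build_ll1_table reports a conflict at (E, id), matching the claim that left-recursive grammars are not LL(1).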
Left Recursion Removal & Left Factoring
• Left Recursion Removal: [figure]
• Left Factoring: [figure]
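One step of left factoring can be sketched in Python as follows, assuming the same tuple-based grammar representation as before (() for epsilon). The names left_factor_once and common_prefix are hypothetical. Alternatives are grouped by their first symbol, and each group with more than one member is replaced by its longest common prefix followed by a new non-terminal. A full left-factoring pass would repeat this until no two alternatives of any non-terminal share a first symbol.

```python
def common_prefix(seqs):
    # Longest sequence of symbols shared at the start of every tuple in seqs.
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)


def left_factor_once(nt, productions):
    """One left-factoring step for non-terminal nt; () stands for epsilon."""
    groups = {}
    for rhs in productions:
        key = rhs[0] if rhs else None
        groups.setdefault(key, []).append(rhs)
    result = {nt: []}
    count = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nt].extend(group)      # nothing shared: keep as is
            continue
        prefix = common_prefix(group)
        count += 1
        new_nt = nt + "'" * count
        result[nt].append(prefix + (new_nt,))
        result[new_nt] = [rhs[len(prefix):] for rhs in group]
    return result
```

Applied to the two IF_STAT productions from the earlier slide, it produces IF_STAT ::= if COND then STATS IF_STAT' with IF_STAT' ::= end if ; | ELSIF_PART end if ;, which is the bracketed [ELSIF_PART] form written as a separate non-terminal.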
• Syntax Tree Construction in LL(1)
• First and Follow Sets
• LL(k) Parsers (Extending the Lookahead)
• Error Recovery in Top Down Parsers
• Error Recovery in LL(1) Parsers