Top-Down Parsing. Identify a leftmost derivation for an input string. Why? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed to develop a parse tree in a left-to-right fashion that is consistent with scanning the input.
Top-Down Parsing
• Identify a leftmost derivation for an input string
• Why?
• By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed to develop a parse tree in a left-to-right fashion that is consistent with scanning the input.
• A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c - accept!)
• Recursive-descent parsing concepts
• Predictive parsing
• Recursive / Brute force technique
• Non-recursive / table driven
• Error recovery
• Implementation
Top-Down Parsing • From Grammar to Parser, take I
Recursive Descent Parsing
• General category of top-down parsing
• Choose a production rule based on the input symbol
• May require backtracking to correct a wrong choice
• Example: S → c A d, A → ab | a, input: cad
[Figure: parse trees built for input cad. Expanding A → ab matches a but then fails on b (Problem: backtrack); retrying with A → a gives the tree S with children c, A, d, where A has the single child a.]
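The backtracking in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_S`, `parse_A` and `accepts` are hypothetical, the input is a plain string of single-character tokens, and generators are used so that a failed alternative is simply abandoned and the next one tried.

```python
def parse_A(s, i):
    """A -> a b | a. Yield every index at which A can end when started at i.

    The alternatives are tried in grammar order; moving on to the second
    'yield' after the caller rejects the first is the backtracking step.
    """
    if s[i:i + 2] == "ab":
        yield i + 2
    if s[i:i + 1] == "a":
        yield i + 1


def parse_S(s, i):
    """S -> c A d. Yield every index at which S can end when started at i."""
    if s[i:i + 1] == "c":
        for j in parse_A(s, i + 1):
            if s[j:j + 1] == "d":
                yield j + 1


def accepts(s):
    """True if the whole string s is derivable from S."""
    return any(end == len(s) for end in parse_S(s, 0))
```

For the input cad, `parse_A` finds that the first alternative ab does not match and falls through to a, after which d matches and the string is accepted.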
Top-Down Parsing • From Grammar to Parser, take II
Predictive Parsing
• Backtracking is bad!
• To eliminate backtracking, what must we do/be sure of for the grammar?
  • no left recursion
  • apply left factoring
• (Frequently) when the grammar satisfies the above conditions: the current input symbol in conjunction with the current non-terminal uniquely determines the production that needs to be applied.
• Utilize transition diagrams. For each non-terminal of the grammar do the following:
  1. Create an initial and a final state
  2. If A → X1X2…Xn is a production, add a path with edges X1, X2, … , Xn
• Once the transition diagrams have been developed, apply a straightforward technique to turn them into procedures, possibly recursive.
Transition Diagrams
• Recall the earlier grammar and its associated transition diagrams:
  E → TE’
  E’ → + TE’ | ε
  T → FT’
  T’ → * FT’ | ε
  F → ( E ) | id
[Figure: one transition diagram per non-terminal. E: states 0-2 (edges T, E’); E’: states 3-6 (edges +, T, E’, ε); T: states 7-9 (edges F, T’); T’: states 10-13 (edges *, F, T’, ε); F: states 14-17 (edges (, E, ), id).]
• Unlike lexical equivalents, each edge represents a token
• Transition implies: if the edge is a token, match the input; else call the procedure for the non-terminal
How are transition diagrams used? Are ε-moves a problem? Can we simplify transition diagrams? Why is simplification critical?
How are Transition Diagrams Used ?

main() { TD_E(); }

TD_E() { TD_T(); TD_E’(); }

TD_E’() {
    token = get_token();
    if token = ‘+’ then { TD_T(); TD_E’(); }
}

TD_T() { TD_F(); TD_T’(); }

TD_T’() {
    token = get_token();
    if token = ‘*’ then { TD_F(); TD_T’(); }
}

TD_F() {
    token = get_token();
    if token = ‘(’ then { TD_E(); match(‘)’); }
    else if token.value <> id then { error + EXIT }
    else ...
}

What happened to ε-moves? … “else unget() and terminate”
NOTE: not all error conditions have been represented.
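A runnable counterpart to the procedures above, as a minimal sketch rather than the slides' own implementation. It assumes the input is already a list of token strings ("id", "+", "*", "(", ")"), and it swaps the get_token()/unget() pair for a one-token `peek()`, so an ε-move simply consumes nothing. The names `Parser`, `ParseError` and `parse` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the current token cannot continue any production."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" terminates the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise: epsilon-move, consume nothing

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.peek() == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.peek()!r}")


def parse(tokens):
    """Return True if tokens form a sentence of the expression grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")        # all input must be consumed
        return True
    except ParseError:
        return False
```

For example, `parse(["id", "+", "id", "*", "id"])` succeeds, while `parse(["id", "+"])` fails inside `F` because no factor follows the `+`.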
How can Transition Diagrams be Simplified ?
[Figure sequence (1)-(5): the E’ diagram (states 3-6, edges +, T, E’) is simplified step by step, and the simplified diagram is then substituted for the E’ edge in the E diagram (states 0-2).]
Additional Transition Diagram Simplifications
• Similar steps for T and T’
• Simplified transition diagrams:
[Figure: simplified diagrams for T (states 7, 10, 13; edges F, *), T’ (states 10, 11, 13; edges *, F) and F (states 14-17; edges (, E, ), id).]
Why is simplification important? How does the code change?
Top-Down Parsing • From Grammar to Parser, take III
Motivating Table-Driven Parsing
1. Left to right scan input
2. Find leftmost derivation
Grammar: E → TE’, E’ → +TE’ | ε, T → id
Input: id + id $ ($ is the terminator)
Derivation: E
Processing Stack:
Non-Recursive / Table Driven
[Figure: the predictive parsing program reads the input (string + terminator, e.g. a + b $), maintains a stack (e.g. X Y Z $, where $ is the empty-stack symbol) holding non-terminal and terminal symbols of the CFG, consults the parsing table M[A,a] to decide which action to take based on stack and input, and produces output.]
• General parser behavior (X: top of stack, a: current input):
  1. When X = a = $: halt, accept, success
  2. When X = a ≠ $: POP X off the stack, advance the input, go to 1.
  3. When X is a non-terminal, examine M[X,a]:
     • if it is an error, call the recovery routine
     • if M[X,a] = {X → UVW}, POP X, PUSH W, V, U (so that U ends up on top)
     • DO NOT expend any input
Algorithm for Non-Recursive Parsing

Set ip to point to the first symbol of w$;    /* ip: input pointer */
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then pop X from the stack and advance ip
        else error()
    else    /* X is a non-terminal */
        if M[X,a] = X → Y1Y2…Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2…Yk    /* may also execute other code based on the production used */
        end
        else error()
until X = $    /* stack is empty */
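Example (our well-worn example!)

E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

Table M (rows: non-terminals; columns: input symbols; "-" marks an empty, i.e. error, entry)

Non-terminal   id         +            *            (          )         $
E              E → TE’    -            -            E → TE’    -         -
E’             -          E’ → +TE’    -            -          E’ → ε    E’ → ε
T              T → FT’    -            -            T → FT’    -         -
T’             -          T’ → ε       T’ → *FT’    -          T’ → ε    T’ → ε
F              F → id     -            -            F → (E)    -         -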
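The non-recursive algorithm can be tried directly against table M. The following is a sketch under stated assumptions: the table is encoded as a Python dictionary keyed by (non-terminal, input symbol), an empty tuple stands for ε, missing keys are error entries, and error handling is reduced to returning `None` (no recovery routine). The names `TABLE`, `NONTERMINALS` and `ll1_parse` are hypothetical.

```python
EPS = ()  # empty right-hand side, i.e. an epsilon production

NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Table M: (non-terminal, input symbol) -> right-hand side of the production
TABLE = {
    ("E", "id"): ("T", "E'"),       ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),  ("E'", ")"): EPS,  ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"),       ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): EPS,  ("T'", ")"): EPS,  ("T'", "$"): EPS,
    ("F", "id"): ("id",),           ("F", "("): ("(", "E", ")"),
}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or None on a syntax error."""
    inp = list(tokens) + ["$"]
    stack = ["$", start]           # top of the stack is the end of the list
    ip = 0
    output = []
    while True:
        X, a = stack[-1], inp[ip]
        if X not in NONTERMINALS:  # X is a terminal or $
            if X != a:
                return None        # mismatch: error
            if X == "$":
                return output      # X = a = $: accept
            stack.pop()            # X = a != $: match and expend input
            ip += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                return None        # empty table entry: error
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1, leaving Y1 on top
            output.append((X, rhs))
```

Calling `ll1_parse(["id", "+", "id", "*", "id"])` returns eleven productions, beginning with E → TE’ and ending with E’ → ε.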
Trace of Example

STACK        INPUT             OUTPUT
$E           id + id * id$
$E’T         id + id * id$     E → TE’
$E’T’F       id + id * id$     T → FT’
$E’T’id      id + id * id$     F → id
$E’T’        + id * id$
$E’          + id * id$        T’ → ε
$E’T+        + id * id$        E’ → +TE’
$E’T         id * id$
$E’T’F       id * id$          T → FT’
$E’T’id      id * id$          F → id
$E’T’        * id$
$E’T’F*      * id$             T’ → *FT’
$E’T’F       id$
$E’T’id      id$               F → id
$E’T’        $
$E’          $                 T’ → ε
$            $                 E’ → ε

Expend input: apart from the initial row, a row with no OUTPUT entry is reached by matching the terminal on top of the stack and expending one input symbol.
Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E ⇒ TE’ ⇒ FT’E’ ⇒ id T’E’ ⇒ id E’ ⇒ id + TE’ ⇒ id + FT’E’ ⇒ id + id T’E’ ⇒ id + id * FT’E’ ⇒ id + id * id T’E’ ⇒ id + id * id E’ ⇒ id + id * id
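The link between the parser's output and this derivation can be checked mechanically: applying each output production to the leftmost non-terminal of the current sentential form reproduces the derivation step by step. A small sketch, with the production list written out by hand and the names `leftmost_derivation` and `PRODUCTIONS` being hypothetical:

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Productions in the order the table-driven parser outputs them for id + id * id
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("T", ["F", "T'"]), ("F", ["id"]), ("T'", []),
    ("E'", ["+", "T", "E'"]), ("T", ["F", "T'"]), ("F", ["id"]),
    ("T'", ["*", "F", "T'"]), ("F", ["id"]), ("T'", []), ("E'", []),
]


def leftmost_derivation(start, productions):
    """Return the list of sentential forms, starting with [start]."""
    form = [start]
    steps = [list(form)]
    for head, rhs in productions:
        # locate the leftmost non-terminal in the current sentential form
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != head:
            raise ValueError(f"expected to expand {form[i]}, got {head}")
        form[i:i + 1] = rhs        # replace it by the right-hand side
        steps.append(list(form))
    return steps


for step in leftmost_derivation("E", PRODUCTIONS):
    print(" ".join(step))
```

Each printed line is one sentential form of the derivation above, from E down to the input string.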
What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M!
1st: Calculate First & Follow for the grammar
2nd: Apply the construction algorithm for the parsing table (we’ll see this shortly)
Basic Tools:
First: Let α be a string of grammar symbols. First(α) is the set that includes every terminal that appears leftmost in α or in any string originating from α. NOTE: If α ⇒* ε, then ε is in First(α).
Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β). NOTE: If S ⇒* αA, then $ is in Follow(A).
Motivation Behind First & Follow
First: Is used to help find the appropriate production to apply, given the top-of-the-stack non-terminal and the current input symbol.
Example: If A → α, and a is in First(α), then when a = input, replace A with α (in the stack). (a is one of the first symbols of α, so when A is on the stack and a is the input, POP A and PUSH α.)
Follow: Is used when First has a conflict, to resolve choices, or when First gives no suggestion. When α = ε or α ⇒* ε, then what follows A dictates the next choice to be made.
Example: If A → α, and b is in Follow(A), then when α ⇒* ε and b is an input character, we expand A with α, which will eventually expand to ε, of which b follows! (α ⇒* ε: i.e., First(α) contains ε.)
An example.
Grammar: S → a B C d; B → CB | ε | S a; C → b

STACK   INPUT    OUTPUT
$S      abbd$
Computing First(X) : All Grammar Symbols
• 1. If X is a terminal, First(X) = {X}
• 2. If X → ε is a production rule, add ε to First(X)
• 3. If X is a non-terminal, and X → Y1Y2…Yk is a production rule:
  • Place First(Y1) in First(X)
  • if Y1 ⇒* ε, place First(Y2) in First(X)
  • if Y2 ⇒* ε, place First(Y3) in First(X)
  • …
  • if Yk-1 ⇒* ε, place First(Yk) in First(X)
  • NOTE: As soon as some Yi does not derive ε (Yi ⇏* ε), stop.
• Repeat the above steps until no more elements are added to any First( ) set.
• Checking “Yj ⇒* ε ?” essentially amounts to checking whether ε belongs to First(Yj)
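The three rules and the "repeat until nothing changes" step can be sketched as a fixed-point loop. The encoding is an assumption made for illustration: a grammar is a dictionary from each non-terminal to a list of right-hand sides (tuples of symbols, with the empty tuple standing for ε), any symbol that is not a key is treated as a terminal, and ε is added for a production only when every Yi can derive ε. The names `first_sets` and `GRAMMAR` are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}


def first_sets(grammar):
    """Compute First(X) for every non-terminal X of the grammar."""
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # Rule 1: a terminal's First set is just itself
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:                       # repeat until no set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                add = set()
                for sym in rhs:          # Rule 3: walk Y1, Y2, ..., Yk
                    f = first_of(sym)
                    add |= f - {EPS}
                    if EPS not in f:     # Yi cannot vanish: stop here
                        break
                else:
                    # Rule 2 (empty rhs), or every Yi can derive epsilon
                    add.add(EPS)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first
```

With `GRAMMAR` as above, `first_sets(GRAMMAR)["E"]` is the set containing "(" and "id", and `first_sets(GRAMMAR)["E'"]` contains "+" and ε.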
Computing First(X) : All Grammar Symbols - continued
• Informally, suppose we want to compute
  First(X1 X2 … Xn) = First(X1)
    “+” First(X2) if ε is in First(X1)
    “+” First(X3) if ε is in First(X2)
    “+” …
    “+” First(Xn) if ε is in First(Xn-1)
Note 1: Only add ε to First(X1 X2 … Xn) if ε is in First(Xi) for all i
Note 2: For First(X1), if X1 → Z1 Z2 … Zm, then we need to compute First(Z1 Z2 … Zm)!
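A sketch of First for a string of symbols, assuming the First sets of the individual non-terminals are already known. Here they are written out by hand for the expression grammar, and any symbol missing from the dictionary is treated as a terminal. The names `first_of_string` and `FIRST` are hypothetical.

```python
EPS = "ε"

# First sets of the expression grammar's non-terminals (given, not computed here)
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols, first):
    """First(X1 X2 ... Xn) for a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})   # a terminal's First set is itself
        result |= f - {EPS}
        if EPS not in f:            # this symbol cannot vanish: stop
            return result
    result.add(EPS)                 # Note 1: every Xi can derive epsilon
    return result
```

For instance, `first_of_string(["T", "E'"], FIRST)` stops after T, because ε is not in First(T), and returns only "(" and "id".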
Example 1
Given the production rules:
S → i E t S S’ | a
S’ → e S | ε
E → b
Verify that
First(S) = { i, a }
First(S’) = { e, ε }
First(E) = { b }
Example 2
E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

Computing First for:
First(E) = First(TE’) = First(T) “+” First(E’); not First(E’), since T ⇏* ε
First(T) = First(FT’) = First(F) “+” First(T’); not First(T’), since F ⇏* ε
First(F) = First((E)) “+” First(id) = “(” and “id”

Overall:
First(E) = First(T) = First(F) = { ( , id }
First(E’) = { + , ε }
First(T’) = { * , ε }