LING 408/508: Computational Techniques for Linguists Lecture 24 10/22/2012
Outline • Top-down parsing • Short assignment #16
Parsing
• Input: string to be parsed
• Output: a tree indicating the phrase structure of the string
• Need an algorithm for parsing
  • Top-down
  • Bottom-up (won’t really discuss)
  • Tabular (will show you CKY)
Top-down parsing
• Idea: the CFG defines how a sentence can be generated. Search for a derivation from the start symbol that matches the input.
• Also called:
  • Recursive-descent parsing
  • Predictive parsing
Sketch of top-down parsing algorithm
• Build a parse tree that generates the sentence
  • parse tree = phrase structure tree
• Initialize
  • Tree has the start symbol as its root node
  • Input is scanned starting at the leftmost symbol
• At each iteration, expand/match the leftmost node
  • If nonterminal, expand the tree according to the CFG rules
  • If terminal, match the node against the current input symbol
    • If they match, advance the input to the next symbol
    • If they do not match, reject the parse and backtrack
Leftmost derivation of phrase structure tree
Grammar:
1. S → NP VP
2. NP → DT N
3. VP → V
4. DT → a
5. N → flight
6. V → left
Input: a flight left
Phrase markers:
1. S               begin with start symbol S
2. NP VP           apply rule 1, replace S with NP VP
3. DT N VP         apply rule 2, replace NP with DT N
4. a N VP          apply rule 4, replace DT with a
5. a flight VP     apply rule 5, replace N with flight
6. a flight V      apply rule 3, replace VP with V
7. a flight left   apply rule 6, replace V with left
Stop: phrase marker contains only terminal symbols
Resulting tree: [S [NP [DT a] [N flight]] [VP [V left]]]
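The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the six rules are stored in a dict keyed by rule number; the names `RULES` and `leftmost_derivation` are made up for illustration and are not part of the lecture.

```python
# Rule numbers and right-hand sides are copied from the slide; the dict
# layout and the function name are illustrative assumptions.
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("NP", ["DT", "N"]),
    3: ("VP", ["V"]),
    4: ("DT", ["a"]),
    5: ("N", ["flight"]),
    6: ("V", ["left"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def leftmost_derivation(rule_numbers, start="S"):
    """Apply each numbered rule to the leftmost nonterminal and
    return every phrase marker produced along the way."""
    marker = [start]
    markers = [" ".join(marker)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Locate the leftmost nonterminal in the current phrase marker.
        index = next(i for i, sym in enumerate(marker) if sym in NONTERMINALS)
        if marker[index] != lhs:
            raise ValueError(f"rule {number} does not rewrite {marker[index]}")
        marker = marker[:index] + rhs + marker[index + 1:]
        markers.append(" ".join(marker))
    return markers


# Same rule order as on the slide: 1, 2, 4, 5, 3, 6.
for line in leftmost_derivation([1, 2, 4, 5, 3, 6]):
    print(line)
```

Applying a rule whose left-hand side is not the leftmost nonterminal raises an error, which is what makes the derivation "leftmost".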
Grammar:
1. S → NP VP
2. NP → DT N
3. VP → V
4. DT → a
5. N → flight
6. V → left
Input: a flight left
Steps in top-down parse:
1. Build node for start symbol, S.
2. Leftmost unexpanded/unmatched symbol is S. By rule 1, expand S to NP VP.
3. Leftmost unexpanded/unmatched symbol is NP. By rule 2, expand NP to DT N.
4. Leftmost unexpanded/unmatched symbol is DT. By rule 4, expand DT to a.
5. Leftmost unexpanded/unmatched symbol is a, a terminal. Match with input symbol a. Advance input pointer.
6. Leftmost unexpanded/unmatched symbol is N. By rule 5, expand N to flight.
7. Leftmost unexpanded/unmatched symbol is flight, a terminal. Match with input symbol flight. Advance input pointer.
8. Leftmost unexpanded/unmatched symbol is VP. By rule 3, expand VP to V.
9. Leftmost unexpanded/unmatched symbol is V. By rule 6, expand V to left.
10. Leftmost unexpanded/unmatched symbol is left, a terminal. Match with input symbol left. Advance input pointer.
The tree is fully expanded into terminals and matched, and no input symbols remain. Therefore, the parse is complete.
Completed tree: [S [NP [DT a] [N flight]] [VP [V left]]]
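The expand/match steps listed above can be traced in a few lines. This is only a sketch under a simplifying assumption: every nonterminal has exactly one rule, as in this toy grammar, so no backtracking is needed. Instead of a tree it keeps just the list of symbols still to be expanded or matched; `GRAMMAR` and `trace_parse` are hypothetical names.

```python
# One rule per nonterminal (the toy grammar above), so no choices arise.
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["DT", "N"],
    "VP": ["V"],
    "DT": ["a"],
    "N": ["flight"],
    "V": ["left"],
}


def trace_parse(words, start="S"):
    """Return (log, success) for a deterministic top-down parse."""
    pending = [start]  # unexpanded/unmatched symbols, leftmost first
    position = 0       # index of the current input symbol
    log = []
    while pending:
        symbol = pending.pop(0)
        if symbol in GRAMMAR:
            # Nonterminal: replace it by the right-hand side of its rule.
            rhs = GRAMMAR[symbol]
            pending = rhs + pending
            log.append(f"expand {symbol} -> {' '.join(rhs)}")
        elif position < len(words) and words[position] == symbol:
            # Terminal that matches the input: advance the input pointer.
            position += 1
            log.append(f"match {symbol}")
        else:
            seen = words[position] if position < len(words) else "end of input"
            log.append(f"mismatch: predicted {symbol}, saw {seen}")
            return log, False
    # Success only if the whole input has been consumed.
    return log, position == len(words)


log, ok = trace_parse("a flight left".split())
for entry in log:
    print(entry)
print("parsed" if ok else "rejected")
```

The log contains nine entries, one for each of steps 2 through 10 above; step 1 corresponds to initializing `pending` with the start symbol.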
Backtracking
• Backtracking occurs when the prediction made by the parser is incorrect.
  • The parser reaches a step where the terminal symbol in the parse tree does not match the current input symbol
• Backtrack: go to a previous parser state and try the next option for the parse tree.
  • Options result from multiple rules for a nonterminal
  • Maintain a stack of parser states
• The sentence fails to parse if no parser state matches the input
Backtracking example 1
• S → NP VP
• NP → DT N
• VP → V
• DT → a
• N → flight
• V → left | arrived
• Sentence that will cause backtracking: a flight arrived
When V is expanded, there are two choices; create two parser states
• V → left | arrived
• Most recent tree fails to match input. Backtrack to next most recent parser state, which succeeds.
Most recent state (fails): [S [NP [DT a] [N flight]] [VP [V left]]]
Next most recent state (succeeds): [S [NP [DT a] [N flight]] [VP [V arrived]]]
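To see this backtrack happen, here is a small recursive recognizer for the example 1 grammar. It is a sketch, not the stack-based formulation used later in the lecture: the call stack plays the role of the stack of parser states, and trying the next alternative in the `for` loop is the backtrack. `recognize` and the optional `failures` list are assumed names.

```python
# Example 1 grammar; each nonterminal maps to its list of alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"]],
    "VP": [["V"]],
    "DT": [["a"]],
    "N": [["flight"]],
    "V": [["left"], ["arrived"]],
}


def recognize(pending, words, position=0, failures=None):
    """Return True if the symbols in `pending` derive words[position:].
    Each failed terminal match is recorded in `failures` if given."""
    if not pending:
        return position == len(words)
    first, rest = pending[0], pending[1:]
    if first in GRAMMAR:
        # Try the alternatives in grammar order; moving on to the next
        # alternative after a failure is the backtracking step.
        for rhs in GRAMMAR[first]:
            if recognize(rhs + rest, words, position, failures):
                return True
        return False
    if position < len(words) and words[position] == first:
        return recognize(rest, words, position + 1, failures)
    if failures is not None:
        seen = words[position] if position < len(words) else None
        failures.append((first, seen))
    return False


failures = []
print(recognize(["S"], "a flight arrived".split(), failures=failures))
print(failures)  # the one wrong prediction: 'left' against input 'arrived'
```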
Backtracking example 2
• S → NP VP
• NP → DT N | N
• VP → V | V NP
• DT → a
• N → flight | fuel
• V → left | burns
• Sentence that will cause backtracking: a flight burns fuel
Parse tree: [S [NP [DT a] [N flight]] [VP [V burns] [NP [N fuel]]]]
Two choices for expanding VP. Most recent state has VP → V.
• (Leaving out some steps:)
• Will reach a step where the tree is expanded to V → burns.
• Tree is complete, but there is still input remaining.
• Failure to parse: backtrack to a previous parser state.
Previous state (VP → V NP): [S [NP [DT a] [N flight]] [VP V NP]]
Most recent state (VP → V): [S [NP [DT a] [N flight]] [VP [V burns]]]
Implementation of backtracking: stack of parser states
• A parser state consists of:
  • A parse tree
  • The leftmost node to be expanded or matched
  • The position in the input string
• A stack stores alternative parser states
Initial parser state
• A parser state consists of:
  • A parse tree
  • The leftmost node to be expanded or matched
  • The position in the input string
• Initial state: (start symbol node, start symbol node, 0)
• Visualization: ( S , S , 0 )
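One possible way to write the three-part parser state down in code is sketched below. The representation is an assumption made for illustration: trees are nested `(label, children)` tuples, and the "leftmost node" is stored as a path of child indices from the root. `ParserState`, `initial_state`, and `node_label` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class ParserState(NamedTuple):
    tree: tuple            # nested (label, children) tuples
    node: Tuple[int, ...]  # path from the root to the leftmost open node
    position: int          # index of the current input symbol


def initial_state(start="S"):
    """(start symbol node, start symbol node, 0): the root is the tree and
    also the leftmost node to expand; the empty path () denotes the root."""
    return ParserState(tree=(start, ()), node=(), position=0)


def node_label(state):
    """Follow the stored path and return the label of the node it reaches."""
    subtree = state.tree
    for index in state.node:
        subtree = subtree[1][index]
    return subtree[0]


start = initial_state()
print(node_label(start), start.position)

# A state partway through "a flight burns fuel": 'a' has been matched and
# the terminal 'flight' is the next node to match, at input position 1.
tree = ("S", (("NP", (("DT", (("a", ()),)),
                      ("N", (("flight", ()),)))),
              ("VP", ())))
later = ParserState(tree=tree, node=(0, 1, 0), position=1)
print(node_label(later), later.position)
```

Immutable tuples are convenient here because several states on the stack can share subtrees without one state's expansion affecting another.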
Example of a later parser state
• Suppose input sentence is: a flight burns fuel
• Parser state:
  • Tree: [S [NP [DT a] [N flight]] VP]
  • Terminal node ‘flight’
  • Input position: 1
• Parser will then match prediction of ‘flight’ with input symbol, which is ‘flight’
Next parser state
• Suppose input sentence is: a flight burns fuel
• Parser state:
  • Tree: [S [NP [DT a] [N flight]] VP]
  • Nonterminal node VP
  • Input position: 2
• Second element of parser state is leftmost node that has not yet been expanded or matched
Top-down parsing algorithm, describing manipulation of the stack
• Initialize stack with (start node, start node, 0)
• Repeat forever:
  • Remove topmost parser state from stack (pop). If there is none, the input sentence cannot be parsed.
  • If the tree has no unexpanded/unmatched node left: the parse succeeds if no input remains; otherwise reject this parser state.
  • If leftmost unexpanded/unmatched node is a nonterminal, create new parser states: look up rules in CFG, create corresponding trees, and put them on top of the stack.
  • If leftmost unexpanded/unmatched node is a terminal:
    • If it matches input symbol, create a new parser state: same tree, choose next unexpanded/unmatched node, advance input position. Put it on the stack.
    • If it doesn’t match input, reject this parser state. Next iteration will backtrack to parser state on top of stack.
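The stack algorithm above can be sketched as follows, using the example 2 grammar. Several details are assumptions made for illustration rather than the lecture's own implementation: trees are nested `(label, children)` tuples, nodes are addressed by paths of child indices, and the second element of a state holds all open nodes left to right (its first entry is the "leftmost unexpanded/unmatched node"). The helper names are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V"], ["V", "NP"]],
    "DT": [["a"]],
    "N": [["flight"], ["fuel"]],
    "V": [["left"], ["burns"]],
}


def label_at(tree, path):
    """Label of the node reached by following child indices in `path`."""
    for index in path:
        tree = tree[1][index]
    return tree[0]


def expand_at(tree, path, rhs):
    """Copy of `tree` in which the node at `path` gets `rhs` as children."""
    label, children = tree
    if not path:
        return (label, tuple((symbol, ()) for symbol in rhs))
    i = path[0]
    child = expand_at(children[i], path[1:], rhs)
    return (label, children[:i] + (child,) + children[i + 1:])


def parse(words, grammar=GRAMMAR, start="S"):
    """Return the first parse tree found, or None if none exists."""
    stack = [((start, ()), ((),), 0)]  # (tree, open node paths, position)
    while stack:
        tree, pending, position = stack.pop()
        if not pending:
            if position == len(words):
                return tree  # tree complete and all input consumed
            continue         # tree complete but input remains: backtrack
        path, rest = pending[0], pending[1:]
        symbol = label_at(tree, path)
        if symbol in grammar:
            # Push alternatives right to left so the first rule is on top.
            for rhs in reversed(grammar[symbol]):
                children = tuple(path + (i,) for i in range(len(rhs)))
                stack.append((expand_at(tree, path, rhs), children + rest, position))
        elif position < len(words) and words[position] == symbol:
            stack.append((tree, rest, position + 1))
        # Otherwise the state is dropped; the next pop backtracks.
    return None


def bracketed(tree):
    """Render a tree as a bracketed string for display."""
    label, children = tree
    if not children:
        return label
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


print(bracketed(parse("a flight burns fuel".split())))
print(parse("a house burns".split()))
```

Like any top-down parser of this kind, the sketch would loop forever on a left-recursive rule such as NP → NP PP; the toy grammars in this lecture contain none.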
Parsing a nonterminal: create parser state(s) with expanded tree
• CFG fragment: S → NP VP | VP
• Put start state on stack.
• Parse:
  • Pop parser state.
  • Node is S, which is a nonterminal. Create new parser states. (Go right-to-left through rule alternatives when putting on stack, so that top state corresponds to first rule in CFG.)
Stack before: ( S , S , 0 )
Stack after, top first: ( [S NP VP] , NP , 0 ), ( [S VP] , VP , 0 )
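The right-to-left push order is easy to get backwards, so here it is in isolation. A minimal sketch, assuming a Python list is used as the stack (top = end of the list) and that a stack entry is reduced to just a right-hand side plus input position; `push_expansions` is a made-up helper name.

```python
def push_expansions(stack, alternatives, position):
    """Push one (right-hand side, position) entry per rule alternative,
    going right to left so the first alternative ends up on top."""
    for rhs in reversed(alternatives):
        stack.append((tuple(rhs), position))


stack = []
# CFG fragment from the slide: S -> NP VP | VP
push_expansions(stack, [["NP", "VP"], ["VP"]], 0)
print(stack[-1])  # top of stack: the entry for the first rule, S -> NP VP
```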
Parsing a terminal 1: match input, create state with new node and input position
• Input: a flight burns
• At an intermediate state of parsing.
• Pop parser state.
• Node is ‘a’, which is a terminal. Try to match against input.
• Successful match: create new parser state, with leftmost unexpanded/unmatched node re-computed, and input position increased. Put on stack.
• Next iteration, top state refers to next position in input.
State before: tree [S [NP [DT a] N] VP], node ‘a’, input position 0
State after: same tree, node N, input position 1
Parsing a terminal 2: match input, create state with new node and input position
• Input: a flight burns
• At an intermediate state of parsing.
• Pop parser state.
• Node is ‘flight’, which is a terminal. Try to match against input.
• Successful match: create new parser state, with leftmost unexpanded/unmatched node re-computed, and input position increased. Put on stack.
• Next iteration, top state refers to next position in input.
State before: tree [S [NP [DT a] [N flight]] VP], node ‘flight’, input position 1
State after: same tree, node VP, input position 2
Parsing a terminal 3: failure to match input
• Input: a house burns
• At an intermediate state of parsing.
• Pop parser state.
• Node is ‘flight’, which is a terminal. Try to match against input.
• Failure to match: input symbol is ‘house’.
• Next iteration, use previous parser state, which is now on top of the stack.
Rejected state: tree [S [NP [DT a] [N flight]] VP], node ‘flight’, input position 1
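The three terminal cases above reduce to one small function. This sketch assumes a simplified state that keeps only the open symbols (leftmost first) and the input position, leaving the tree out; `match_terminal` is a hypothetical name.

```python
def match_terminal(pending, position, words):
    """Match the predicted terminal pending[0] against the current input
    symbol. Return the successor (pending, position), or None on failure."""
    predicted = pending[0]
    if position < len(words) and words[position] == predicted:
        # Success: drop the matched node and advance the input position.
        return pending[1:], position + 1
    # Failure: no successor state; the caller backtracks to the stack top.
    return None


# Terminal 1: 'a' matches at position 0.
print(match_terminal(("a", "N", "VP"), 0, ["a", "flight", "burns"]))
# Terminal 2: 'flight' matches at position 1.
print(match_terminal(("flight", "VP"), 1, ["a", "flight", "burns"]))
# Terminal 3: 'flight' is predicted but the input symbol is 'house'.
print(match_terminal(("flight", "VP"), 1, ["a", "house", "burns"]))
```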
Failure to parse
• If the last parser state fails to parse, the input sentence cannot be parsed
• “Fails to parse” means either:
  • Input symbol does not match prediction
  • Input symbol does match prediction and tree is complete, but there are additional input symbols remaining
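The two ways a parser state can fail can be checked separately. The sketch below again assumes a simplified state of open symbols plus input position; `failure_reason` and its `nonterminals` parameter are illustrative names, not part of the lecture.

```python
def failure_reason(pending, position, words, nonterminals=frozenset()):
    """Return why a parser state fails, or None if it does not fail."""
    if not pending:
        # Tree is complete: leftover input means this state fails.
        if position < len(words):
            return "tree complete, input remaining"
        return None
    predicted = pending[0]
    if predicted in nonterminals:
        return None  # a nonterminal is expanded, not matched
    if position >= len(words) or words[position] != predicted:
        return "input does not match prediction"
    return None


words = ["a", "flight", "burns", "fuel"]
# VP -> V was chosen, 'burns' matched, but 'fuel' is still unread.
print(failure_reason((), 3, words))
# 'left' is predicted while the input symbol is 'burns'.
print(failure_reason(("left",), 2, words))
```

If every state on the stack ends in one of these two failures, the stack empties and the sentence is rejected.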
Outline • Top-down parsing • Short assignment #16
Due 10/24
• Draw the stack sequence for a top-down parse of fuel burns using the CFG below. At each iteration, indicate whether the parser:
  • expands a nonterminal and creates new parser states,
  • fails to match an input terminal and backtracks,
  • matches an input terminal and constructs a new parser state, or
  • matches an input terminal and succeeds in the parse.
• CFG:
  • S → NP VP
  • NP → DT N | N
  • VP → V
  • DT → a
  • N → flight | fuel
  • V → left | burns