
LING 408/508: Computational Techniques for Linguists



Presentation Transcript


  1. LING 408/508: Computational Techniques for Linguists Lecture 24 10/22/2012

  2. Outline • Top-down parsing • Short assignment #16

  3. Parsing • Input: string to be parsed • Output: a tree indicating the phrase structure of the string • Need an algorithm for parsing: top-down, bottom-up (won’t really discuss), or tabular (will show you CKY)

  4. Top-down parsing • Idea: the CFG defines how a sentence can be generated. Search for a derivation from the start symbol that matches the input. • Also called recursive-descent parsing or predictive parsing

  5. Sketch of top-down parsing algorithm
     • Build a parse tree (= phrase structure tree) that generates the sentence
     • Initialize
       • Tree has the start symbol as its root node
       • Input is scanned starting at the leftmost symbol
     • At each iteration, expand/match the leftmost unexpanded/unmatched node
       • If nonterminal, expand the tree according to the CFG rules
       • If terminal, match the node against the current input symbol
         • If they match, advance the input to the next symbol
         • If they don’t match, reject the parse and backtrack
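The sketch above can be written as a short recursive recognizer. This is a minimal illustration in Python, not the course's reference implementation: the names `GRAMMAR`, `expand`, and `recognize` are hypothetical, the grammar is a small one chosen for illustration, and backtracking happens implicitly because each rule alternative is tried in turn.

```python
# Assumed toy grammar: each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V"], ["V", "NP"]],
    "DT": [["a"]],
    "N": [["flight"], ["fuel"]],
    "V": [["left"], ["burns"]],
}


def expand(symbols, tokens, pos):
    """Yield every input position reachable after deriving `symbols` from tokens[pos:]."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: try each rule in order; a failed alternative yields nothing.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal that matches the current input symbol: advance the input.
        yield from expand(rest, tokens, pos + 1)
    # Terminal that does not match: yield nothing, so the caller tries its next option.


def recognize(sentence, start="S"):
    """Return True if the whole sentence can be derived from the start symbol."""
    tokens = sentence.lower().split()
    return any(end == len(tokens) for end in expand([start], tokens, 0))


print(recognize("a flight burns fuel"))
print(recognize("a house burns"))
```

This version only answers yes or no; it does not build the tree. It also assumes the grammar has no left-recursive rules, since those would make the recursion loop.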

  6. Leftmost derivation of phrase structure tree
     Grammar:
       Rule 1: S → NP VP
       Rule 2: NP → DT N
       Rule 3: VP → V
       Rule 4: DT → a
       Rule 5: N → flight
       Rule 6: V → left
     Input: a flight left
     Phrase markers:
       1. S               begin with start symbol S
       2. NP VP           apply rule 1, replace S with NP VP
       3. DT N VP         apply rule 2, replace NP with DT N
       4. a N VP          apply rule 4, replace DT with a
       5. a flight VP     apply rule 5, replace N with flight
       6. a flight V      apply rule 3, replace VP with V
       7. a flight left   apply rule 6, replace V with left
     Stop: phrase marker contains only terminal symbols
     Tree: [S [NP [DT a] [N flight]] [VP [V left]]]
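The derivation on this slide can be replayed mechanically. The following Python sketch applies a given sequence of rule numbers to the leftmost nonterminal at each step; the names `RULES`, `NONTERMINALS`, and `leftmost_derivation` are hypothetical and introduced only for illustration.

```python
# Grammar from the slide, keyed by rule number: (left-hand side, right-hand side).
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("NP", ["DT", "N"]),
    3: ("VP", ["V"]),
    4: ("DT", ["a"]),
    5: ("N", ["flight"]),
    6: ("V", ["left"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def leftmost_derivation(rule_sequence, start="S"):
    """Return the phrase markers produced by applying the rules in the given order."""
    marker = [start]
    steps = [" ".join(marker)]
    for number in rule_sequence:
        lhs, rhs = RULES[number]
        # Locate the leftmost nonterminal in the current phrase marker.
        index = next(i for i, sym in enumerate(marker) if sym in NONTERMINALS)
        if marker[index] != lhs:
            raise ValueError(f"rule {number} does not rewrite {marker[index]}")
        marker = marker[:index] + rhs + marker[index + 1:]
        steps.append(" ".join(marker))
    return steps


for step in leftmost_derivation([1, 2, 4, 5, 3, 6]):
    print(step)
```

The rule order 1, 2, 4, 5, 3, 6 is the one used on the slide; the last phrase marker printed is the input sentence.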

  7. Steps in top-down parse
     Grammar:
       Rule 1: S → NP VP
       Rule 2: NP → DT N
       Rule 3: VP → V
       Rule 4: DT → a
       Rule 5: N → flight
       Rule 6: V → left
     Input: a flight left
     Steps:
       1. Build node for start symbol, S.
       2. Leftmost unexpanded/unmatched symbol is S. By rule 1, expand S to NP VP.
       3. Leftmost unexpanded/unmatched symbol is NP. By rule 2, expand NP to DT N.
       4. Leftmost unexpanded/unmatched symbol is DT. By rule 4, expand DT to a.
       5. Leftmost unexpanded/unmatched symbol is a, a terminal. Match with input symbol a. Advance input pointer.
       6. Leftmost unexpanded/unmatched symbol is N. By rule 5, expand N to flight.
       7. Leftmost unexpanded/unmatched symbol is flight, a terminal. Match with input symbol flight. Advance input pointer.
       8. Leftmost unexpanded/unmatched symbol is VP. By rule 3, expand VP to V.
       9. Leftmost unexpanded/unmatched symbol is V. By rule 6, expand V to left.
       10. Leftmost unexpanded/unmatched symbol is left, a terminal. Match with input symbol left. Advance input pointer.
     The tree is fully expanded into terminals and matched, and no input symbols remain. Therefore, the parse is complete.
     Tree: [S [NP [DT a] [N flight]] [VP [V left]]]

  8. Backtracking • Backtracking occurs when the prediction made by the parser is incorrect. • Reach a step in parsing when terminal symbol in parse tree does not match current input symbol • Backtrack: go to a previous parser state and try the next option for the parse tree. • Options result from multiple rules for a nonterminal • Maintain a stack of parser states • Failure to parse sentence if no parser state matches the input

  9. Backtracking example 1 • S → NP VP • NP → DT N • VP → V • DT → a • N → flight • V → left | arrived • Sentence that will cause backtracking: A flight arrived

  10. When V is expanded, there are two choices; create two parser states • V → left | arrived • Most recent tree fails to match input. Backtrack to next most recent parser state, which succeeds. • Trees: [S [NP [DT a] [N flight]] [VP [V left]]] (fails); [S [NP [DT a] [N flight]] [VP [V arrived]]] (succeeds)
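To see where the backtrack happens in example 1, the following Python sketch logs one event per iteration of a stack-based top-down parse. It is an illustration under simplifying assumptions: a parser state is reduced to the list of symbols still to be expanded or matched plus the input position, and the names `EXAMPLE_1` and `trace` are hypothetical.

```python
# Grammar of backtracking example 1.
EXAMPLE_1 = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"]],
    "VP": [["V"]],
    "DT": [["a"]],
    "N": [["flight"]],
    "V": [["left"], ["arrived"]],
}


def trace(sentence, grammar, start="S"):
    """Return a log of expand/match/backtrack events for a top-down parse."""
    tokens = sentence.lower().split()
    stack = [([start], 0)]  # simplified state: (pending symbols, input position)
    events = []
    while stack:
        pending, pos = stack.pop()
        if not pending:
            if pos == len(tokens):
                events.append("success")
                return events
            events.append("tree complete, input remaining: backtrack")
            continue
        first, rest = pending[0], pending[1:]
        if first in grammar:
            events.append(f"expand {first}")
            # Push alternatives right-to-left so the first rule is tried first.
            for rhs in reversed(grammar[first]):
                stack.append((rhs + rest, pos))
        elif pos < len(tokens) and tokens[pos] == first:
            events.append(f"match {first}")
            stack.append((rest, pos + 1))
        else:
            events.append(f"mismatch {first}: backtrack")
    events.append("failure")
    return events


for event in trace("A flight arrived", EXAMPLE_1):
    print(event)
```

With this grammar and input, the only backtrack in the log is the mismatch on `left`, after which the state built from V → arrived is popped and matched.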

  11. Backtracking example 2 • S → NP VP • NP → DT N | N • VP → V | V NP • DT → a • N → flight | fuel • V → left | burns • Sentence that will cause backtracking: A flight burns fuel • Tree: [S [NP [DT a] [N flight]] [VP [V burns] [NP [N fuel]]]]

  12. Two choices for expanding VP. Most-recent state has VP → V. • (Leaving out some steps:) • Will reach a step where tree is expanded to V → burns. • Tree is complete, but there is still input remaining. • Failure to parse: backtrack to a previous parser state. • Figure: previous state with tree [S [NP [DT a] [N flight]] [VP V NP]]; most-recent state with tree [S [NP [DT a] [N flight]] [VP [V burns]]]

  13. Implementation of backtracking: stack of parser states • A parser state consists of: • A parse tree • The leftmost node to be expanded or matched • The position in the input string • A stack stores alternative parser states

  14. Initial parser state • A parser state consists of: • A parse tree • The leftmost node to be expanded or matched • The position in the input string • Initial state: (start symbol node, start symbol node, 0) • Visualization: (S, S, 0)
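One way to represent the three-part parser state in Python is sketched below. The class and field names (`Node`, `ParserState`, `done`, `initial_state`) are hypothetical choices for illustration, not something specified in the lecture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    done: bool = False  # set once a nonterminal is expanded or a terminal is matched


@dataclass
class ParserState:
    tree: Node            # root of the parse tree built so far
    node: Optional[Node]  # leftmost node still to be expanded or matched
    position: int         # index of the current input symbol


def initial_state(start_symbol: str = "S") -> ParserState:
    """Build the state (start symbol node, start symbol node, 0)."""
    root = Node(start_symbol)
    return ParserState(tree=root, node=root, position=0)


state = initial_state()
print(state.tree.label, state.node.label, state.position)
```

In the initial state the tree and the node to expand are the same object, which mirrors the (S, S, 0) visualization.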

  15. Example of a later parser state • Suppose input sentence is: a flight burns fuel • Parser state: • Tree: [S [NP [DT a] [N flight]] VP] • Terminal node ‘flight’ • Input position: 1 • Parser will then match prediction of ‘flight’ with input symbol, which is ‘flight’

  16. Next parser state • Suppose input sentence is: a flight burns fuel • Parser state: • Tree: [S [NP [DT a] [N flight]] VP] • Nonterminal node VP • Input position: 2 • Second element of parser state is leftmost node that has not yet been expanded or matched

  17. Top-down parsing algorithm, describing manipulation of the stack
      • Initialize stack with (start node, start node, 0)
      • Repeat forever:
        • Remove topmost parser state from stack (pop). If there is none, the input sentence cannot be parsed.
        • If leftmost unexpanded/unmatched node is a nonterminal, create new parser states: look up rules in CFG, create corresponding trees, and put them on top of the stack.
        • If leftmost unexpanded/unmatched node is a terminal:
          • If it matches input symbol, create a new parser state: same tree, choose next unexpanded/unmatched node, advance input position.
          • If it doesn’t match input, reject this parser state. Next iteration will backtrack to parser state on top of stack.
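The stack algorithm above might be sketched in Python as follows. This is a simplified illustration, not the lecture's implementation: instead of storing a full tree and a node pointer, each state holds the list of symbols still to be expanded or matched (its first element plays the role of the leftmost node), the input position, and the rules applied so far (which together determine the tree). The names `GRAMMAR` and `parse` are hypothetical, and the grammar is the one from backtracking example 2.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V"], ["V", "NP"]],
    "DT": [["a"]],
    "N": [["flight"], ["fuel"]],
    "V": [["left"], ["burns"]],
}


def parse(sentence, grammar=GRAMMAR, start="S"):
    """Return the rules of a successful leftmost derivation, or None on failure."""
    tokens = sentence.lower().split()
    # Simplified state: (pending symbols, input position, rules applied so far).
    stack = [([start], 0, [])]
    while stack:
        pending, pos, history = stack.pop()
        if not pending:
            if pos == len(tokens):
                return history  # tree complete and all input consumed
            continue            # tree complete but input remains: backtrack
        first, rest = pending[0], pending[1:]
        if first in grammar:
            # Nonterminal: push one new state per rule, right-to-left,
            # so the state for the first rule ends up on top of the stack.
            for rhs in reversed(grammar[first]):
                stack.append((rhs + rest, pos, history + [(first, rhs)]))
        elif pos < len(tokens) and tokens[pos] == first:
            # Terminal matches the input symbol: advance the input position.
            stack.append((rest, pos + 1, history))
        # Terminal mismatch: drop this state; the next pop backtracks.
    return None  # stack is empty: the sentence cannot be parsed


for lhs, rhs in parse("A flight burns fuel"):
    print(lhs, "->", " ".join(rhs))
print(parse("a house burns"))
```

The loop ends in one of two ways: a state with nothing left to expand and no input remaining (success), or an empty stack (failure). The sketch assumes a grammar without left recursion.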

  18. Parsing a nonterminal: create parser state(s) with expanded tree • CFG fragment: S → NP VP | VP • Put start state on stack. • Parse: • Pop parser state. • Node is S, which is a nonterminal. Create new parser states. (Go right-to-left through rule alternatives when putting on stack, so that top state corresponds to first rule in CFG.) • Figure: the start state (tree S, input 0) is replaced by two states, tree [S NP VP] at input 0 on top and tree [S VP] at input 0 below it.
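The right-to-left push order can be checked with a few lines of Python. In this sketch a state is abbreviated to a tuple (expanded nonterminal, chosen right-hand side, input position), and `push_expansions` is a hypothetical helper name.

```python
def push_expansions(stack, lhs, alternatives, position):
    """Push one state per rule alternative, last alternative first."""
    for rhs in reversed(alternatives):
        stack.append((lhs, tuple(rhs), position))
    return stack


# CFG fragment from the slide: S -> NP VP | VP
stack = push_expansions([], "S", [["NP", "VP"], ["VP"]], 0)
print(stack[-1])  # top of the stack: the state built from the first rule
print(stack[0])   # bottom of the stack: the state built from the second rule
```

Because a Python list used as a stack pops from the end, iterating over the alternatives in reverse leaves the state for the first rule on top, so rules are tried in the order they are listed in the CFG.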

  19. Parsing a terminal 1: match input, create state with new node and input position • Input: a flight burns • At an intermediate state of parsing. • Pop parser state. • Node is ‘a’, which is a terminal. Try to match against input. • Successful match: create new parser state, with leftmost unexpanded/unmatched node re-computed, and input position increased. Put on stack. • Next iteration, top state refers to next position in input. • Figure: tree [S [NP [DT a] N] VP]; input position 0 before the match, 1 after.

  20. Parsing a terminal 2: match input, create state with new node and input position • Input: a flight burns • At an intermediate state of parsing. • Pop parser state. • Node is ‘flight’, a terminal. Try to match against input. • Successful match: create new parser state, with leftmost unexpanded/unmatched node re-computed, and input position increased. Put on stack. • Next iteration, top state refers to next position in input. • Figure: tree [S [NP [DT a] [N flight]] VP]; input position 1 before the match, 2 after.

  21. Parsing a terminal 3: failure to match input • Input: a house burns • At an intermediate state of parsing. • Pop parser state. • Node is ‘flight’, which is a terminal. Try to match against input. • Failure to match: input symbol is ‘house’. • Next iteration, use previous parser state, which is now on top of the stack. • Figure: tree [S [NP [DT a] [N flight]] VP] at input position 1, with the previous parser state beneath it on the stack.

  22. Failure to parse • If the last parser state fails to parse, the input sentence cannot be parsed • “Fails to parse” means either: • Input symbol does not match prediction • Input symbol does match prediction and tree is complete, but there are additional input symbols remaining • Figure: last parser state

  23. Outline • Top-down parsing • Short assignment #16

  24. Due 10/24 • Draw the stack sequence for a top-down parse of “fuel burns” using the CFG below. At each iteration, indicate whether the parser: • expands a nonterminal and creates new parser states • fails to match an input terminal and backtracks • matches an input terminal and constructs a new parser state, or • matches an input terminal and succeeds in the parse. • S → NP VP • NP → DT N | N • VP → V • DT → a • N → flight | fuel • V → left | burns
