
Statistical Decision-Tree Models for Parsing


  1. Statistical Decision-Tree Models for Parsing. NLP Lab, POSTECH, 김지협

  2. Contents • Abstract • Introduction • Decision-Tree Modeling • SPATTER Parsing • Statistical Parsing Models • Decision-Tree Growing & Smoothing • Decision-Tree Training • Experiment Results • Conclusion

  3. Abstract • Syntactic NL parsers are not adequate for highly ambiguous, large-vocabulary text (e.g., the Wall Street Journal) • Premises for developing a new parser • grammars are too complex to develop manually for most domains • parsing models must rely heavily on contextual information • existing n-gram models are inadequate for parsing • SPATTER: a statistical parser based on decision-tree models • performs better than a grammar-based parser

  4. Introduction • Parsing as making a sequence of disambiguation decisions • The probability of a complete parse tree (T) of a sentence (S) is the product of the probabilities of these decisions (a reconstruction of the decomposition is sketched below) • Automatically discovering the rules for disambiguation • Producing a parser without a complicated grammar • Long-distance lexical information is crucial for disambiguating interpretations accurately
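
The formula that accompanied this bullet did not survive the transcript. A likely reconstruction, following the decomposition used in the underlying paper, where d_1, ..., d_n are the disambiguation decisions referred to above:

```latex
% Hedged reconstruction of the dropped formula: the probability of a parse is
% the product of the probabilities of its disambiguation decisions d_1, ..., d_n.
\[
P(T \mid S) \;=\; \prod_{i=1}^{n} P\bigl(d_i \mid d_{i-1}, d_{i-2}, \ldots, d_1, S\bigr)
\]
```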

  5. Decision-Tree Modeling • Comparison • Grammarian: two crucial tasks for parsing • identifying the features relevant to each decision • deciding which choice to select based on the values of the features • Decision tree: the two tasks above plus a third • assigning a probability distribution to the possible choices, providing a ranking system

  6. Continued • What is a statistical decision tree? • A decision-making device that assigns a probability to each of the possible choices based on the context of the decision • P ( f | h ), where f is an element of the future vocabulary and h is a history (the context of the decision) • The probability is determined by asking a sequence of questions • the i-th question is determined by the answers to the i − 1 previous questions • Example: part-of-speech tagging problem (Figure 1)
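
To make the P(f | h) picture concrete, here is a minimal sketch of a statistical decision tree in Python. The class and field names (DTNode, dist, question) are my own for illustration and are not taken from SPATTER.

```python
# Minimal sketch of a statistical decision tree: each internal node asks a
# yes/no question about the history h, and each leaf stores a probability
# distribution over the future vocabulary f. Names are illustrative only.

from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class DTNode:
    # distribution over futures observed at this node, e.g. {"NN": 0.9, "VB": 0.1}
    dist: Dict[str, float]
    # question applied to the history; None marks a leaf
    question: Optional[Callable[[dict], bool]] = None
    yes: Optional["DTNode"] = None
    no: Optional["DTNode"] = None

def probability(root: DTNode, future: str, history: dict) -> float:
    """Return P(future | history) by walking the tree using the history's answers."""
    node = root
    while node.question is not None:
        node = node.yes if node.question(history) else node.no
    return node.dist.get(future, 0.0)

# Toy part-of-speech decision: is the previous tag a determiner?
after_det = DTNode(dist={"NN": 0.9, "VB": 0.1})
other     = DTNode(dist={"NN": 0.4, "VB": 0.6})
root = DTNode(dist={}, question=lambda h: h.get("prev_tag") == "DT",
              yes=after_det, no=other)

print(probability(root, "NN", {"prev_tag": "DT"}))  # 0.9
```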

  7. Continued • Decision trees vs. n-grams • Equivalent to an interpolated n-gram model in expressive power • Model parameterization • n-gram model (a reconstruction of the parameterization is given below) • an n-gram model can be represented by a decision-tree model (n − 1 questions) • Example: part-of-speech tagging
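
The parameterization on this slide appears to have been an equation that did not survive the transcript. For the part-of-speech tagging example, the standard n-gram form, which a decision tree can encode by asking the n − 1 questions "what is t_{i-1}?", ..., "what is t_{i-n+1}?", would be:

```latex
% Hedged reconstruction: the n-gram parameterization for part-of-speech tagging,
% conditioning the i-th tag on the n-1 preceding tags.
\[
P(t_i \mid t_{i-1}, t_{i-2}, \ldots, t_{i-n+1})
\]
```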

  8. Continued • Model estimation • n-gram model (the estimation formula is reconstructed below)
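
The estimation formula itself is missing from this transcript. A plausible reconstruction, assuming the slide showed the usual deleted-interpolation estimate for a trigram tagging model, where the λ's are interpolation weights and f(·) denotes relative frequencies in the training data:

```latex
% Hedged reconstruction: deleted-interpolation estimate for a trigram model.
\[
\tilde{P}(t_i \mid t_{i-1}, t_{i-2})
  = \lambda_3\, f(t_i \mid t_{i-1}, t_{i-2})
  + \lambda_2\, f(t_i \mid t_{i-1})
  + \lambda_1\, f(t_i),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
\]
```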

  9. Continued • decision-tree model (estimation sketched below) • a decision-tree model can be represented by an interpolated n-gram model
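
This slide's estimation formula is also missing. Assuming it showed the node-wise interpolation that motivates the "interpolated n-gram" view, where E(f | n) is the empirical distribution at node n, parent(n) is its parent, and the λ_n are smoothing weights estimated on held-out data, a reconstruction is:

```latex
% Hedged reconstruction: a node's smoothed distribution interpolates its own
% empirical distribution with its parent's smoothed distribution, recursively
% up to the root.
\[
\tilde{P}(f \mid n) \;=\; \lambda_n\, E(f \mid n) \;+\; (1 - \lambda_n)\, \tilde{P}\bigl(f \mid \mathrm{parent}(n)\bigr)
\]
```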

  10. Continued • Why use decision trees? • As n grows, the parameter space of an n-gram model grows exponentially (a worked example follows) • The decision-tree learning algorithm, by contrast, increases the size of the model only as the training data allows • So it can take much more contextual information into account
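
As a rough illustration (my own numbers, not from the slides): with a tag vocabulary of size V, an n-gram tagging model has on the order of V^n parameters, e.g.

```latex
% Illustrative only: order-of-magnitude parameter count for an n-gram model
% over a tag vocabulary of size V.
\[
\mathcal{O}(V^{n}) \ \text{parameters}, \qquad V = 50,\ n = 5 \ \Rightarrow\ 50^{5} = 312{,}500{,}000
\]
```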

  11. SPATTER Parsing • SPATTER representation • Parse: viewed as a geometric pattern • 4 features per node: words, tags, labels, and extensions (Figure 3; a sketch of the node representation follows) • The parsing algorithm • Starts with the sentence's words as leaves (Figure 3) • Gradually tags, labels, and extends nodes • Constraints • Bottom-up, left-to-right • No new node is constructed until its children are complete • Using DWC (derivational window constraints), the number of active nodes is restricted • A single-rooted, labeled tree is constructed
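
Here is a minimal sketch of the node representation described above. The field names are my own and are not the actual SPATTER data structures; it only illustrates the four features and the bottom-up starting point.

```python
# Minimal sketch of a SPATTER-style parse-state node (illustrative names only).
# Each node carries the four features mentioned on the slide:
# word, tag, label, and extension.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    word: Optional[str]              # lexical head word (for leaves, the word itself)
    tag: Optional[str] = None        # part-of-speech tag (tagging model)
    label: Optional[str] = None      # constituent label (label model)
    extension: Optional[str] = None  # attachment decision (extension model)
    children: List["Node"] = field(default_factory=list)

def initial_nodes(sentence: List[str]) -> List[Node]:
    """Start the derivation with the sentence's words as leaf nodes."""
    return [Node(word=w) for w in sentence]

# A derivation then proceeds bottom-up and left-to-right: tag leaves, assign
# labels, and extend nodes until a single-rooted tree remains.
leaves = initial_nodes(["The", "cat", "sat"])
print([n.word for n in leaves])  # ['The', 'cat', 'sat']
```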

  12. Statistical Parsing Models • The Tagging Model • The Extension Model • The Label Model • The Derivation Model • The Parsing Model
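
The individual model equations on this slide did not survive the transcript. One piece that can be reconstructed with reasonable confidence is the overall parsing model, which sums the derivation model over all derivations d that produce the tree T:

```latex
% Hedged reconstruction of the overall parsing model: the probability of a tree
% is the sum over the derivations d in D(T) that yield it.
\[
P(T \mid S) \;=\; \sum_{d \in \mathcal{D}(T)} P(d \mid S)
\]
```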

  13. Decision-Tree Growing & Smoothing • 3 main models (tagging, extension, and label) • The training corpus is divided into 2 sets: 90% for growing and 10% for smoothing • Growing & smoothing algorithm (a growing-step sketch follows) • Figure 3.5
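
The growing algorithm itself is only referenced here (Figure 3.5 is not reproduced). As a rough illustration of one growing step, here is a minimal Python sketch under the assumption that questions are selected to minimize the entropy of the future distribution on the growing set; the function names are my own.

```python
# Minimal sketch of one decision-tree growing step (illustrative, not the
# SPATTER algorithm): choose the question that most reduces the entropy of
# the future distribution on the growing set.

import math
from collections import Counter
from typing import Callable, List, Tuple

def entropy(futures: List[str]) -> float:
    counts = Counter(futures)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_question(data: List[Tuple[dict, str]],
                  questions: List[Callable[[dict], bool]]) -> Callable[[dict], bool]:
    """Pick the question whose split gives the lowest average entropy."""
    def split_entropy(q: Callable[[dict], bool]) -> float:
        yes = [f for h, f in data if q(h)]
        no = [f for h, f in data if not q(h)]
        if not yes or not no:          # useless split: one side is empty
            return float("inf")
        n = len(data)
        return len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return min(questions, key=split_entropy)

# Toy usage: two candidate questions about the previous tag.
data = [({"prev": "DT"}, "NN"), ({"prev": "DT"}, "NN"), ({"prev": "VB"}, "RB")]
qs = [lambda h: h.get("prev") == "DT", lambda h: h.get("prev") == "VB"]
chosen = best_question(data, qs)
print(chosen({"prev": "DT"}))
```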

  14. Decision-Tree Training • The parsing model cannot be estimated by direct frequency counts because it contains a hidden component: the derivation model • The corpus contains no information about the order of derivation steps • So the training process must discover which derivations assign higher probability to the correct parses • Forward-backward reestimation is used
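
In EM-style terms (my own rendering of the reestimation idea, not a formula from the slides), each training tree T distributes fractional counts over its consistent derivations in proportion to their current model probability:

```latex
% Hedged sketch of the reestimation step: the posterior weight of a derivation d
% given the observed tree T, used to collect fractional counts for the decision
% events it contains.
\[
P(d \mid T, S) \;=\; \frac{P(d \mid S)}{\sum_{d' \in \mathcal{D}(T)} P(d' \mid S)}
\]
```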

  15. Continued • Training Algorithm

  16. Experiment Results • IBM Computer Manual • annotated by the University of Lancaster • 195 part-of-speech tags and 19 non-terminal labels • trained on 30,800 sentences, tested on 1,473 new sentences • 0-crossing-brackets score • IBM's rule-based, unification-style PCFG parser: 69% • SPATTER: 76%

  17. Continued • Wall Street Journal • Tests the ability to accurately parse a highly ambiguous, large-vocabulary domain • Annotated in the Penn Treebank, version 2 • 46 part-of-speech tags and 27 non-terminal labels • Trained on 40,000 sentences, tested on 1,920 new sentences • Evaluated with PARSEVAL (measures defined below)
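
For reference, the standard PARSEVAL measures (these definitions are well known; the slide's own results table is not reproduced in this transcript):

```latex
% PARSEVAL constituent measures: precision and recall.
\[
\text{precision} =
  \frac{\#\ \text{correct constituents in the proposed parse}}
       {\#\ \text{constituents in the proposed parse}},
\qquad
\text{recall} =
  \frac{\#\ \text{correct constituents in the proposed parse}}
       {\#\ \text{constituents in the treebank parse}}
\]
```

Crossing brackets counts proposed constituents whose span overlaps a treebank constituent without either containing the other.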

  18. Conclusion • Large amounts of contextual information can be incorporated into a statistical parsing model by applying decision-tree learning algorithms • Disambiguation rules can be discovered automatically
