
Day 2: Pruning continued; begin competition models


Presentation Transcript


  1. Day 2: Pruning continued; begin competition models Roger Levy University of Edinburgh & University of California – San Diego

  2. Today • Concept from probability theory: marginalization • Complete Jurafsky 1996: modeling online data • Begin competition models

  3. Marginalization • In many cases, a joint probability distribution will be more “basic” than the raw distribution of any member variable • Imagine two dice connected by a weak spring: the outcomes are not independent, so the joint distribution is the more basic object • Summing the joint distribution over all values of X yields a distribution over Y alone • The resulting distribution over Y is known as the marginal distribution • Calculating P(Y) this way is called marginalizing over X
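
A compact statement of the definition behind these bullets (standard notation, not from the slide itself): the marginal distribution of Y is obtained by summing the joint distribution over all values of X.

```latex
P(Y = y) \;=\; \sum_{x} P(X = x,\, Y = y)
```

For the spring-coupled dice, the sum runs over the six outcomes of the first die; because the dice are not independent, the joint cannot be reconstructed from the two marginals, which is the sense in which the joint is the more basic object.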

  4. Today • Concept from probability theory: marginalization • Complete Jurafsky 1996: modeling online data • Begin competition models

  5. Modeling online parsing • Does this sentence make sense? The complex houses married and single students and their families. • How about this one? The warehouse fires a dozen employees each year. • And this one? The warehouse fires destroyed all the buildings. • fires can be either a noun or a verb. So can houses: [NP The complex] [VP houses married and single students…]. • These are garden path sentences • Originally taken as some of the strongest evidence for serial processing by the human parser (Frazier and Rayner 1987)

  6. Limited parallel parsing • Full-serial: keep only one incremental interpretation • Full-parallel: keep all incremental interpretations • Limited parallel: keep some but not all interpretations • In a limited parallel model, garden-path effects can arise from the discarding of a needed interpretation • e.g., [S [NP The complex] [VP houses …] …] is discarded, while [S [NP The complex houses …] …] is kept

  7. Modeling online parsing: garden paths • Pruning strategy for limited ranked-parallel processing • Each incremental analysis is ranked • Analyses falling below a threshold are discarded • In this framework, a model must characterize the incremental analyses and the threshold for pruning • Jurafsky 1996: partial context-free parses as analyses • Probability ratio as pruning threshold • Ratio defined as P(I) : P(I_best) • (Gibson 1991: complexity ratio for pruning threshold)
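
A minimal sketch of the ratio-based pruning scheme described on this slide. The class and function names and the example probabilities are hypothetical illustrations, not Jurafsky's implementation.

```python
from dataclasses import dataclass

@dataclass
class Analysis:
    label: str    # e.g. "houses-as-noun"
    prob: float   # prefix probability of the partial parse

def prune_by_ratio(analyses, threshold_ratio):
    """Keep analysis I only if P(I_best) / P(I) <= threshold_ratio."""
    best = max(a.prob for a in analyses)
    return [a for a in analyses if best / a.prob <= threshold_ratio]

# Hypothetical example: a 5:1 beam keeps only near-competitive analyses.
candidates = [Analysis("reading A", 0.020),
              Analysis("reading B", 0.008),
              Analysis("reading C", 0.0001)]
print([a.label for a in prune_by_ratio(candidates, threshold_ratio=5.0)])
# -> ['reading A', 'reading B']
```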

  8. Garden path models I: N/V ambiguity • Each analysis is a partial PCFG tree • Tree prefix probability used for ranking of analyses • Partial rule probabilities marginalize over rule completions (in the slide’s tree diagram, these nodes are still undergoing expansion) • *Implications for granularity of structural analysis
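
A schematic rendering of these two bullets (my notation, not the slide's): the ranking score of a partial tree multiplies the probabilities of the rule applications completed so far, and any rule still being expanded contributes the summed probability of its completions consistent with the prefix.

```latex
P(\text{partial tree}) \;=\;
  \prod_{\text{completed rule uses } r} P(r)
  \;\times\;
  \prod_{\text{open nodes } n}\;
  \sum_{\text{completions } c\ \text{of}\ n} P(c \mid n)
```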

  9. N/V ambiguity (2) • Partial CF tree analysis of the complex houses… • Analysis of houses as verb has much lower probability than analysis as noun (> 250:1) • Hypothesis: the low-ranking alternative is discarded

  10. N/V ambiguity (3) • Note that top-down vs. bottom-up questions are immediately implicated, in theory • Jurafsky includes the cost of generating the initial NP under the S • of course, it’s a small cost as P(S -> NP …) = 0.92 • If parsing were bottom-up, that cost would not have been explicitly calculated yet

  11. Garden path models II • The most famous garden paths: reduced relative clauses (RRCs) versus main clauses (MCs) • From the valence + simple-constituency perspective, MC and RRC analyses differ in two places • Example: The horse raced past the barn fell. • MC analysis: constituency p ≈ 1; raced’s best valence is intransitive, p = 0.92 • RRC analysis: constituency p = 0.14; requires transitive valence, p = 0.08

  12. Garden path models II (2) • The 82 : 1 probability ratio means that the lower-probability (RRC) analysis is discarded • In contrast, some RRCs do not induce garden paths: The bird found in the room died. • Here, found is preferentially transitive (0.62) • As a result, the probability ratio is much closer (≈ 4 : 1) • Conclusion within pruning theory: the beam threshold is between 4 : 1 and 82 : 1 • (Granularity issue: when exactly does the probability cost of valence get paid? cf. the complex houses) • *Note also that Jurafsky does not treat found as having POS ambiguity
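
For concreteness, the 82 : 1 figure is consistent with multiplying the constituency and valence probabilities from the previous slide (a reconstruction of the arithmetic, not shown this way on the slide):

```latex
\frac{P(\text{MC analysis})}{P(\text{RRC analysis})}
  \;\approx\; \frac{1 \times 0.92}{0.14 \times 0.08}
  \;\approx\; 82 : 1
```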

  13. Notes on the probabilistic model • Jurafsky 1996 is a product-of-experts (PoE) model • Expert 1: the constituency model • Expert 2: the valence model • PoEs are flexible and easy to define, but… • The Jurafsky 1996 model is actually deficient (loses probability mass), due to relative frequency estimation
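
Schematically (my notation), the product-of-experts combination scores each analysis by multiplying the two experts' probabilities. Because each expert is normalized separately and the product is never renormalized over complete analyses, total probability mass can be lost, which is one way to see why the model is deficient.

```latex
P(\text{analysis}) \;\propto\;
  P_{\text{constituency}}(\text{analysis}) \times P_{\text{valence}}(\text{analysis})
```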

  14. Notes on the probabilistic model (2) • Jurafsky 1996 predated most work on lexicalized parsers (Collins 1999, Charniak 1997) • In a generative lexicalized parser, valence and constituency are often combined through decomposition & Markov assumptions, e.g., an equation shown on the slide (sometimes approximated more coarsely) • The use of decomposition makes it easy to learn non-deficient models
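
The equation referenced here isn't recoverable from this transcript. As an illustration only, a Collins-style decomposition generates a rule's right-hand side from the head child outward under a Markov assumption, so constituency and valence preferences come from the same conditional distributions:

```latex
P(\text{RHS} \mid \text{parent}, h) \;\approx\;
  P(\text{head child} \mid \text{parent}, h)
  \;\prod_i P(\text{modifier}_i \mid \text{parent}, \text{head child}, h, \text{side})
```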

  15. Jurafsky 1996 & pruning: main points • Syntactic comprehension is probabilistic • Offline preferences explained by syntactic + valence probabilities • Online garden-path results explained by same model, when beam search/pruning is assumed

  16. General issues • What is the granularity of incremental analysis? • In [NP the complex houses], complex could be an adjective (= the houses are complex) • complex could also be a noun (= the houses of the complex) • Should these be distinguished, or combined? • When does the valence probability cost get paid? • What is the criterion for abandoning an analysis? • Should the number of maintained analyses affect processing difficulty as well?

  17. Today • Concept from probability theory: marginalization • Complete Jurafsky 1996: modeling online data • Begin competition models

  18. General idea • Disambiguation: when different syntactic alternatives are available for a given partial input, each alternative receives support from multiple probabilistic information sources • Competition: the different alternatives compete with each other until one wins, and the duration of competition determines processing difficulty

  19. Origins of competition models • Parallel competition models of syntactic processing have their roots in lexical access research • Initial question: process of word recognition • are all meanings of a word simultaneously accessed? • or are only some (or one) meanings accessed? • Parallel vs. serial question, for lexical access

  20. Origins of competition models (2) • Testing access models: priming studies show that subordinate (= less frequent) meanings are accessed as well as dominant (=more frequent) meanings • Also, lexical decision studies show that more frequent meanings are accessed more quickly

  21. Origins of competition models (3) • Lexical ambiguity in reading: does the amount of time spent on a word reflect its degree of ambiguity? • Readers spend more time reading equibiased ambiguous words than non-equibiased ambiguous words (eye-tracking studies) • Different meanings compete with each other • Example: Of course the pitcher was often forgotten… (Rayner and Duffy 1986; Duffy, Morris, and Rayner 1988)

  22. Competition in syntactic processing • Can this idea of competition be applied to online syntactic comprehension? • If so, then multiple interpretations of a partial input should compete with one another and slow down reading • does this mean increased difficulty of comprehension? • [compare with other types of difficulty, e.g., memory overload]

  23. Constraint types • Configurational bias: MV vs. RR • Thematic fit of the initial NP to the verb’s roles • i.e., Plaus(verb, noun), ranging from … • Bias of verb: simple past vs. past participle • i.e., P(past | verb)* • Support from by • i.e., P(MV | <verb, by>) [not conditioned on the specific verb] • That these factors can affect processing in the MV/RR ambiguity is motivated by a variety of previous studies (MacDonald et al. 1993, Burgess et al. 1993, Trueswell et al. 1994, cf. Ferreira & Clifton 1986, Trueswell 1996) • *Technically not calculated this way, but this would be the rational reconstruction
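
Purely as an illustration of how several such constraints could jointly support the MV and RR alternatives, the sketch below multiplies invented per-constraint support values for each alternative and renormalizes. This is not the specific competition mechanism of Jurafsky 1996 or of any particular constraint-based model; all numbers are made up.

```python
# Hypothetical constraint supports for the two alternatives at the ambiguous verb.
# Each entry: (support for MV, support for RR); values are invented for illustration.
constraints = {
    "configurational bias":       (0.92, 0.08),
    "thematic fit of initial NP": (0.30, 0.70),
    "past vs. participle bias":   (0.60, 0.40),
    "support from 'by'":          (0.20, 0.80),
}

def combine(constraints):
    """Multiply per-constraint support for each alternative, then renormalize."""
    mv = rr = 1.0
    for p_mv, p_rr in constraints.values():
        mv *= p_mv
        rr *= p_rr
    total = mv + rr
    return mv / total, rr / total

p_mv, p_rr = combine(constraints)
print(f"P(MV) = {p_mv:.3f}, P(RR) = {p_rr:.3f}")
```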
