
Data-Oriented Parsing


Presentation Transcript


  1. Data-Oriented Parsing Remko Scha Institute for Logic, Language and Computation University of Amsterdam

  2. Overview • The Big Picture (cognitive motivation) • A simple Data-Oriented Parsing model • Extended DOP models • Psycholinguistics revisited • Statistical considerations

  3. Data-Oriented Parsing: The Big Picture

  4. Data-Oriented Parsing: The Big Picture (1) The key to understanding cognition is: understanding perception.

  5. Data-Oriented Parsing: The Big Picture (1) The key to understanding cognition is: understanding visual Gestalt perception.

  6. Data-Oriented Parsing: The Big Picture (1) The key to understanding cognition is: understanding visual Gestalt perception. Conjecture: Language processing and "thinking" involve a metaphorical use of our Gestalt perception capability. R. Scha: "Wat is het medium van het denken?" ["What is the medium of thought?"] In: M.B. in 't Veld & R. de Groot: Beelddenken en begripsdenken: een paradox? ["Visual thinking and conceptual thinking: a paradox?"] Utrecht: Agiel, 2005.

  7. Data-Oriented Parsing: The Big Picture (1) The key to understanding cognition is: understanding visual Gestalt perception. (2) All perceptual processes are based on detecting similarities and analogies with concrete past experiences.

  8. The Data-Oriented World View All interpretive processes are based on detecting similarities and analogies with concrete past experiences. E.g.: • Visual Perception • Music Perception • Lexical Semantics • Concept Formation

  9. E.g.: The Data-Oriented Perspective on Lexical Semantics and Concept Formation. A concept = the extensional set of its previously experienced instances. Classifying new input under an existing concept = judging the input's similarity to these instances. Against: • Explicit definitions • Prototypes

  10. The Data-Oriented Perspective on Lexical Semantics and Concept Formation. A concept = the extensional set of its previously experienced instances. Classifying new input under an existing concept = judging the input's similarity to these instances. Against: • Explicit definitions • Prototypes • Learning

  11. Data-Oriented Parsing Remko Scha: "Language Theory and Language Technology; Competence and Performance." LVVN Conference, 1989. Remko Scha: "Virtual Grammars and Creative Algorithms." Inaugural Lecture, 1991.

  12. Data-Oriented Parsing Processing new input utterances in terms of their similarities and analogies with previously experienced utterances.

  13. Language processing by analogy was already proposed by "Bloomfield, Hockett, Paul, Saussure, Jespersen, and many others". But: "To attribute the creative aspect of language use to 'analogy' or 'grammatical patterns' is to use these terms in a completely metaphorical way, with no clear sense and with no relation to the technical usage of linguistic theory." (Chomsky 1966)

  14. Challenge: To work out a formally precise notion of "language processing by analogy".

  15. Challenge: To work out a formally precise notion of "language processing by analogy". A first step: Data-Oriented Parsing: Remember all utterances with their syntactic tree-structures. Analyse new input by recombining fragments of these tree structures.

  16. Data-Oriented Parsing Memory-based approach to syntactic parsing and disambiguation. Basic idea: use the subtrees from a syntactically annotated corpus directly as a stochastic grammar.

  17. Data-Oriented Parsing (DOP) Simplest version: DOP1. Annotated corpus defines Stochastic Tree Substitution Grammar. Rens Bod: "Data-Oriented Parsing". Proc. COLING 1992, Nantes. Rens Bod: Enriching Linguistics with Statistics: Performance Models of Natural Language. Ph.D. Dissertation, University of Amsterdam, 1995. Rens Bod & Remko Scha: "Data-Oriented Language Processing." In: S. Young & G. Bloothooft: Corpus-Based Methods in Language Processing. Dordrecht: Kluwer, 1997.

  18. Data-Oriented Parsing (DOP) Simplest version: DOP1 (Bod 1992). Annotated corpus defines Stochastic Tree Substitution Grammar (Slides adapted from Guy De Pauw, University of Antwerp)

  19. Fragment Collection
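The fragment collection of DOP1 contains every subtree of every corpus tree, where a fragment must include either all or none of the children of each of its nodes; frontier nonterminals are open substitution sites. As a concrete illustration, here is a minimal Python sketch of that extraction step (the `Tree` class and function names are my own, not from the slides):

```python
from itertools import product

class Tree:
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)
    def __repr__(self):
        return (self.label if not self.children
                else f"({self.label} {' '.join(map(repr, self.children))})")

def fragments(node):
    """Yield every DOP1 fragment rooted at `node`: for each child we either
    cut it off (leaving an open substitution site) or splice in one of the
    child's own fragments.  Words (terminal leaves) can never be cut off."""
    options = []
    for child in node.children:
        if child.children:
            opts = [Tree(child.label)]        # cut: bare substitution site
            opts.extend(fragments(child))     # or any fragment of the child
        else:
            opts = [Tree(child.label)]        # a word: always included
        options.append(opts)
    for combo in product(*options):
        yield Tree(node.label, combo)

def all_fragments(tree):
    """All fragments of a corpus tree: those rooted at each internal node."""
    if tree.children:
        yield from fragments(tree)
        for child in tree.children:
            yield from all_fragments(child)
```

For the corpus tree of "Peter killed the bear" this sketch yields 40 fragments, from one-level rewrites with open sites all the way up to the complete tree.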

  20. Generating "Peter killed the bear." Note: one parse has many derivations!

  21. An annotated corpus defines a Stochastic Tree Substitution Grammar Probability of a Derivation: Product of the Probabilities of the Subtrees

  22. An annotated corpus defines a Stochastic Tree Substitution Grammar Probability of a Derivation: Product of the Probabilities of the Subtrees Probability of a Parse: Sum of the Probabilities of its Derivations

  23. Probability of substituting a subtree $t_i$ on a node: the number of occurrences of $t_i$, divided by the total number of occurrences of subtrees $t$ with the same root node label as $t_i$: $$P(t_i) = \frac{\#(t_i)}{\#(t : \mathrm{root}(t) = \mathrm{root}(t_i))}$$ Probability of a derivation $t_1 \circ \ldots \circ t_n$: the product of the probabilities of the substitutions that it involves: $$P(t_1 \circ \ldots \circ t_n) = \prod_i \frac{\#(t_i)}{\#(t : \mathrm{root}(t) = \mathrm{root}(t_i))}$$ Probability of a parse tree: the sum of the probabilities of all derivations of that parse tree: $$P(T) = \sum_i \prod_j \frac{\#(t_{ij})}{\#(t : \mathrm{root}(t) = \mathrm{root}(t_{ij}))}$$
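The relative-frequency estimator in the first formula is direct to implement. A sketch of mine, building on the `all_fragments()` helper above (not code from the slides):

```python
from collections import Counter

def subtree_probabilities(corpus):
    """DOP1's relative-frequency estimate over a treebank:
    P(t) = #(t) / #(t' : root(t') = root(t))."""
    counts, root_totals = Counter(), Counter()
    for tree in corpus:
        for frag in all_fragments(tree):       # helper sketched earlier
            counts[(frag.label, repr(frag))] += 1
            root_totals[frag.label] += 1       # denominator per root label
    return {frag: n / root_totals[label]
            for (label, frag), n in counts.items()}
```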

  24. An annotated corpus defines a Stochastic Tree Substitution Grammar Probability of a Derivation: Product of the Probabilities of the Subtrees Probability of a Parse: Sum of the Probabilities of its Derivations Disambiguation: Choose the Most Probable Parse-tree
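Given those fragment probabilities, derivation and parse probabilities follow the two formulas above. A hedged sketch, under assumed representations of mine (a derivation as a list of fragment strings; the derivations of each candidate parse enumerated elsewhere):

```python
import math

def derivation_probability(probs, derivation):
    """P(t1 ∘ ... ∘ tn): product of the component fragments' probabilities."""
    return math.prod(probs[frag] for frag in derivation)

def parse_probability(probs, derivations):
    """P(T): sum of P(d) over all derivations d that yield the parse T."""
    return sum(derivation_probability(probs, d) for d in derivations)

def most_probable_parse(probs, derivations_by_parse):
    """Disambiguation: choose the parse tree with the highest probability."""
    return max(derivations_by_parse,
               key=lambda T: parse_probability(probs, derivations_by_parse[T]))
```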

  25. An annotated corpus defines a Stochastic Tree Substitution Grammar Q.: Does this work?

  26. An annotated corpus defines a Stochastic Tree Substitution Grammar Q.: Does this work? A.: Yes. Experiments on benchmark corpora yield very good results.

  27. An annotated corpus defines a Stochastic Tree Substitution Grammar Q.: Does this work? A.: Yes. Experiments on benchmark corpora yield very good results. • Much better than PCFG's. • Other successful methods (Collins et al.) enrich PCFG's to encode various kinds of non-local information.

  28. Beyond DOP1 • Computational issues • Linguistic issues • Psycholinguistic issues • Statistical issues

  29. Computational issues Part 1: the good news • TSG parsing can be based on the techniques of CFG-parsing, and inherits some of their properties. • Chart-parsing with Viterbi-optimization ("Semi-ring algorithms") is applicable for many useful purposes.

  30. Computational issues Part 1: the good news Semi-ring algorithms are applicable for many useful purposes. In O(n³) time in sentence length, we can: • Build a parse-forest. • Compute the Most Probable Derivation. • Select a random parse. • Compute a Monte-Carlo estimation of the Most Probable Parse.
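The Monte-Carlo estimation of the Most Probable Parse can be sketched as follows, assuming we already have the sampling routine from the list above that draws a random derivation of the input from the parse forest; the function names are placeholders of mine:

```python
from collections import Counter

def monte_carlo_mpp(sample_parse, n_samples=1000):
    """Estimate the Most Probable Parse by sampling.

    `sample_parse` is assumed to draw one random derivation of the input
    sentence from the parse forest and return the parse tree it yields.
    Because P(parse) sums P(derivation) over the parse's derivations,
    the parse hit most often by the sampler estimates the MPP."""
    tally = Counter(repr(sample_parse()) for _ in range(n_samples))
    parse, hits = tally.most_common(1)[0]
    return parse, hits / n_samples        # MPP estimate and its frequency
```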

  31. Computational issues Part 2: the bad news • Computing the Most Probable Parse is NP-complete (Sima'an). (Not a semi-ring algorithm.) • The grammar gets very large.

  32. Computational issues Part 3: Solutions • Goodman (1996): Reduce STSG to (very large!) SCFG. • Non-probabilistic DOP: De Pauw, 1997: Parsing with the Largest Fragments. • Non-probabilistic DOP: Bod, 2000: Parsing with the Shortest Derivation. (Good results on WSJ corpus.) • Compress the fragment-set. (Van der Werff, 2004: Use Minimum Description Length.)
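As a hedged illustration of the shortest-derivation criterion (my sketch, not code from the slides): rank each parse by the fewest fragments any of its derivations needs, which amounts to preferring the largest corpus fragments.

```python
def shortest_derivation_parse(derivations_by_parse):
    """Non-probabilistic DOP: pick the parse whose best derivation uses
    the fewest fragments.  Assumed representation: a dict mapping each
    candidate parse to a list of its derivations, each derivation being
    a list of fragments."""
    return min(derivations_by_parse,
               key=lambda parse: min(len(d) for d in derivations_by_parse[parse]))
```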

  33. Linguistic issues

  34. Linguistic issues: Future work Scha (1990), about an imagined future DOP algorithm: It will be especially interesting to find out how such an algorithm can deal with complex syntactic phenomena such as "long distance movement". It is quite possible that an optimal matching algorithm does not operate exclusively on constructions which occur explicitly in the surface-structure; perhaps "transformations" (in the classical Chomskyan sense) play a role in the parsing process.

  35. Transformations "John likes Mary." "Mary is liked by John." "Does John like Mary?" "Who does John like?" "Who do you think John likes?" "Mary is the girl I think John likes."

  36. Transformations: Wh-movement, Passivization, Topicalization, Fronting, Scrambling, . . .? Move-α?

  37. More powerful DOP models developed so far: • Kaplan & Bod: LFG-DOP (Based on Lexical-Functional Grammar) • Hoogweg: TIG-DOP (Based on Tree-Insertion Grammar; cf. Tree-Adjoining Grammar) • Sima'an: The Tree-Gram Model (Markov-processes on sister-nodes, conditioned on lexical heads)

  38. Psycholinguistics Revisited

  39. Psycholinguistic Considerations DOP is a performance model. DOP defines syntactic probabilities of sentences and their analyses (against the background of a weak, overgenerating competence grammar: the definition of all formally possible sentence annotations).

  40. Psycholinguistic Considerations Does DOP account for performance phenomena?

  41. Psycholinguistic Considerations Probabilistic Disambiguation Psychological experiments consistently show that disambiguation preferences correlate with occurrence frequencies.

  42. Psycholinguistic Considerations The "Garden Path" Phenomenon "The horse raced past the barn ..."

  43. Psycholinguistic Considerations The "Garden Path" Phenomenon "The horse raced past the barn fell."

  44. Psycholinguistic Considerations The "Garden Path" Phenomenon "The horse raced past the barn fell." Plausible model: an incremental version of DOP, in which an analysis with very high probability kills analyses with low probability.
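One way to make that "kills" concrete is ratio-based beam pruning; the sketch below is an assumption of mine about how such an incremental model could work, not a mechanism specified on the slides.

```python
def prune(partial_analyses, beam=1e-3):
    """After each incoming word, drop partial analyses whose probability
    falls below a fixed fraction of the current best one.  A garden path
    arises when the globally correct analysis is pruned early: "raced" as
    a main verb vastly outscores the reduced-relative reading."""
    best = max(partial_analyses.values())
    return {a: p for a, p in partial_analyses.items() if p >= beam * best}
```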

  45. Psycholinguistic Considerations Utterance Generation Cf. Kempen et al. (Leiden University): a (non-probabilistic) generation mechanism which combines tree fragments at random.

  46. Psycholinguistic Considerations Grammaticality Judgements Cf. Stich: Priming of Grammaticality Judgements. Plausible model: DOP with "recency effect".
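A "recency effect" can be grafted onto the relative-frequency estimator by decaying each remembered occurrence with its age; the half-life parameterization below is an assumption of mine, not a model from the slides.

```python
def recency_weighted_counts(occurrences, now, half_life=1000.0):
    """Replace the raw fragment counts #(t) with exponentially decayed
    ones, so recently heard fragments weigh more in the DOP probabilities.
    `occurrences` is an iterable of (fragment, time_observed) pairs."""
    counts = {}
    for frag, t in occurrences:
        counts[frag] = counts.get(frag, 0.0) + 0.5 ** ((now - t) / half_life)
    return counts
```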

  47. Psycholinguistic Considerations Integration with semantics Cf. "Compositional Semantics" (Montague). Assume a semantically annotated corpus. Cf. Van den Berg et al. Factoring in the probabilities of semantic subcategories: cf. Bonnema.

  48. Psycholinguistic Considerations Language dynamics: Language Acquisition Language Change
