

  1. LING / C SC 439/539 Statistical Natural Language Processing • Lecture 28 • 4/29/2013

  2. Recommended reading • Peter Grünwald. 1996. A minimum description length approach to grammar inference. In Symbolic, Connectionist and Statistical Approaches to Learning for Natural Language Processing (editors S. Wermter, E. Riloff, G. Scheler), pp. 203-216; Lecture Notes in Artificial Intelligence no. 1040. Springer-Verlag, Berlin, Germany. • Colin Phillips. 2004. Three Benchmarks for Distributional Approaches to Natural Language Syntax. In R. Zanuttini, H. Campos, E. Herburger, & P. Portner (eds.), Negation, Tense, and Clausal Architecture: Cross-linguistic Investigations. Georgetown University Press. • Janet Dean Fodor. 1998. Unambiguous triggers. Linguistic Inquiry 29(1), 1-36. • Charles Yang. 2004. Universal Grammar, statistics or both? Trends in Cognitive Sciences 8(10), 451-456.

  3. Other recommended reading: books • Noam Chomsky. 1965. Aspects of the Theory of Syntax • William O’Grady. 1997. Syntactic Development • Mark Baker. 2001. The Atoms of Language • Charles Yang. 2002. Knowledge and Learning in Natural Language • Charles Yang. 2006. The Infinite Gift • Maria Teresa Guasti. 2002. Language Acquisition: The Growth of Grammar

  4. Credits • Some slides are borrowed or adapted from: • Paul Hagstrom http://www.bu.edu/linguistics/UG/hagstrom/ • Charles Yang http://www.ling.upenn.edu/~ycharles/

  5. Outline • Grammar induction • Grammar induction algorithms • Children’s acquisition of syntax, and poverty of the stimulus • Principles and parameters: principles • Principles and parameters: parameters • Grammatical development in children • Learning syntax through parameter setting

  6. Grammar induction • Given a finite sample of strings from some language L, what is the grammar G that most likely produced that sample? • Difficulties: • Generalizing from a sample of strings to the underlying language • Sparse data • Poverty of the stimulus (later section)

  7. CFG induction example • Sentence 1: The boy meets the girl • Sentence 2: The girl meets the boy • Underlying CFG: S → NP VP, NP → DT N, DT → the, N → boy | girl, VP → V N, V → meets

  8. A few possibilities for an induced CFG • G1: S → NP VP, NP → DT N, DT → the, N → boy | girl, VP → V N, V → meets • G2: S → C1 C2, C1 → C3 C4, C3 → the, C4 → boy | girl, C2 → C5 C4, C5 → meets • G3: S → C1 C2 C3 C4 C5, C1 → the, C2 → boy | girl, C3 → meets, C4 → the, C5 → boy | girl • G4: S → C1 C2, C1 → the, C2 → boy C3 | girl C3, C3 → meets C4, C4 → the C5, C5 → boy | girl
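Whether a candidate grammar covers the observed sentences can be checked mechanically. The sketch below is not part of the lecture: it uses NLTK to test whether one of the candidates (G3, rewritten in NLTK's grammar notation) assigns a parse to the two observed sentences; the helper name `generates` is made up for the example.

```python
# Minimal sketch: check whether a candidate grammar (G3 from the slide,
# rewritten in NLTK's notation) can parse the two observed sentences.
import nltk

g3 = nltk.CFG.fromstring("""
S  -> C1 C2 C3 C4 C5
C1 -> 'the'
C2 -> 'boy' | 'girl'
C3 -> 'meets'
C4 -> 'the'
C5 -> 'boy' | 'girl'
""")

parser = nltk.ChartParser(g3)

def generates(sentence):
    """True if the grammar assigns at least one parse tree to the sentence."""
    tokens = sentence.lower().split()
    return any(True for _ in parser.parse(tokens))

print(generates("The boy meets the girl"))   # True
print(generates("The girl meets the boy"))   # True
```

Several candidate grammars can pass such a check; deciding among them is exactly where evaluation criteria such as MDL (later slides) come in.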

  9. Issues in grammar induction • Format of possible grammars: • Type of grammar: regular, context-free, dependency, etc. • Probabilistic / non-probabilistic • Nativism vs. emergence: • Are the labels that linguists use a result of mapping induced structures onto internal representations, or are they emergent properties of the grammar? • Example of the difference between these perspectives: • the algorithm specifically looks for an NP in the input, or • the algorithm discovers a class C35, which has the properties of what linguists call an NP

  10. Issues in grammar induction • Treat as an A.I. search problem 1. Hypothesis space of possible grammars 2. Algorithm to search space of grammars 3. Evaluation criteria to decide between grammars • Input: positive evidence only, or also include negative evidence? • Positive: strings in the language • Negative: strings not in the language

  11. Outline • Grammar induction • Grammar induction algorithms • Children’s acquisition of syntax, and poverty of the stimulus • Principles and parameters: principles • Principles and parameters: parameters • Grammatical development in children • Learning syntax through parameter setting

  12. Example paper: Grünwald 1996 • Peter Grünwald. 1996. A minimum description length approach to grammar inference. In Symbolic, Connectionist and Statistical Approaches to Learning for Natural Language Processing (editors S. Wermter, E. Riloff, G. Scheler), pp. 203-216; Lecture Notes in Artificial Intelligence no. 1040. Springer-Verlag, Berlin, Germany. • http://homepages.cwi.nl/~pdg/ • Algorithm to induce context-free grammars from text • Doesn't work so well… • There are other grammar induction papers, but I have never read one that I found satisfactory.

  13. Input and output • Input: text corpus • Output: “Context-free grammar” • (It’s actually a regular grammar; no center-recursion) • Has multiple “start symbol” nonterminals • Words in a sentence may be generated by multiple “start symbols”, because the induced grammar might not generate the entire sentence • Example: [The quick] [brown fox] [jumped over the lazy dog]

  14. Description of algorithm

  15. Space of possible grammars • Initial grammar • For every word wi, create a rule ci → wi • Bottom-up merging process • For classes ci and cj in the grammar: • Union them into a new class ck → ci | cj, or • Concatenate them into a new class ck → ci cj • The space of possible unions / concatenations describes the range of possible grammars

  16. Example • Training corpus: • The dog is big • The dog is fat • Initial grammar: C1 → the, C2 → dog, C3 → is, C4 → big, C5 → fat • First concatenate C1 and C2: C1 → the, C2 → dog, C6 → C1 C2, C3 → is, C4 → big, C5 → fat • Then union C4 and C5: C1 → the, C2 → dog, C6 → C1 C2, C3 → is, C4 → big, C5 → fat, C7 → C4 | C5
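The merge operations on the last two slides can be stated in a few lines of code. This is a rough sketch, not Grünwald's implementation: the grammar is stored as a plain dictionary from class names to lists of right-hand sides, and the class names C1…C7 follow the numbering of the example above.

```python
# Rough sketch (not Grünwald's code) of the initial grammar and the two
# merge operations, on a grammar stored as {class name: list of right-hand sides}.

def initial_grammar(corpus):
    """One rule Ci -> wi per distinct word, numbered in order of first occurrence."""
    grammar, seen = {}, []
    for sentence in corpus:
        for word in sentence:
            if word not in seen:
                seen.append(word)
                grammar[f"C{len(seen)}"] = [[word]]
    return grammar

def union(grammar, ci, cj, ck):
    """Add a new class Ck -> Ci | Cj."""
    grammar[ck] = [[ci], [cj]]

def concatenate(grammar, ci, cj, ck):
    """Add a new class Ck -> Ci Cj."""
    grammar[ck] = [[ci, cj]]

corpus = [["the", "dog", "is", "big"], ["the", "dog", "is", "fat"]]
g = initial_grammar(corpus)        # C1->the, C2->dog, C3->is, C4->big, C5->fat
concatenate(g, "C1", "C2", "C6")   # C6 -> C1 C2   ("the dog")
union(g, "C4", "C5", "C7")         # C7 -> C4 | C5 ("big" or "fat")
```

Which merges are actually kept is decided by the description-length comparison on the following slides.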

  17. Compare grammars through MDL • (Flowchart on the slide: start from the initial grammar; each union or concatenation yields an alternative grammar; the alternative with the minimal description length is kept as the best grammar)
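Read as pseudocode, the flowchart is a greedy loop over candidate merges. The sketch below is one possible rendering, not Grünwald's actual procedure: `candidate_merges` and `score` are placeholder functions (a merge enumerator and a description-length function such as the ones sketched after slides 19-20).

```python
# Greedy search over merges, as the flowchart suggests (a sketch only).
# candidate_merges(g) yields modified copies of grammar g;
# score(g) returns its description length in bits.

def greedy_mdl_search(grammar, candidate_merges, score):
    current, current_score = grammar, score(grammar)
    while True:
        alternatives = list(candidate_merges(current))
        if not alternatives:
            return current
        best = min(alternatives, key=score)
        best_score = score(best)
        if best_score >= current_score:
            return current              # no merge lowers the description length
        current, current_score = best, best_score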

  18. Need to figure out how to encode a grammar and a corpus • MDL applied to grammar induction: Minimize: length of description of grammar + length of description of corpus according to the grammar • The following slides contain one possibility • (The MDL formulas are ad hoc) • Grünwald does something else, more complicated • Grünwald's master's thesis does something else again

  19. # of bits to encode a grammar • Suppose this is your grammar: C1 → C2 C3, C2 → w1, C2 → w5, C3 → C4, C3 → w4 w6, C4 → w3 • Encoding a particular rule involves: • Choosing the LHS nonterminal • Choosing the RHS given the LHS • # of bits to encode a rule = − log2 p(LHS, RHS) = − log2 p(LHS) − Σ log2 p(RHS | LHS) (summing over the symbols of the RHS) • # of bits to encode the grammar = Σrules (# of bits for each rule)
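As a concrete rendering of this slide, the sketch below computes the grammar cost with rule probabilities estimated by relative frequency over the rule set; that estimator is an assumption of the sketch, not something the slide or Grünwald specifies.

```python
# Sketch of the grammar-encoding cost. Probabilities are estimated by relative
# frequency over the rule set; this estimator is an assumption of the sketch.
import math
from collections import Counter

def grammar_bits(rules):
    """rules: list of (lhs, rhs) pairs, with rhs a tuple of symbols.
    Per rule: -log2 p(lhs) - sum over rhs symbols of log2 p(symbol | lhs)."""
    lhs_counts = Counter(lhs for lhs, _ in rules)
    pair_counts = Counter((lhs, s) for lhs, rhs in rules for s in rhs)
    rhs_totals = Counter(lhs for lhs, rhs in rules for _ in rhs)
    bits = 0.0
    for lhs, rhs in rules:
        bits += -math.log2(lhs_counts[lhs] / len(rules))
        for s in rhs:
            bits += -math.log2(pair_counts[(lhs, s)] / rhs_totals[lhs])
    return bits

# The grammar from this slide.
rules = [("C1", ("C2", "C3")), ("C2", ("w1",)), ("C2", ("w5",)),
         ("C3", ("C4",)), ("C3", ("w4", "w6")), ("C4", ("w3",))]
print(round(grammar_bits(rules), 2))
```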

  20. # of bits to encode a corpus according to a grammar • Corpus = w1 w3 • Encode using this grammar: C1 → C2 C3, C2 → w1, C2 → w5, C3 → C4, C3 → w4 w6, C4 → w3 • w1 w3 is generated through rules 1, 2, 4, and 6. p(w1 w3) = p(C1) · p(C2 C3 | C1) · p(w1 | C2) · p(C4 | C3) · p(w3 | C4) • # of bits to encode w1 w3 = − log2 p(C1) − log2 p(C2 C3 | C1) − log2 p(w1 | C2) − log2 p(C4 | C3) − log2 p(w3 | C4)
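Continuing the sketch from slide 19, the corpus cost can be computed from the derivation of each sentence, and the total MDL score (slide 18) is the grammar cost plus the corpus cost. The probabilities below are hypothetical values chosen only to make the example run; how they would actually be estimated is left open.

```python
# Sketch of the corpus-encoding cost: -log2 p(start) plus -log2 p(rhs | lhs)
# for each rule used in a sentence's derivation, as in the expression above.
import math

def sentence_bits(start_prob, derivation, rule_probs):
    """derivation: list of (lhs, rhs) rules used; rule_probs[(lhs, rhs)] = p(rhs | lhs)."""
    return -math.log2(start_prob) + sum(-math.log2(rule_probs[r]) for r in derivation)

def description_length(grammar_cost, sentences):
    """Total MDL score = grammar bits + corpus bits over all sentence derivations."""
    return grammar_cost + sum(sentence_bits(*s) for s in sentences)

# Hypothetical probabilities, just to make the example run.
rule_probs = {("C1", ("C2", "C3")): 1.0, ("C2", ("w1",)): 0.5,
              ("C3", ("C4",)): 0.5, ("C4", ("w3",)): 1.0}
derivation = [("C1", ("C2", "C3")), ("C2", ("w1",)), ("C3", ("C4",)), ("C4", ("w3",))]
corpus_cost = sentence_bits(0.5, derivation, rule_probs)         # bits for this derivation
total = description_length(20.0, [(0.5, derivation, rule_probs)])  # 20.0 = placeholder grammar cost
print(round(corpus_cost, 2), round(total, 2))
```

Grammars can then be compared by this total, which is the quantity minimized in the search loop sketched after slide 17.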

  21. Grünwald's experiments • Brown corpus • Choose sentences that consist only of words that are among the 10,000 most frequent words in the corpus

  22. Experiment 1: union only (no concatenation)

  23. Description Length over merging iterations

  24. Experiment 2: concatenation • Doesn’t work • Takes too long • No results reported • “We do not arrive at very good rules” • “it should be noted here that in experiments with toy grammars much better grammar rules were formed.”

  25. Outline • Grammar induction • Grammar induction algorithms • Children’s acquisition of syntax, and poverty of the stimulus • Principles and parameters: principles • Principles and parameters: parameters • Grammatical development in children • Learning syntax through parameter setting

  26. Children as grammar inducers • Let’s now consider grammar induction as a cognitive science problem. • Look at what kids say. • Look at whether the input is sufficient to learn an adult grammar for a language.

  27. 1. Kids say the darndest things • If children were acquiring grammars by string pattern generalization, you would not expect them to speak (generate) sentences not in the language. • Whether through Grünwald's procedure or some other one; the details do not matter • But children say things that are not in the adult grammar. • At the level of individual words • At the level of syntactic constructions

  28. Limited influence of parental feedback • Parents often correct what their children say… but it doesn't work (see the dialogue on the next slide)

  29. From Braine (1971) • Want other one spoon, daddy. • You mean, you want the other spoon. • Yes, I want other one spoon, please Daddy. • Can you say 'the other spoon'? • Other…one…spoon • Say 'other' • Other • 'Spoon' • Spoon • 'Other spoon' • Other…spoon. Now give me other one spoon?

  30. Children also do not receive negative evidence for general grammatical principles • Negative evidence (from parents) doesn't concern core grammatical principles such as phrase structure, headedness, movement • For example, no parent says: • "You can't question a subject in a complement embedded with that" • "You can't use a proper name if it's c-commanded by something coindexed with it."

  31. 2. Grammatical knowledge and Poverty of the Stimulus • Adults have intuitions about the grammar of their language. • Example • John ate peas and carrots. • What did John eat ___ ? • Now suppose the speaker knows that John ate peas, and asks what else John ate with the peas: *What did John eat peas and ___ ? • How do we know that the last sentence is ungrammatical? We never see examples like this in the input. • Another contrast speakers know without instruction: • Which book did she review __ without reading __ ? • *She reviewed the book without reading __

  32. Poverty of the Stimulus • The poverty-of-the-stimulus argument: • Adults' knowledge of the structure of their language cannot be accounted for through simple learning mechanisms like string pattern generalization • Therefore humans must be born with some knowledge of grammar (nativist, rationalist) • Language acquisition involves both external data (empiricist aspect) and innate knowledge • Innate knowledge explains the fast rate of language acquisition, especially given the limited quantity of observed data • Phillips (2004): grammar induction algorithms should be judged according to whether they model human intuitions • A very high standard! • But otherwise they will not convince linguists

  33. Chomsky’s degrees of adequacy, in accounting for a language • Observational adequacy • Theory accounts for the observed forms of a language • Not interesting: could be a list of sentences • Descriptive adequacy • Theory accounts for the observed forms of a language • Theory explains intuitions of native speakers about their language • Utilizes abstract grammatical structures • Distinguishes possible from impossible structures • Explanatory adequacy (highest goal) • Theory accounts for the observed forms of a language • Theory explains intuitions of native speakers about their language • Explains how that knowledge of language can be acquired by a learner

  34. Outline • Grammar induction • Grammar induction algorithms • Children’s acquisition of syntax, and poverty of the stimulus • Principles and parameters: principles • Principles and parameters: parameters • Grammatical development in children • Learning syntax through parameter setting

  35. Universal Grammar (UG) • The set of principles / parameters / rules / constraints governing language structure • Common to all human languages • Determines the set of possible human languages • Explains linguistic variation • Innate, unconscious knowledge • Modeled by a linguistic theory such as Principles & Parameters or Minimalism

  36. UG and language acquisition • Language acquisition with UG = a specific setting of UG parameters + a lexicon: phonemes, morphemes, words, and their argument structure, semantics, etc. • During the critical period of language acquisition, the data encountered (primary linguistic data) is used to: • Set parameters of UG • Acquire the lexicon

  37. Principles and Parameters • A specific theory of UG • (same as Government and Binding Theory) • Principles: aspects of linguistic structure that are invariant across languages • Parameters: aspects of linguistic structure that differ across languages • All languages share the same principles, but may differ in their parameter settings (and their vocabulary)

  38. Principles for phrase structure • Motivation: there are many redundant phrase structure rules: VP → V NP, PP → P NP, AP → A NP, etc. • X-bar theory and the principle of endocentricity • Every phrase has a head • XP is the maximal projection of the head X • Rules out structures such as: • NP → ADJ P • PP → V NP • Benefit: • X-bar theory captures commonalities between rules • There is no explicit CFG in UG

  39. Principles for movement • Explain relationship between a declarative sentence and its question variant: • The student was sleeping • Was the student __ sleeping? • John can solve this problem • Which problem can John solve __? • Theory of Movement: • Questioned constituent is displaced • Other structures in the sentence may be modified also • Constraints on movement

  40. Universal constraints on movement • Coordinate structure constraint: • John ate [bagels and what]NP for lunch? • *What did John eat bagels and ___ for lunch? • Conjoined NP forms an “island” that cannot be extracted from • Relative clause island • John saw the dog that ate pizza. • John saw [the dog that ate what]NP • *What did John see the dog that ate ___ ? • A relative clause also forms an “island” for movement

  41. Outline • Grammar induction • Grammar induction algorithms • Children’s acquisition of syntax, and poverty of the stimulus • Principles and parameters: principles • Principles and parameters: parameters • Grammatical development in children • Learning syntax through parameter setting

  42. Parameters and linguistic variation • Languages are superficially different • All languages share a core grammatical structure, as determined by the principles / rules / constraints of UG • Primary differences between languages are in parameter settings • (Each language also has its own vocabulary, but this is a superficial difference)

  43. Every combination of parameter settings determines a unique language type

  44. Japanese vs. English • Head-first: English-type language: Kazu ate sushi to Tokyo • Head-last: Japanese-type language: Kazu sushi ate Tokyo to • In terms of phrase structure rules: • English: VP → V NP, PP → P NP • Japanese: VP → NP V, PP → NP P

  45. Head direction parameter • A head-first language applies the head-first rule to all of its phrases: NPs, VPs, etc. • A head-last language applies the head-last rule to all of its phrases: NPs, VPs, etc. • (On the slide: tree diagrams contrasting English and Japanese)
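As a toy illustration, not from the lecture, of how a single binary parameter fixes the head/complement order for every phrase type at once, consider the sketch below; the category list and rule format are made up for the example.

```python
# Toy sketch: one head-direction parameter determines head/complement order
# for every phrase type at once (categories and rule format are illustrative).

def phrase_structure_rules(head_first):
    pairs = [("VP", "V", "NP"), ("PP", "P", "NP"), ("NP", "N", "PP")]
    rules = []
    for phrase, head, comp in pairs:
        order = f"{head} {comp}" if head_first else f"{comp} {head}"
        rules.append(f"{phrase} -> {order}")
    return rules

print(phrase_structure_rules(head_first=True))   # English-type: ['VP -> V NP', 'PP -> P NP', ...]
print(phrase_structure_rules(head_first=False))  # Japanese-type: ['VP -> NP V', 'PP -> NP P', ...]
```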

  46. Wh- movement parameter • Parameter for presence/absence of wh- movement in a language • Wh- movement occurs in English • Wh- movement does not occur in Korean • Korean is wh- in situ: Ne-nun [Mary-ka enu tayhak-ey kat-tako] sayngkakha-ni you Mary which college went that think "Which college do you think that Mary went to?"

  47. Verb movement parameter • French: V raises to aux • English: aux lowers to V

  48. Null Subject Parameter • Italian allows null subjects but English doesn't: • I ate shepherd's pie. • Ø Ho mangiato il risotto alla milanese. • Italian allows pro-drop (omitting the pronoun): • Mary speaks English very well because she was born in the US. • Vito parla l'italiano molto bene ma Ø è nato negli Stati Uniti. • Italian speakers can figure out who the subject is because of the inflection on the verb.
