
Parameter Setting



  1. Parameter Setting

  2. Compounding Parameter
Languages with both resultatives and productive N-N compounds: American Sign Language, Austroasiatic (Khmer), Finno-Ugric, Germanic (German, English), Japanese-Korean, Sino-Tibetan (Mandarin), Tai (Thai), Basque
Languages with neither: Afroasiatic (Arabic, Hebrew), Austronesian (Javanese), Bantu (Lingala), Romance (French, Spanish), Slavic (Russian, Serbo-Croatian)

  3. Developmental Evidence • Complex predicate properties argued to appear as a group in English children’s spontaneous speech (Stromswold & Snyder, 1997) • Appearance of N-N compounding is a good predictor of the appearance of verb-particle constructions and other complex predicate constructions, even after partialling out the contributions of: • Age of reaching MLU 2.5 • Production of lexical N-N compounds • Production of adjective-noun combinations • Correlations are remarkably good

  4. Sample Learning Problems • Null-subject parameter • V2 • Long-distance reflexives • Scope inversion • Argument structure alternations • Wh-scope marking • Complex predicates/noun-noun compounding • Condition C vs. language-specific constraints • One-substitution • Preposition stranding • Disjunction • Subjacency parameter, etc.

  5. Classic Parameter Setting • Language learning is making pre-determined choices based on ‘triggers’ whose form is known in advance • Challenge I: encoding and identifying reliable triggers • Challenge II: overgeneralization • Challenge III: lexically bound parameters

  6. The Concern… • “From its inception, UG has been regarded as that which makes acquisition possible. But for lack of a thriving UG-based account of acquisition, UG has come to be regarded instead as an irrelevance or even an impediment. It is clearly open to the taunt: ‘All that innate knowledge, only a few facts to learn, yet you can’t say how!’” (Fodor & Sakas, 2004, p.8)

  7. [Flow diagram: input utterance corpus → Analyze Input → Action]

  8. Gibson & Wexler (1994) • Triggering Learning Algorithm • Learner starts with a random set of parameter values • For each sentence, attempts to parse the sentence using current settings • If the parse fails using current settings, change one parameter value and attempt re-parsing • If re-parsing succeeds, change grammar to the new parameter setting [Diagram: move between grammars A and B in parameter space on input Si]

  9. Gibson & Wexler (1994) • Triggering Learning Algorithm • Learner starts with a random set of parameter values • For each sentence, attempts to parse the sentence using current settings • If the parse fails using current settings, change one parameter value and attempt re-parsing • If re-parsing succeeds, change grammar to the new parameter setting [Diagram: move between grammars A and B on input Si, annotated with the Single Value Constraint and the Greediness Constraint]
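A minimal Python sketch of one TLA update, following the bullets above. The grammar encoding (a tuple of binary parameter values) and the parse oracle `parses(grammar, sentence)` are hypothetical stand-ins, not Gibson & Wexler's implementation.

```python
import random

def tla_step(grammar, sentence, parses):
    """One Triggering Learning Algorithm update.
    grammar: tuple of binary parameter values, e.g. (0, 1, 0)
    parses(grammar, sentence) -> bool is an assumed parse oracle."""
    if parses(grammar, sentence):
        return grammar  # current settings succeed: no change
    # Single Value Constraint: flip exactly one randomly chosen parameter
    i = random.randrange(len(grammar))
    candidate = tuple(1 - v if j == i else v for j, v in enumerate(grammar))
    # Greediness Constraint: adopt the flip only if it parses the input
    return candidate if parses(candidate, sentence) else grammar
```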

  10. Gibson & Wexler (1994) • For an extremely simple 2-parameter space, the learning task is easy - any starting point, any destination • Triggers do not really exist in this model [Diagram: 2-parameter grammar space with axes VO/OV and SV/VS, containing SVO, SOV, VOS, OVS]

  11. Gibson & Wexler (1994) • Extending the space to 3 parameters • There are non-adjacent grammars • There are local maxima, where the current grammar and all neighbors fail [Diagram: 3-parameter grammar space: VO/OV and SV/VS crossed with ±V2, giving 8 grammars]

  12. Gibson & Wexler (1994) • Extending the space to 3 parameters • There are non-adjacent grammars • There are local maxima, where the current grammar and all neighbors fail • Example input string: Adv S V O [Diagram: the 8-grammar ±V2 space with a local maximum marked]

  13. Gibson & Wexler (1994) • Extending the space to 3 parameters • There are non-adjacent grammars • There are local maxima, where the current grammar and all neighbors fail • All local maxima involve the impossibility of retracting a +V2 hypothesis • Example input string: Adv S V O [Diagram: the 8-grammar ±V2 space with a local maximum marked]

  14. [Flow diagram: input utterance corpus → Analyze Input → Action]

  15. Sample Learning Problems • Null-subject parameter • V2 • Long-distance reflexives • Scope inversion • Argument structure alternations • Wh-scope marking • Complex predicates/noun-noun compounding • Condition C vs. language-specific constraints • One-substitution • Preposition stranding • Disjunction • Subjacency parameter, etc.

  16. Gibson & Wexler (1994) • Solutions to the local maxima problem • #1: Initial state: V2 defaults to [-V2]; S: unset; O: unset • #2: Extrinsic ordering • Example input string: Adv S V O [Diagram: the 8-grammar ±V2 space]

  17. Fodor (1998) • Unambiguous Triggers • Local maxima in the TLA result from the use of ‘ambiguous triggers’ • If learning only occurs based on unambiguous triggers, local maxima should be avoided • Difficulties • How to identify unambiguous triggers? • An unambiguous trigger can only be parsed by a grammar that includes value Pi of parameter P, and by no grammars that include value Pj. • A parameter space with 20 binary parameters implies 2^20 parses for any sentence.
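A brute-force sketch of the unambiguous-trigger test just described, which also makes the cost objection concrete: with n = 20 binary parameters there are 2^20 = 1,048,576 candidate grammars to try per sentence. `parses` is again a hypothetical oracle.

```python
from itertools import product

def is_unambiguous_trigger(sentence, p, n, parses):
    """True iff `sentence` is an unambiguous trigger for value 1 of
    parameter p in an n-parameter space: some grammar parses it, and
    every grammar that does has its p-th value set to 1."""
    successful = [g for g in product((0, 1), repeat=n)
                  if parses(g, sentence)]  # exhaustive: 2**n grammars
    return bool(successful) and all(g[p] == 1 for g in successful)
```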

  18. Fodor (1998) • Ambiguous Trigger • SVO can be analyzed by at least 5 of the 8 grammars in G&W’s parameter space [Diagram: the 8-grammar ±V2 space with the grammars compatible with SVO marked]

  19. Fodor (1998) • Structural Triggers Learner (STL) • Parameters are treelets • Learner attempts to parse input sentences using a supergrammar, which contains treelets for all values of all unset parameters, e.g., 40 treelets for 20 unset binary parameters. • Algorithm • #1: Adopt a parameter value/trigger structure if and only if it occurs as a part of every complete well-formed phrase marker assigned to an input sentence by the parser using the supergrammar. • #2: Adopt a parameter value/trigger structure if and only if it occurs as a part of a unique complete well-formed phrase marker assigned to the input by the parser using the supergrammar.
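A schematic sketch of the two adoption rules, under the simplifying (invented) representation that each complete parse the supergrammar assigns is just the set of parameter-value treelets it uses:

```python
def stl_adopt(parses_treelets, rule=1):
    """parses_treelets: list of frozensets, one per complete well-formed
    phrase marker, each holding the treelet ids that parse used.
    rule=1: adopt treelets occurring in every parse (intersection).
    rule=2: adopt treelets only when the parse is unique."""
    if not parses_treelets:
        return frozenset()  # unparsable input: nothing to adopt
    if rule == 1:
        return frozenset.intersection(*parses_treelets)
    return parses_treelets[0] if len(parses_treelets) == 1 else frozenset()
```

On this rendering, rule #2 discards structurally ambiguous input entirely, matching the "slightly wasteful, but conservative" behavior described on the next slide.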

  20. Fodor (1998) • Structural Triggers Learner • If a sentence is structurally ambiguous, it is taken to be uninformative (slightly wasteful, but conservative) • Unable to take advantage of collectively unambiguous sets of sentences, e.g. SVO and OVS, which entail [+V2] • Still unclear (to me) how it manages its parsing task

  21. [Flow diagram: input utterance corpus → Generate Parses (using supergrammar) → Select Parses (using grammar) → Action]

  22. Charles Yang

  23. Competition Model • 2 grammars - start with even strength; both get credit for success, one gets punished for failure • Each grammar is chosen for parsing/production as a function of its current strength • It must be that increasing pi for one grammar decreases pj for the other grammars • Is it the case that the presence of some punishment will guarantee that a grammar will, over time, always fail to survive?

  24. Competition Model • Upon receiving an input datum s, the child • Selects a grammar Gi with probability pi. • Analyzes s with Gi. • Updates the competition: • If successful, rewards Gi by increasing pi. • Otherwise, punishes Gi by decreasing pi. • This implies that change only occurs when the selected grammar succeeds or fails

  25. Competition Model • Linear reward-penalty scheme (LR-P, Bush & Mosteller, 1951) • Given an input sentence s, the learner selects a grammar Gi with probability pi from the population of N possible grammars. • If Gi --> s then • p’i = pi + γ(1 - pi) • p’j = (1 - γ)pj, for all j ≠ i • If Gi -/-> s then • p’i = (1 - γ)pi • p’j = γ/(N - 1) + (1 - γ)pj, for all j ≠ i

  26. Competition Model • Linear reward-penalty scheme (LR-P, Bush & Mosteller, 1951) • Given an input sentence s, the learner selects a grammar Gi with probability pi from the population of N possible grammars. • If Gi --> s then • p’i = pi + γ(1 - pi) • p’j = (1 - γ)pj, for all j ≠ i • If Gi -/-> s then • p’i = (1 - γ)pi • p’j = γ/(N - 1) + (1 - γ)pj, for all j ≠ i Note: the value p is a probability for the entire grammar. This rule suggests that all grammars are affected on each trial, not only the grammar that is currently being tested. Other discussions in the book do not clearly make reference to this.
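A sketch of one LR-P trial implementing the update equations above; `parses` is a hypothetical parse oracle and `gamma` (γ) the learning rate. Note that, as the annotation observes, every grammar's probability is adjusted on every trial, not just the sampled one.

```python
import random

def lrp_step(probs, grammars, sentence, parses, gamma=0.01):
    """One trial of the linear reward-penalty scheme.
    probs: list of grammar probabilities (sums to 1), updated in place."""
    n = len(grammars)
    i = random.choices(range(n), weights=probs)[0]  # sample G_i with prob p_i
    if parses(grammars[i], sentence):
        # Reward: p'_i = p_i + gamma*(1 - p_i); p'_j = (1 - gamma)*p_j
        new = [(1 - gamma) * p for p in probs]
        new[i] = probs[i] + gamma * (1 - probs[i])
    else:
        # Punish: p'_i = (1 - gamma)*p_i;
        #         p'_j = gamma/(n - 1) + (1 - gamma)*p_j
        new = [gamma / (n - 1) + (1 - gamma) * p for p in probs]
        new[i] = (1 - gamma) * probs[i]
    probs[:] = new  # both branches keep the distribution summing to 1
    return i
```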

  27. From Grammars to Parameters • Number of Grammars problem • A space with n parameters implies at least 2^n grammars (e.g. 2^40 is ~1 trillion) • Only one grammar is used at a time, so this implies very slow convergence • Competition among Parameter Values • How does this work? • Each trial involves selection of a vector of parameters, e.g. [0, 1, 1, 0, 0, 1, 1, 1, …] • Success or failure rewards/punishes all parameters, regardless of their complicity in the outcome of the trial • The Naïve Parameter Learning model (NPL) may reward incorrect parameter values as hitchhikers, or punish correct parameter values as accomplices.
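A sketch of the NPL scheme just described, with per-parameter weights rather than whole-grammar probabilities; `parses` is again a hypothetical oracle. The update deliberately exhibits the flaw: every coordinate is nudged by the global outcome, so irrelevant parameter values ride along as hitchhikers or accomplices.

```python
import random

def npl_step(weights, sentence, parses, gamma=0.01):
    """weights[k] = current probability that parameter k has value 1."""
    # Sample a full parameter vector coordinate-by-coordinate
    vector = tuple(int(random.random() < w) for w in weights)
    success = parses(vector, sentence)
    for k, v in enumerate(vector):
        # Move weight k toward the sampled value on success, away from it
        # on failure, whether or not parameter k affected the parse.
        target = v if success else 1 - v
        weights[k] += gamma * (target - weights[k])
    return vector, success
```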

  28. Avoiding Accomplices • How could ill-placed reward/punishment be avoided? • Identify which parameters are responsible for success/failure on a given trial • Parameters associated with lexical items/treelets

  29. Empirical Predictions • Hypothesis: Time to settle upon the target grammar is a function of the frequency of sentences that punish the competitor grammars • First-pass assumptions • Learning rate set low, so many occurrences are needed to lead to decisive changes • A similar amount of input is needed to eliminate all competitors
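A toy two-grammar simulation of this hypothesis, under the simplifying (assumed) setup that the target grammar parses every input while the competitor fails only on the punishing fraction. The frequencies in the usage note are the rough estimates cited later in the talk (verb-neg strings at ~7%, expletive subjects at ~1.2%).

```python
import random

def trials_to_converge(penalty_freq, gamma=0.01, threshold=0.02, seed=0):
    """Trials until the competitor's probability drops below `threshold`.
    penalty_freq: fraction of input that punishes the competitor."""
    rng = random.Random(seed)
    p = 0.5  # probability of the target grammar; competitor has 1 - p
    t = 0
    while 1 - p > threshold:
        t += 1
        if rng.random() < p:
            p = p + gamma * (1 - p)        # target sampled: always rewarded
        elif rng.random() < penalty_freq:
            p = gamma + (1 - gamma) * p    # competitor sampled and punished
        else:
            p = (1 - gamma) * p            # competitor sampled and rewarded
    return t

# e.g. trials_to_converge(0.07) vs. trials_to_converge(0.012):
# rarer punishing evidence means many more trials before the
# competitor grammar is driven out.
```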

  30. Empirical Predictions • ±wh-movement • Any occurrence of overt wh-movement punishes a [-wh-mvt] grammar • Wh-questions are highly frequent in input to English-speaking children (~30% estimate!) • The [±wh-mvt] parameter should be set very early • This applies to the clear-cut contrast between English and Chinese, but … • French: [+wh-movement] and lots of wh-in-situ • Japanese: [-wh-movement] plus scrambling

  31. Empirical Predictions • Verb-raising • Reported to be set accurately in speech of French children (Pierce, 1992)

  32. French: Two Verb Positions
a. Il ne voit pas le canard ('he sees not the duck')
b. *Il ne pas voit le canard ('he not sees the duck')
c. *Il veut ne voir pas le canard ('he wants to.see not the duck')
d. Il veut ne pas voir le canard ('he wants not to.see the duck')

  33. French: Two Verb Positions
a. Il ne voit pas le canard ('he sees not the duck')
b. *Il ne pas voit le canard ('he not sees the duck')
c. *Il veut ne voir pas le canard ('he wants to.see not the duck')
d. Il veut ne pas voir le canard ('he wants not to.see the duck')
Agreeing (i.e. finite) forms precede pas; non-agreeing (i.e. infinitive) forms follow pas.

  34. French Children’s Speech • Verb forms: correct or default (infinitive) • Verb position changes with verb form • Just like adults • Counts by verb form: finite 127, infinitive 119 (Pierce, 1992)

  35. French Children’s Speech • Verb forms: correct or default (infinitive) • Verb position changes with verb form • Just like adults • Counts by verb position: V-neg 122, neg-V 124 (Pierce, 1992)

  36. French Children’s Speech • Verb forms: correct or default (infinitive) • Verb position changes with verb form • Just like adults (Pierce, 1992)
         finite   infinitive
V-neg      121            1
neg-V        6          118

  37. Empirical Predictions • Verb-raising • Reported to be set accurately in the speech of French children (Pierce, 1992) • Crucial evidence: … verb neg/adv … • Estimated frequency in adult speech: ~7% • This frequency is set as an operational definition of ‘sufficiently frequent’ for early mastery (early 2’s)

  38. Empirical Predictions • Verb-second • Classic argument: V2 is mastered very early by German/Dutch-speaking children (Poeppel & Wexler, 1993; Haegeman, 1995) • Yang’s challenges • Crucial input is infrequent • Claims of early mastery are exaggerated

  39. Two Verb Positions
a. Ich sah den Mann ('I saw the man')
b. Den Mann sah ich ('the man saw I')
c. Ich will [den Mann sehen] ('I want the man to.see[inf]')
d. Den Mann will ich [sehen] ('the man want I to.see[inf]')

  40. Two Verb Positions
a. Ich sah den Mann ('I saw the man')
b. Den Mann sah ich ('the man saw I')
c. Ich will [den Mann sehen] ('I want the man to.see[inf]')
d. Den Mann will ich [sehen] ('the man want I to.see[inf]')
Agreeing verbs (i.e. finite verbs) appear in second position; non-agreeing verbs (i.e. infinitive verbs) appear in final position.

  41. German Children’s Speech • Verb forms: correct or default (infinitive) • Verb position changes with verb form • Just like adults • Counts by verb form: finite 208, infinitive 43 • Andreas, age 2;2 (Poeppel & Wexler, 1993)

  42. German Children’s Speech • Verb forms: correct or default (infinitive) • Verb position changes with verb form • Just like adults • Counts by verb position: V-2 203, V-final 48 • Andreas, age 2;2 (Poeppel & Wexler, 1993)

  43. German Children’s Speech • Verb forms: correct or default (infinitive) • Verb position changes with verb form • Just like adults • Andreas, age 2;2 (Poeppel & Wexler, 1993)
           finite   infinitive
V-2           197            6
V-final        11           37

  44. Empirical Predictions • Cross-language word orders • 1. Dutch: SVO, XVSO, OVS • 2. Hebrew: SVO, XVSO, VSO • 3. English: SVO, XSVO • 4. Irish: VSO, XVSO • 5. Hixkaryana: OVS, XOVS • Order of elimination (for a Dutch learner; see the sketch below) • Frequent SVO input quickly eliminates #4 and #5 • Relatively frequent XVSO input eliminates #3 • OVS is needed to eliminate #2 - only ~1.3% of input
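A small sketch of the elimination logic on this slide: each grammar licenses a set of surface patterns, and an observed pattern punishes any grammar that does not license it. The pattern sets come straight from the list above; everything else is illustrative.

```python
GRAMMARS = {
    "Dutch":      {"SVO", "XVSO", "OVS"},
    "Hebrew":     {"SVO", "XVSO", "VSO"},
    "English":    {"SVO", "XSVO"},
    "Irish":      {"VSO", "XVSO"},
    "Hixkaryana": {"OVS", "XOVS"},
}

def eliminated_by(pattern):
    """Grammars that cannot analyze an observed surface pattern."""
    return sorted(g for g, licensed in GRAMMARS.items()
                  if pattern not in licensed)

# For a Dutch learner:
# eliminated_by("SVO")  -> ['Hixkaryana', 'Irish']        (frequent input)
# eliminated_by("XVSO") -> ['English', 'Hixkaryana']      (fairly frequent)
# eliminated_by("OVS")  -> ['English', 'Hebrew', 'Irish'] (~1.3% of input)
# Hebrew is the last competitor standing, awaiting rare OVS evidence.
```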

  45. Empirical Predictions • But what about the classic findings by Poeppel & Wexler (etc.)? • They show mastery of V-raising, not mastery of V-2 • Yang argues that early Dutch shows lots of V-1 sentences, due to the presence of a Hebrew-type grammar (based on the Hein corpus), e.g. weet ik niet [know I not]

  46. Empirical Predictions • Early Argument Drop • Resuscitates the idea that early argument omission in English is due to a mis-set parameter • Overt expletive subjects (‘there’): ~1.2% frequency in input

  47. Null Subjects • Child English • Eat cookie. • Hyams (1986) • English children have an Italian setting of the null-subject parameter • Trigger for change: expletive subjects • Valian (1991) • Usage of English children is different from that of Italian children (in proportions) • Wang et al. (1992) • Usage of English children is different from that of Chinese children (null objects)

  48. Empirical Predictions • Early Argument Drop • Resuscitates the idea that early argument omission in English is due to a mis-set parameter • Wang et al.’s (1992) argument about Chinese was based on a mismatch in absolute frequencies between Chinese and English learners • Yang: if an incorrect grammar is used probabilistically, then an absolute frequency match is not expected - rather, the ratios should match • The ratio of null subjects to null objects is similar in Chinese and English learners • Like Chinese learners, English learners do not produce ‘wh-obj pro V’ questions

  49. Empirical Predictions • Null Subject Parameter Setting • Italian environment • English [- null subject] setting killed off early, due to presence of large amount of contradictory input • Italian children should exhibit an adultlike profile very early • English environment • Italian [+ null subject] setting killed off more slowly, since contradictory input is much rarer (expletive subjects) • The fact that null subjects are rare in the input seems to play no role
