Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001

William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York
Feasibility
Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work? Clark (1994, in press), Niyogi and Berwick (1996), Lightfoot (1989) (degree-0), Sakas (2000), Tesar and Smolensky (1996), and many PAC results concerning the induction of FSAs.

Feasibility measure (Sakas and Fodor, 2001): a near-linear increase in the expected number of sentences consumed before a learner converges on the target grammar.
[Figure: schematic plot with regions labeled Infeasible and Feasible. x-axis: the feature of the learning model to be tested as feasible or not feasible; this could be the number of parameters, average sentence length, etc.]
A Feasibility Case Study: a three-parameter domain (Gibson and Wexler, 1994)

SV / VS - subject precedes verb / verb precedes subject
VO / OV - verb precedes object / object precedes verb
+V2 / -V2 - the verb or auxiliary must / need not be in second position in the sentence

Sentences are strings over the symbols: S, V, O1, O2, AUX, ADV.

Example: Allie will eat the birds = S AUX V O
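Since each parameter is binary, the domain contains 2^3 = 8 grammars. A minimal illustrative sketch in Python (the encoding of parameter values as 0/1 bits is an assumption for illustration, not part of G&W's formulation):

    from itertools import product

    # Each grammar is a triple of binary parameter values.
    PARAMETERS = ["SV/VS", "VO/OV", "+V2/-V2"]

    # Enumerate all 2**3 = 8 grammars in the Gibson & Wexler domain.
    grammars = list(product((0, 1), repeat=len(PARAMETERS)))
    for g in grammars:
        print(g)   # e.g. (0, 1, 0) might encode SV, OV, -V2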
Surprisingly, G&W's simple 3-parameter domain presents nontrivial obstacles to several learning strategies, but the space is ultimately learnable (for example by a "blind-guess learner"; see the next slide).

Big question: How will the learning process scale up, in terms of feasibility, as the number of parameters increases?

Two problems for most acquisition strategies:
1) Ambiguity
2) Size of the domain
"Size of the domain" feasibility result:A Blind-guess learner succeeds only after consuminga number of sentences exponentially correlated with the number of parameters. • Blind-guesslearner (with oracle, no parsing, no error-driven constraint): • consume a sentence from the linguistic environment. • randomly select a setting for each parameter == randomly pick a grammar out of the domain. • if the settings indicate the target grammar stop, otherwise goto 1. A simple math result shows that the average number of trials required to achieve success* (in this case picking the target grammar) is: 1 / Probability of success. So if we can figure out the probability of picking the target grammar at step 3, we can model how many inputs the learner will require on average. * given that each trial has a possible outcome of either success or failure. For those technically minded, this is a hypergeometric distribution.
Probability of success = the probability of picking the target grammar at step 3, calculated as:

Pr(success) = 1 / (number of grammars) = 1 / 2^(number of parameters)

Applying the result for the average number of trials:

1 / (1 / 2^n) = 2^n, where n is the number of parameters.

If n = 30, then the number of grammars = 2^30 = 1,073,741,824. So, on average, over a billion sentences would need to be consumed by the blind-guess learner before attaining the target. The search space is huge!
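A quick Monte Carlo check of this result (a sketch in Python; the choice of n = 10 and 5000 runs is arbitrary, just small enough to run quickly):

    import random

    def blind_guess_trials(n_params):
        """Simulate the blind-guess learner: count sentences consumed
        until the randomly picked grammar equals the target."""
        n_grammars = 2 ** n_params
        target = random.randrange(n_grammars)
        trials = 0
        while True:
            trials += 1                           # step 1: consume a sentence
            guess = random.randrange(n_grammars)  # step 2: pick a grammar at random
            if guess == target:                   # step 3: stop on the target
                return trials

    n = 10  # 2**10 = 1024 grammars
    runs = [blind_guess_trials(n) for _ in range(5000)]
    print(sum(runs) / len(runs))  # approximately 2**10 = 1024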
Cross-language ambiguity

SV VO -V2 (English-like):
S V, S V O, S V O1 O2
S AUX V, S AUX V O, S AUX V O1 O2
ADV S V, ADV S V O, ADV S V O1 O2
ADV S AUX V, ADV S AUX V O, ADV S AUX V O1 O2

SV OV +V2 (German-like):
S V, S V O, O V S, S V O2 O1, O1 V S O2, O2 V S O1
S AUX V, S AUX O V, O AUX S V, S AUX O2 O1 V, O1 AUX S O2 V, O2 AUX S O1 V
ADV V S, ADV V S O, ADV V S O2 O1
ADV AUX S V, ADV AUX S O V, ADV AUX S O2 O1 V

A few strings (S V, S V O, S AUX V) occur in both languages and are therefore cross-linguistically ambiguous.
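The ambiguous strings can be found mechanically by intersecting the two languages, here represented simply as sets of the strings listed on the slide (a small sketch):

    english_like = {
        "S V", "S V O", "S V O1 O2",
        "S AUX V", "S AUX V O", "S AUX V O1 O2",
        "ADV S V", "ADV S V O", "ADV S V O1 O2",
        "ADV S AUX V", "ADV S AUX V O", "ADV S AUX V O1 O2",
    }
    german_like = {
        "S V", "S V O", "O V S", "S V O2 O1", "O1 V S O2", "O2 V S O1",
        "S AUX V", "S AUX O V", "O AUX S V",
        "S AUX O2 O1 V", "O1 AUX S O2 V", "O2 AUX S O1 V",
        "ADV V S", "ADV V S O", "ADV V S O2 O1",
        "ADV AUX S V", "ADV AUX S O V", "ADV AUX S O2 O1 V",
    }

    # Strings licensed by both grammars are cross-linguistically ambiguous.
    print(english_like & german_like)  # {'S V', 'S V O', 'S AUX V'}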
P&P acquisition: how can we obtain informative feasibility results while studying linguistically interesting domains with cognitively plausible learning algorithms?
One approach: create an input space for a linguistically plausible domain and run simulations (Briscoe (2000), Elman (1990, 1991, 1996), Yang (2000)). But this won't work for large domains.

So, how to answer questions of feasibility as the number of grammars (exponentially) scales up? Answer: introduce some formal notions in order to abstract away from the specific linguistic content of the input data.
A formal approach
1) Formalize the learning process and the input space.
2) Use the formalization in a probabilistic model to empirically test the learner across a wide range of learning scenarios.

This gives general data on the expected performance of acquisition algorithms, and can answer the question: given learner L, if the input space exhibits characteristics x, y, and z, is feasible learning possible?
The Triggering Learning Algorithm - TLA (Gibson and Wexler, 1994)
Searches the (huge) grammar space using local heuristics:

    repeat until convergence:
        receive a string s from L(Gtarg)
        if s can be parsed by Gcurr, do nothing              [Error-driven]
        otherwise, pick a grammar that differs by one
            parameter value from the current grammar        [Single Value Constraint (SVC)]
        if this grammar parses the sentence, make it the
            current grammar; otherwise do nothing            [Greediness]
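A minimal runnable sketch of one TLA update in Python (the bit-tuple grammar representation and the `parses` oracle are illustrative assumptions; the slide does not fix an implementation):

    import random

    def tla_step(g_curr, sentence, parses):
        """One TLA update. `parses(g, s)` is an oracle saying whether
        grammar g licenses sentence s."""
        if parses(g_curr, sentence):           # error-driven: no change on a successful parse
            return g_curr
        i = random.randrange(len(g_curr))      # SVC: flip exactly one parameter
        g_new = list(g_curr)
        g_new[i] = 1 - g_new[i]
        g_new = tuple(g_new)
        if parses(g_new, sentence):            # greediness: adopt only if it parses
            return g_new
        return g_curr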
TLA-minus (Sakas and Fodor, 2001)
A simplified TLA, used to introduce a way of studying the effects of cross-language ambiguity through some simple formulas:

    repeat until convergence:
        receive a string s from L(Gtarg)
        if s can be parsed by Gcurr, do nothing
        otherwise, pick a grammar at random
            (not necessarily one parameter distant)
        if this grammar parses the sentence, make it the
            current grammar; otherwise do nothing
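The corresponding change to the sketch above is small: the new hypothesis is drawn from the whole grammar space rather than from Gcurr's one-parameter neighbors (again illustrative, with the same assumed `parses` oracle):

    import random

    def tla_minus_step(g_curr, sentence, parses, n_params):
        """One TLA-minus update: on a parse failure, guess any grammar."""
        if parses(g_curr, sentence):
            return g_curr
        g_new = tuple(random.randrange(2) for _ in range(n_params))  # any grammar
        if parses(g_new, sentence):            # greediness retained
            return g_new
        return g_curr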
Some variables for TLA-minus analysis:
n = number of parameters in the domain
r = number of relevant parameters in the domain
a = number of parameters expressed ambiguously by an arbitrary input sentence

An example of n and r: suppose there are 5 parameters in the domain under study and the learner is being exposed to a language for which either setting of parameter 5 (p5) is "correct." In this case n = 5 and r = 4, and there are 2^n = 2^5 = 32 grammars in the domain, and 2^(n-r) = 2^(5-4) = 2^1 = 2 grammars that the learner can count as a successful convergence.
Some variables for TLA-minus analysis (con't):
An example of a: suppose a sentence s is consumed by the learner, and a successful parse can be obtained ONLY if p1 and p2 are set to 0 and 1, respectively, while the settings of p3 and p4 don't matter (and of course, as we said previously, the setting of p5 doesn't matter for the entire target language). In this case a = 2. That is, 2 parameters are ambiguous: the settings of p3 and p4.
Some formulas for TLA-minus analysis:
Recall:
- number of grammars that make up the cluster of legitimate target grammars = 2^(n-r)
- number of ways to set the a ambiguously expressed parameters = 2^a

Number of grammars that license an arbitrary sentence s:
= 2^(n-r) * 2^a = 2^(a+n-r)   (**)

Probability that an arbitrary sentence s from the target language can be parsed by an arbitrary grammar:
= (number of grammars that can parse s) / (number of grammars in the domain)   (**)
= (2^(n-r) * 2^a) / 2^n = 2^(a-r)
Some formulas for TLA-minus analysis (con't):
Recall: the number of grammars that license an arbitrary sentence s = 2^(a+n-r).

Probability that one of those grammars is in the cluster of 2^(n-r) target grammars:
= (number of grammars that make up the cluster) / (number of grammars that can parse the current input)
= 2^(n-r) / 2^(a+n-r) = 1 / 2^a
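Plugging in the running example (n = 5, r = 4, a = 2) checks out (a small verification sketch):

    n, r, a = 5, 4, 2  # values from the running example

    grammars_in_domain = 2 ** n               # 32
    target_cluster = 2 ** (n - r)             # 2
    grammars_licensing_s = 2 ** (a + n - r)   # 8

    # Probability an arbitrary grammar parses s: 2^(a-r)
    print(grammars_licensing_s / grammars_in_domain)   # 0.25 == 2**(a-r)

    # Probability a licensing grammar lies in the target cluster: 1/2^a
    print(target_cluster / grammars_licensing_s)       # 0.25 == 1 / 2**a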
A tree of choices and outcomes for TLA-minus:

An input sentence arrives.
- Probability ≈ 2^(a-r): Gcurr can parse the sentence.
  Action: retain Gcurr and get another sentence.
- Probability ≈ 1 - 2^(a-r): Gcurr can't parse the sentence.
  Action: pick a grammar G' at random.
  - Probability 2^(a-r): G' can parse the sentence.
    Action: retain G' and get another sentence.
    - Probability 1/2^a: G' is one of the grammars in the target cluster, so G' = Gtarget and it will be retained "in the limit" (this is due to the error-driven constraint).
  - Probability 1 - 2^(a-r): G' can't parse the sentence.
    Action: forget about G'; keep Gcurr (this is Greediness) and get another sentence.
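A minimal abstract simulation of this tree (a sketch: it treats each parsing outcome as an independent coin flip with the probabilities above, which is the idealization the formulas rely on):

    import random

    def tla_minus_sentences_to_target(a, r):
        """Count sentences consumed until the current grammar lands in
        the target cluster, using the branch probabilities from the tree."""
        p_parse = 2 ** (a - r)    # prob. an arbitrary grammar parses s
        p_target = 1 / 2 ** a     # prob. a licensing grammar is in the target cluster
        sentences = 0
        while True:
            sentences += 1
            if random.random() < p_parse:      # Gcurr parses: retain it
                continue
            if random.random() < p_parse:      # new guess G' parses: adopt it
                if random.random() < p_target: # G' is in the target cluster
                    return sentences
            # else: greediness, keep Gcurr

    runs = [tla_minus_sentences_to_target(a=2, r=4) for _ in range(2000)]
    print(sum(runs) / len(runs))  # average sentences to convergence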