

Presentation Transcript


  1. Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics, The City University of New York

  2. Meeting 1 (Overview): Today's agenda:
• Why computationally model language learning?
• Linguistics, state space search and definitions
• Early (classic) computational approaches
• Gold - language can't be learned theorem
• Angluin - Oh yes it can
• Artificial Neural Networks: an introduction
• Tlearn software demonstration (if time)

  3. Explicitness of the computational model can ground linguistic theories:
• "...it may be necessary to find out how language learning could work in order for the developmental data to tell us how it does work." (Pinker, 1979)
• Can natural language grammar be modeled by X? Only if X is both descriptively adequate (predicts perceived linguistic phenomena) and explanatorily adequate (explains how the phenomena come to be). (Bertolo, MIT Encyclopedia of Cognitive Science)
• If a computational model demonstrates that some formally defined class of models cannot be learned, X had better fall outside of that class, regardless of its descriptive adequacy.

  4. Generative Linguistics
• phrase structure rule (PS) grammar - a formalism based on rewrite rules which are recursively applied to yield the structure of an utterance.
• transformational grammar - sentences have (at least) two phrase structures: an original or base-generated structure and the final or surface structure. A transformation is a mapping from one phrase structure to another.
• principles and parameters - all languages share the same principles, with a finite number of sharply delineated differences or parameters.
NON-generative linguistics: see Elman, "Language as a dynamical system."

  5. [Diagram: a state space of grammar nodes G0, G2, G3, G4, G5, G6 and Gtarg connected by arcs]
• Syntax acquisition can be viewed as a state space search:
• nodes represent grammars, including a start state and a target state.
• arcs represent a possible change from one hypothesized grammar to another.

  6. sL(G0) sL(G1) sL(G2) sL(G3) sL(Gtarg) sL(G0) sL(G1) sL(G2) sL(G3) Gtarg G1 G2 G3 G0 Gold’s grammar enumeration learner (1967) where s is a function that returns the next sentence from the input sample being fed to the learner, and L(Gi) is the language generated by grammar Gi. • Two points: • The learner is error-driven • error-driven learners converge on the target in the limit

  7. Learnability - under what conditions is learning possible?
Feasibility - is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?
A class of grammars H is learnable iff there exists a learner such that for every G ∈ H, and for every (fair) text generable by G, the learner converges on G.

  8. An early learnability result (Gold, 1967)
Exposed to input strings of an arbitrary target language Ltarg = L(Gtarg), where Gtarg ∈ H, it is impossible to guarantee that a learner can converge on Gtarg if H is any class in the Chomsky hierarchy. Moreover, no learner is uniformly faster than one that executes simple error-driven enumeration of languages.
H - the hypothesis space: the set of grammars that may be hypothesized by the learner.

  9. "Eated." L(Gm) "Walked." L(Go) L(Gk) "She eated." L(Gi) "Walked she." "She walked." "She ate." The Overgeneralization Hazard

  10. If H contains an infinite language L(Gk) together with an infinite set of finite languages included in L(Gk), then H is unlearnable.
Such an H is included in Lreg, so Lreg is unlearnable. And since Lreg ⊂ Lcf ⊂ Lcs ⊂ Lre, no class of languages in the Chomsky hierarchy is learnable.

  11. Gold's Enumeration Learner is as fast as any other learner
Assume there exists a rival learner that converges earlier than the enumeration learner: the rival arrives at the target at time i, the enumerator at time j (i < j). At time i, the enumeration learner had to be conjecturing SOME grammar consistent with the input up to that point. If the target had happened to be that grammar, the enumerator would have been correct and the rival incorrect. Thus, for every language that the rival converges on faster than the enumerator, there is a language for which the reverse is true.

  12. Corollary: Language just can't be learned ;-)

  13. [Diagram: Lhuman cutting across the nested classes Lreg ⊂ Lcf ⊂ Lcs ⊂ Lre]
• The class of human languages must intersect the Chomsky hierarchy so that it does not coincide with any other class that properly includes a class in the hierarchy.

  14. Angluin's Theorem (1980)
A class of grammars H is learnable iff for every language Li = L(Gi), Gi ∈ H, there exists a finite subset D such that no other language L(G), G ∈ H, includes D and is included in Li.
[Diagram: D ⊆ L(G) ⊊ L(Gi) - if such a language L(G) can be generated by a grammar in H, H is not learnable!]
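A small Python sketch of this "tell-tale subset" condition over a toy family of finite languages (the sentences and the family are invented for the demo, not from the slides). Note that a finite language always has a tell-tale (D = Li itself works); the condition can only fail when some Li is infinite and properly includes infinitely many other languages of the class, which a brute-force check like this cannot represent directly.

```python
from itertools import chain, combinations

def violates_telltale(D, Li, family):
    """True if some other language in the family includes D while being
    properly included in Li -- Angluin's forbidden configuration."""
    return any(D <= Lj < Li for Lj in family if Lj != Li)

def has_telltale(Li, family, max_size=3):
    """Brute-force search for a finite tell-tale subset D of Li,
    trying only subsets of up to max_size elements (a toy check)."""
    elems = sorted(Li)
    candidates = chain.from_iterable(
        combinations(elems, r) for r in range(max_size + 1))
    return any(not violates_telltale(frozenset(D), Li, family) for D in candidates)

# Toy family echoing the overgeneralization picture: L_sup properly
# includes the two smaller languages.
L1 = frozenset({"she walked."})
L2 = frozenset({"she walked.", "she ate."})
L_sup = frozenset({"she walked.", "she ate.", "she eated.", "walked she."})
family = [L1, L2, L_sup]

for L in family:
    print(sorted(L), "has a tell-tale subset:", has_telltale(L, family))
# All three print True, so this particular family satisfies Angluin's condition.
```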

  15. Artificial Neural Networks: a brief introduction
[Diagrams of three architectures: (a) fully recurrent, (b) feedforward, (c) multi-component]

  16. [Diagram: input activations and a bias node feed into a threshold node. If these inputs are great enough, the unit fires - that is to say, a positive activation occurs at the threshold node.]
How can we implement the AND function?

  17. How can we implement the AND function?
First we must decide on a representation: possible inputs: 1, 0; possible outputs: 1, 0. We want an artificial neuron to implement this function.
Boolean AND:
unit inputs   unit output
1 1           1
0 1           0
1 0           0
0 0           0

  18. [Diagram: a unit with weights of 1 on each input and a bias weight of -1; net = ∑ activations arriving at the threshold node.]
unit inputs   net
1 1           1
0 1           0
1 0           0
0 0           -1
Ooops - for input (0, 0) the raw net is -1, not the desired output 0.

  19. [Diagram: the same unit, now with a STEP activation function f applied to the net input, so that input (0, 0) yields output 0.]
STEP activation function:
f(x) = 1 if x > 0
f(x) = 0 if x <= 0
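Putting slides 16-19 together, here is a minimal Python sketch of a single threshold unit computing Boolean AND, assuming (as the diagrams suggest) input weights of 1 and a bias weight of -1:

```python
def step(x):
    """STEP activation from slide 19: fire only when the net input exceeds 0."""
    return 1 if x > 0 else 0

def and_unit(x1, x2, w1=1.0, w2=1.0, bias=-1.0):
    """One threshold unit. With weights of 1 on each input and a bias of -1,
    the net input exceeds 0 only for the input pair (1, 1)."""
    net = x1 * w1 + x2 * w2 + bias
    return step(net)

for x1, x2 in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print(x1, x2, "->", and_unit(x1, x2))
# 1 1 -> 1, every other row -> 0: the Boolean AND truth table from slide 17.
```

With the STEP function applied, the troublesome (0, 0) case from slide 18 now yields 0 rather than -1.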

  20. [Diagram: a small network fragment in which unit 9, with a sigmoid activation, receives input from units 7 and 8.]
a7 = 1, w79 = .75, so the activation arriving from unit 7 is a7(w79) = 1(.75) = .75
a8 = .3, w89 = 1.6667, so the activation arriving from unit 8 is a8(w89) = .3(1.6667) = .5
net9 = Σj aj(wj9) = .3(1.6667) + 1(.75) = 1.25
a9 = f(net9) = 1 / (1 + e^(-net9)) = .777
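The same computation in a short Python sketch, reproducing the slide's numbers for unit 9 (the unit and weight labels follow the slide):

```python
import math

def sigmoid(net):
    """Logistic activation from slide 20: 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))

# Unit 9 receives activation from unit 7 (a7 = 1, weight w79 = 0.75)
# and from unit 8 (a8 = 0.3, weight w89 = 1.6667).
a7, w79 = 1.0, 0.75
a8, w89 = 0.3, 1.6667

net9 = a7 * w79 + a8 * w89   # 0.75 + 0.5 = 1.25
a9 = sigmoid(net9)           # 1 / (1 + e^-1.25) ≈ 0.777

print(round(net9, 2), round(a9, 3))   # 1.25 0.777
```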
