Attendee questionnaire
• Name
• Affiliation/status
• Area of study/research
• For each of these subjects:
  • Linguistics (Optimality Theory)
  • Computation (connectionism/neural networks)
  • Philosophy (symbolic/connectionist debate)
  • Psychology (infant phonology)
  please indicate your relative level of
  • interest (for these lectures) [1 = least, 5 = most]
  • background [1 = none, 5 = expert]
Thank you
Optimality in Cognition and Grammar
Paul Smolensky
Cognitive Science Department, Johns Hopkins University

Plan of lectures
• Cognitive architecture
• Symbols and neurons
• Symbols in neural networks
• Optimization in neural networks
• Optimization in grammar I: HG → OT
• Optimization in grammar II: OT
• OT in neural networks
Cognitive architecture
• Central dogma of cognitive science: Cognition is computation
• But what type of computation?
• What exactly is computation, and what work must it do in cognitive science?
Computation
• Cognitive functions:
  • Pixels → objects, locations [low- to high-level vision]
  • Sound stream → word string [phonetics + …]
  • Word string → parse tree [syntax]
  • Underlying form → surface form [phonology]
    • petit copain: /pətit + kopɛ̃/ → [pə.ti.ko.pɛ̃]
    • petit ami: /pətit + ami/ → [pə.ti.ta.mi]
• Reduction of complex procedures for evaluating functions to combinations of primitive operations
• Computational architecture:
  • Operations: primitives + combinators
  • Data
Symbolic Computation
• Computational architecture:
  • Operations: primitives + combinators
  • Data
• The Pure Symbolic Architecture (PSA)
  • Data: strings, (binary) trees, graphs, …
  • Operations
    • Primitives
      • Concatenate(string, tree) = cons
      • First-member(string); left-subtree(tree) = ex0
    • Combinators
      • Composition: f(x) =def g(h(x))
      • IF(x = A) THEN … ELSE …
[Figure: ƒPassive maps the LF tree for admire(George, few leaders) to the surface tree for "Few leaders are admired by George" (V, A(gent), P(atient), Aux, by nodes)]
ƒ(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
• But for cognition, we need a reduction to a very different computational architecture
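As an illustration, here is a minimal sketch of the PSA primitives and the composed function ƒ: trees are nested pairs, and the leaf labels below are placeholders for illustration, not the actual LF/surface analyses in the figure.

```python
def cons(x, y):
    """Combine two constituents into a binary tree [x y]."""
    return (x, y)

def ex0(t):
    """Extract the left subtree (first member)."""
    return t[0]

def ex1(t):
    """Extract the right subtree."""
    return t[1]

def f_passive(s):
    """f(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))"""
    return cons(ex1(ex0(ex1(s))),
                cons(ex1(ex1(ex1(s))), ex0(s)))

# Any input of the form (s0, ((s100, s101), (s110, s111))) is rearranged
# into (s101, (s111, s0)) by the composed primitives:
s = ('A', (('B', 'C'), ('D', 'E')))
print(f_passive(s))   # ('C', ('E', 'A'))
```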
The cognitive architecture: The connectionist hypothesis (PDP computation)
At the lowest computational level of the mind/brain:
• Representations: distributed activation patterns
• Primitive operations (e.g.):
  • Multiplication of activations by synaptic weights
  • Summation of weighted activation values
  • Non-linear transfer functions
• Combination: massive parallelism
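A minimal sketch of these primitive operations; the tanh transfer function and the numbers here are illustrative assumptions.

```python
import numpy as np

def unit_update(activations, weights):
    """One unit: multiply incoming activations by its synaptic weights,
    sum them, and pass the net input through a non-linear transfer (tanh)."""
    net_input = np.dot(weights, activations)   # multiply + sum
    return np.tanh(net_input)                  # non-linear transfer

def layer_update(activations, weight_matrix):
    """Massive parallelism: every unit updates at once
    (one matrix-vector product followed by the transfer function)."""
    return np.tanh(weight_matrix @ activations)

a = np.array([0.2, -0.5, 0.9])                 # a distributed activation pattern
W = np.array([[0.1, 0.4, -0.3],
              [0.7, 0.0, 0.2]])                # synaptic weights
print(unit_update(a, W[0]), layer_update(a, W))
```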
Criticism of PDP (e.g., from neuroscientists)
• "Much too simple"
• This complaint is misguided: it rests on a confusion between two questions (next slides)
• The relevant complaint is the opposite, "much too complex": the target of the computational reduction must lie within the scope of neural computation
The cognitive question for neuroscience
What is the function of each component of the nervous system?
Our question is quite different.
The neural question for cognitive science
How are complex cognitive functions computed by a mass of numerical processors like neurons—each very simple, slow, and imprecise relative to the components that have traditionally been used to construct powerful, general-purpose computational systems? How does the structure arise that enables such a medium to achieve cognitive computation?
The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)
• In higher cognitive domains, representations and functions are well approximated by symbolic computation
• The Connectionist Hypothesis is correct
• Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation
[Figure: PassiveNet, a connectionist network whose input units encode the LF tree (Agent, Patient, …) and whose output units encode the passive surface tree (Aux, by, Agent, Patient), connected by a weight matrix W]
The ICS Isomorphism
[Figure: the passive mapping shown at two levels: symbolic trees (LF → surface, with V, A, P, Aux, by) and the PassiveNet weight matrix W; tensor product representations link the two levels, and tensorial networks compute the mapping]
Within-level compositionality:
ƒ(s) = cons(ex1(ex0(ex1(s))), cons(ex1(ex1(ex1(s))), ex0(s)))
Between-level reduction:
W = W_cons0·[W_ex1·W_ex0·W_ex1] + W_cons1·[W_cons0·(W_ex1·W_ex1·W_ex1) + W_cons1·(W_ex0)]
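A minimal sketch of this between-level reduction, restricted to depth-1 trees so it stays short: the binding/unbinding primitives become matrices acting on tensor product representations, and a composed symbolic function becomes a single weight matrix. The role vectors are the ones introduced later in these slides; the filler dimension and the toy function g([x y]) = [y x] are assumptions (ƒPassive itself requires depth-3 trees).

```python
import numpy as np

n = 3                                    # filler dimensionality (assumed)
I = np.eye(n)
r0, r1 = np.array([1., 1.]), np.array([1., -1.])        # role vectors
u0, u1 = np.array([.5, .5]), np.array([.5, -.5])        # dual (unbinding) vectors

# Primitives as matrices on the flattened filler ⊗ role space
W_cons0 = np.kron(I, r0.reshape(2, 1))   # x  ->  x ⊗ r0
W_cons1 = np.kron(I, r1.reshape(2, 1))   # y  ->  y ⊗ r1
W_ex0   = np.kron(I, u0.reshape(1, 2))   # x ⊗ r0 + y ⊗ r1  ->  x
W_ex1   = np.kron(I, u1.reshape(1, 2))   # x ⊗ r0 + y ⊗ r1  ->  y

# Within-level:  g([x y]) = cons(ex1(s), ex0(s))   (swap the two constituents)
# Between-level: W_g = W_cons0 · W_ex1 + W_cons1 · W_ex0
W_g = W_cons0 @ W_ex1 + W_cons1 @ W_ex0

x, y = np.array([1., 0., 2.]), np.array([0., -1., 1.])
psi = np.kron(x, r0) + np.kron(y, r1)            # realization of [x y]
psi_swapped = np.kron(y, r0) + np.kron(x, r1)    # realization of [y x]
print(np.allclose(W_g @ psi, psi_swapped))       # True
```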
The ICS Architecture
[Figure: roadmap diagram: a symbolic function ƒ (e.g., dog+s → "dogs" [dogz]) specified by a grammar G and computed by a connectionist network A over syllable-structure representations (σ, k, æ, t); this slide highlights Processing (Learning)]
Processing I: Activation
• Computational neuroscience
• Key sources:
  • Hopfield 1982, 1984
  • Cohen and Grossberg 1983
  • Hinton and Sejnowski 1983, 1986
  • Smolensky 1983, 1986
  • Geman and Geman 1984
  • Golden 1986, 1988
Processing I: Activation
Processing — spreading activation — is optimization: Harmony maximization
[Figure: a two-unit network: units a1 and a2 receive external inputs i1 (0.6) and i2 (0.5) and are connected by an inhibitory weight –λ (–0.9)]
The ICS Architecture
[Figure: the roadmap diagram again (G, ƒ, A; cat → kæt), now highlighting Processing]
Processing II: Optimization
Processing — spreading activation — is optimization: Harmony maximization
• Cognitive psychology
• Key sources:
  • Hinton & Anderson 1981
  • Rumelhart, McClelland, & the PDP Group 1986
[Figure: the same two-unit network: inputs i1 (0.6), i2 (0.5); inhibitory weight –λ (–0.9)]
Processing II: Optimization
Processing — spreading activation — is optimization: Harmony maximization
Harmony maximization is satisfaction of parallel, violable well-formedness constraints:
• a1 must be active (strength: 0.6)
• a2 must be active (strength: 0.5)
• a1 and a2 must not be simultaneously active (strength: λ = 0.9)
CONFLICT. Optimal compromise: a1 = 0.79, a2 = –0.21
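A minimal numerical sketch of this optimal compromise. The Harmony function below combines the three constraint terms with an assumed quadratic decay term −½(a1² + a2²), as in continuous Hopfield/Harmony-theory networks; under that assumption, gradient ascent ("spreading activation") settles at a1 ≈ 0.79, a2 ≈ −0.21.

```python
import numpy as np

i1, i2, lam = 0.6, 0.5, 0.9                   # constraint strengths from the slide

def harmony(a1, a2):
    # constraint terms + an assumed quadratic decay term -(a1^2 + a2^2)/2
    return i1*a1 + i2*a2 - lam*a1*a2 - 0.5*(a1**2 + a2**2)

a = np.zeros(2)
for _ in range(2000):
    grad = np.array([i1 - lam*a[1] - a[0],    # dH/da1
                     i2 - lam*a[0] - a[1]])   # dH/da2
    a += 0.05 * grad                          # spreading activation = gradient ascent on H
print(np.round(a, 2), harmony(*a))            # [ 0.79 -0.21] and the Harmony achieved
```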
Processing II: Optimization
• The search for an optimal state can employ randomness
  • Equations for units' activation values have random terms
  • pr(a) ∝ e^(H(a)/T)
  • T ('temperature') ~ randomness; T → 0 during search
• Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
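A sketch of the stochastic version for the same two-unit network, under the same assumed Harmony function as above: Metropolis-style updates whose acceptance follows e^(ΔH/T), so that at equilibrium pr(a) ∝ e^(H(a)/T), with T lowered toward 0 during the search. The annealing schedule and proposal scheme are illustrative assumptions, not the Boltzmann Machine's exact update rule.

```python
import numpy as np

rng = np.random.default_rng(0)
i1, i2, lam = 0.6, 0.5, 0.9

def harmony(a):
    # same assumed Harmony function as in the previous sketch
    return i1*a[0] + i2*a[1] - lam*a[0]*a[1] - 0.5*np.sum(a**2)

a = rng.uniform(-1, 1, size=2)
for T in np.geomspace(1.0, 1e-4, 5000):             # temperature -> 0 during search
    proposal = a + rng.normal(scale=0.5*np.sqrt(T), size=2)
    dH = harmony(proposal) - harmony(a)
    # accept Harmony-raising moves; accept Harmony-lowering moves
    # with probability e^{dH/T}  (equilibrium distribution pr(a) ∝ e^{H(a)/T})
    if dH > 0 or rng.random() < np.exp(dH / T):
        a = proposal
print(np.round(a, 2))   # ends close to the optimal compromise [0.79, -0.21]
```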
The ICS Architecture
[Figure: the roadmap diagram again (G, ƒ, A; cat → kæt)]
Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints
1. Prior question: What are the activation patterns — data structures — mental representations — evaluated by these constraints?
2. What are the constraints? (Knowledge representation)
Representation
• Symbolic theory
  • Complex symbol structures
• Generative linguistics (Chomsky & Halle '68, …)
  • Particular linguistic representations
• Markedness Theory (Jakobson, Trubetzkoy, '30s, …)
  • Good (well-formed) linguistic representations
• Connectionism (PDP)
  • Distributed activation patterns
• ICS
  • Realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units ('tensor product representations' etc.)
  • Will employ 'local representations' as well
Representation
[Figure: the binary tree [σ k [æ t]] and its filler/role bindings: σ/rε, k/r0, æ/r01, t/r11]
Tensor Product Representations
• Representations:
  • Filler vectors: A, B, X, Y (i, j, k ∊ {A, B, X, Y})
  • Role vectors: rε = 1, r0 = (1 1), r1 = (1 –1)
  • A filler i at depth 0 is bound to rε; fillers j, k at the depth-1 positions are bound to r0 and r1 via the tensor product ⊗
[Figure: the network units (①–⑫) realizing the depth-0 and depth-1 bindings]
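A minimal sketch of binding and unbinding for the tree [σ k [æ t]] from the earlier slide, using these role vectors (with r01 taken as r0 ⊗ r1); the one-hot filler vectors and the depth-by-depth layout are assumptions for illustration.

```python
import numpy as np

# Filler vectors (assumed one-hot codes for σ, k, æ, t)
fillers = {name: vec for name, vec in zip(['σ', 'k', 'æ', 't'], np.eye(4))}

r0, r1 = np.array([1., 1.]), np.array([1., -1.])    # role vectors from the slide
u0, u1 = np.array([.5, .5]), np.array([.5, -.5])    # dual (unbinding) vectors

# Bindings from the figure: σ/rε, k/r0, æ/r01, t/r11.
# Each depth is kept as its own tensor (a direct sum across depths).
psi = {
    0: fillers['σ'],                                        # σ ⊗ rε  (rε = 1)
    1: np.einsum('f,r->fr', fillers['k'], r0),              # k ⊗ r0
    2: (np.einsum('f,a,b->fab', fillers['æ'], r0, r1)       # æ ⊗ r0 ⊗ r1
        + np.einsum('f,a,b->fab', fillers['t'], r1, r1)),   # t ⊗ r1 ⊗ r1
}

# Unbinding: contract with dual role vectors to extract a constituent,
# e.g. "which filler occupies role r01?"  ->  æ
extracted = np.einsum('fab,a,b->f', psi[2], u0, u1)
print(extracted)        # [0. 0. 1. 0.]  ==  the filler vector for æ
```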
Local tree realizations
• Representations: