Optimality in Cognition and Grammar
Paul Smolensky, Cognitive Science Department, Johns Hopkins University
Plan of lectures
• Cognitive architecture: Symbols & optimization in neural networks
• Optimization in grammar: HG → OT. From numerical to algebraic optimization in grammar
• OT and nativism: The initial state & neural/genomic encoding of UG
• ?
The ICS Hypothesis
The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)
• In higher cognitive domains, representations and functions are well approximated by symbolic computation
• The Connectionist Hypothesis is correct
• Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation
The ICS Architecture [diagram: the grammar function ƒ maps input /kæt/ to the structure [σ k [æ t]]; at the lower level, an activation pattern A over network units realizes the syllable tree]
Representation [diagram: the tree [σ k [æ t]] realized as the activation pattern σ/rε + k/r0 + æ/r01 + t/r11, each symbol bound to its tree-position role]
Tensor Product Representations
• Filler vectors: A, B, X, Y; fillers i, j, k ∈ {A, B, X, Y}
• Role vectors: rε = (1; 0 0) for the root, plus child roles r0 and r1
• Representations: each binding is the tensor product ⊗ of a filler with its role
[diagram: unit-by-unit layout of the depth-0 and depth-1 bindings]
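To make the binding operation concrete, here is a minimal sketch of filler/role binding by tensor product. Only rε = (1; 0 0) is from the slide; the filler vectors and the values of r0 and r1 are illustrative assumptions, chosen mutually orthogonal so that unbinding is exact.

```python
import numpy as np

# Illustrative filler vectors: one unit vector per symbol (assumed).
fillers = {s: v for s, v in zip("σkæt", np.eye(4))}

# Role vectors for tree positions (rε from the slide; r0, r1 assumed).
r_eps = np.array([1.0, 0.0, 0.0])   # root
r0    = np.array([0.0, 1.0, -1.0])  # left child
r1    = np.array([0.0, 1.0,  1.0])  # right child

def bind(f, r):
    """One filler/role binding: the tensor (outer) product f ⊗ r."""
    return np.multiply.outer(f, r)

# [σ k [æ t]] = σ/rε + k/r0 + æ/r01 + t/r11, with recursive roles
# r01 = r0 ⊗ r1 and r11 = r1 ⊗ r1; bindings of different depths are
# kept in separate tensors here for simplicity.
depth1 = bind(fillers["σ"], r_eps) + bind(fillers["k"], r0)
depth2 = (bind(fillers["æ"], np.multiply.outer(r0, r1)) +
          bind(fillers["t"], np.multiply.outer(r1, r1)))

def unbind(T, r):
    """Contract the last axis with r to recover what r was bound to."""
    return T @ (r / r.dot(r))

print(unbind(depth1, r0))              # ≈ filler vector for k
print(unbind(unbind(depth2, r1), r1))  # ≈ filler vector for t
```

Because the assumed roles are orthogonal, unbinding recovers each filler exactly; with merely linearly independent roles one would unbind with the dual basis instead.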
Local tree realizations [diagram: local, one-unit-per-binding realizations of the tree representations]
The ICS Isomorphism: tensor product representations ↔ tensorial networks [diagram: the LF of a passive (Agent, Patient, Aux, V, 'by') encoded as input and output patterns over a tensorial network W]
Binding by Synchrony (Shastri & Ajjanagadde 1993) [Tesar & Smolensky 1994]
give(John, book, Mary) is realized by filler and role units firing together over time:
s = r1 [f_book + f_give-obj] + r2 [f_giver + f_John] + r3 [f_Mary + f_recipient]
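A toy rendering of the synchrony idea, with the equation's three role phases reduced to labeled slots; all temporal dynamics are abstracted away, so this only illustrates what co-firing is meant to encode.

```python
# Binding by synchrony, schematically: filler and role units that fire
# in the same phase of the oscillation cycle are bound together.
# Phases are reduced to dictionary keys; no actual timing is modeled.
phases = {
    "r1": ["f_give-obj", "f_book"],   # object role ~ book
    "r2": ["f_giver", "f_John"],      # giver role ~ John
    "r3": ["f_recipient", "f_Mary"],  # recipient role ~ Mary
}
# Reading out give(John, book, Mary): co-firing = co-binding.
for role, cofiring in phases.items():
    print(f"{role}: " + " + ".join(cofiring))
```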
The ICS Architecture [roadmap diagram repeated: ƒ maps /kæt/ to [σ k [æ t]], realized as activation pattern A]
Two Fundamental Questions
Harmony maximization is satisfaction of parallel, violable constraints.
2. What are the constraints? (Knowledge representation)
Prior question:
1. What are the activation patterns — data structures, mental representations — evaluated by these constraints?
Representation [slide repeated: [σ k [æ t]] as σ/rε + k/r0 + æ/r01 + t/r11, answering question 1]
Two Fundamental Questions [slide repeated; question 2, the constraints, now in focus]
Constraints
• NOCODA: A syllable has no coda [Maori/French/English]
• 'cat': the parse a = [σ k [æ t]] incurs one violation (*): the coda t
• H(a[σ k [æ t]]) = −s_NOCODA < 0, where s_NOCODA > 0 is the constraint's strength
The ICS Architecture [roadmap diagram repeated; open question highlighted: Constraint Interaction ??]
Constraint Interaction I
• ICS → grammatical theory: Harmonic Grammar (Legendre, Miyata & Smolensky 1990 et seq.)
Constraint Interaction I
The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths.
H([σ k [æ t]]) = H(k, σ) + H(σ, t)
• H(k, σ) > 0: ONSET rewards the onset k
• H(σ, t) < 0: NOCODA penalizes the coda t
Any formal language can be so generated.
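As a gloss on this slide, a tiny numerical sketch: Harmony is a weighted sum of constraint rewards and penalties, and the grammar's output is the candidate that maximizes it. The magnitudes (2 and −3) are assumptions; the slide fixes only the signs.

```python
# Toy Harmonic Grammar evaluation. Weight magnitudes are assumed;
# the slide specifies only ONSET > 0 (reward) and NOCODA < 0 (penalty).
weights = {"ONSET": 2.0, "NOCODA": -3.0}

def harmony(parse):
    """H(parse) = sum of w_c for each constraint configuration present."""
    h = 0.0
    if parse["has_onset"]:
        h += weights["ONSET"]    # H(k, σ) > 0: onset present
    if parse["has_coda"]:
        h += weights["NOCODA"]   # H(σ, t) < 0: coda present
    return h

candidates = {
    "[σ k [æ t]]": {"has_onset": True, "has_coda": True},
    "[σ k [æ]]":   {"has_onset": True, "has_coda": False},
}
# The grammar generates the candidate that maximizes H.
for name, parse in candidates.items():
    print(name, harmony(parse))
print("optimal:", max(candidates, key=lambda n: harmony(candidates[n])))
```

With only these two constraints the coda-deleting parse wins; it is the Faithfulness constraints introduced later that can make the faithful parse optimal.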
The ICS Architecture [roadmap diagram repeated; highlighted: Constraint Interaction I: HG]
Harmonic Grammar Parser
• Simple, comprehensible network
• Simple grammar G: X → A B, Y → B A
• Language processing: completion
[diagram: bottom-up and top-down completion of the trees X/(A B) and Y/(B A)]
Simple Network Parser
• Fully self-connected, symmetric network (weight matrix W)
• Like the previously shown network, except with 12 units; representations and connections shown below
Harmonic Grammar Parser
• Weight matrix for Y → B A: H(Y, B—) > 0, H(Y, —A) > 0
Harmonic Grammar Parser • Weight matrix for X → A B
Harmonic Grammar Parser • Weight matrix for entire grammar G
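A sketch of the idea behind these three slides, under assumed encodings: each rule contributes a small symmetric weight matrix, the matrix for the entire grammar is their sum, and network Harmony is H(a) = ½ aᵀW a. The one-unit-per-symbol encoding below is a simplification; the lecture's 12-unit parser uses distributed (tensor product) representations instead.

```python
import numpy as np

# Assumed local encoding: one unit per symbol-in-position.
units = ["X", "Y", "A_left", "B_left", "A_right", "B_right"]
ix = {u: i for i, u in enumerate(units)}

def rule_matrix(parent, left, right, w=1.0):
    """Symmetric weights tying a parent unit to its two child units."""
    W = np.zeros((len(units),) * 2)
    for child in (left + "_left", right + "_right"):
        W[ix[parent], ix[child]] = W[ix[child], ix[parent]] = w
    return W

# Weight matrix for the entire grammar G = sum of per-rule matrices.
W_G = rule_matrix("X", "A", "B") + rule_matrix("Y", "B", "A")

def harmony(a):
    """Network Harmony: H(a) = 1/2 aᵀ W a."""
    return 0.5 * a @ W_G @ a

def complete_parent(left, right):
    """Bottom-up completion: clamp the children, then pick the parent
    (X or Y) whose activation maximizes Harmony."""
    def state(parent):
        a = np.zeros(len(units))
        for u in (parent, left + "_left", right + "_right"):
            a[ix[u]] = 1.0
        return a
    return max("XY", key=lambda p: harmony(state(p)))

print(complete_parent("A", "B"))  # -> X, since X -> A B
print(complete_parent("B", "A"))  # -> Y, since Y -> B A
```

Top-down completion is the mirror image: clamp X or Y and pick the child assignment with the highest Harmony.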
Bottom-up Processing [diagram: terminal units A, B clamped; the network completes the parent X or Y]
Top-down Processing [diagram: a parent unit clamped; the network completes its children]
Scaling up • Not yet … • Still conceptual obstacles to surmount
Explaining Productivity
• Approaching full-scale parsing of formal languages by neural-network Harmony maximization
• Have other networks (like PassiveNet) that provably compute recursive functions → productive competence
• How to explain this?
Proof of Productivity
• Productive behavior follows mathematically from combining
– the combinatorial structure of the vectorial representations encoding inputs & outputs, and
– the combinatorial structure of the weight matrices encoding knowledge
Explaining Productivity I: PSA & ICS
• Intra-level (PSA) decomposition: [A B] ⇝ {A, B}
• Inter-level (ICS) decomposition: [A B] ⇝ {1, 0, 1, …, 1}
[diagram: functions/semantics explained by PSA processes and, beneath them, ICS processes]
Explaining Productivity II: ICS & PSA
• Intra-level decomposition: G ⇝ {X → A B, Y → B A}
• Inter-level decomposition: W(G) ⇝ {1, 0, 1, 0; …}
[diagram: the same functions/semantics, now with the grammar decomposed across levels]
The ICS Architecture [roadmap diagram repeated; highlighted: Constraint Interaction II]
Constraint Interaction II: OT
• ICS → grammatical theory: Optimality Theory (Prince & Smolensky 1991, 1993/2004)
Constraint Interaction II: OT • Differential strength encoded in strict domination hierarchies (≫): • Every constraint has complete priority over all lower-ranked constraints (combined) • Approximate numerical encoding employs special (exponentially growing) weights • “Grammars can’t count”
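A numerical sketch of the point about exponentially growing weights: give the constraint ranked k-th from the bottom weight 2^k, and a single violation of any constraint costs more than violating every lower-ranked constraint combined, provided violation counts stay bounded (here, at most one each).

```python
def hg_harmony(violations, ranking):
    """Harmony under exponential weights: the constraint ranked k-th
    from the bottom gets weight 2**k (ranking lists highest first)."""
    n = len(ranking)
    return -sum(violations.get(c, 0) * 2 ** (n - 1 - i)
                for i, c in enumerate(ranking))

ranking = ["C1", "C2", "C3"]  # C1 >> C2 >> C3
print(hg_harmony({"C1": 1}, ranking))           # -4
print(hg_harmony({"C2": 1, "C3": 1}, ranking))  # -3: still better
# Since 2**k > 2**(k-1) + ... + 1, one higher-ranked violation always
# outweighs all lower-ranked ones -- but only while violation counts
# stay bounded, which is why this encoding merely approximates >>.
```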
Constraint Interaction II: OT
• "Grammars can't count"
• Stress is on the initial heavy syllable iff the number of light syllables n obeys … No way, man.
Constraint Interaction II: OT
• Differential strength encoded in strict domination hierarchies (≫)
• Constraints are universal (Con); candidate outputs are universal (Gen)
• Human grammars differ only in how these constraints are ranked: 'factorial typology'
• First true contender for a formal theory of cross-linguistic typology
• 1st innovation of OT: constraint ranking; 2nd innovation: 'Faithfulness'
The Faithfulness/Markedness Dialectic
• 'cat': /kat/ → kæt, violating NOCODA. Why?
• FAITHFULNESS requires pronunciation = lexical form
• MARKEDNESS often opposes it
• Markedness/Faithfulness dialectic → diversity
– English: FAITH ≫ NOCODA
– Polynesian: NOCODA ≫ FAITH (~ French)
• Another markedness constraint M: Nasal Place Agreement ['Assimilation'] (NPA):
– velar: ŋg ≻ ŋb, ŋd
– coronal: nd ≻ md, ŋd
– labial: mb ≻ nb, ŋb
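The ranking logic on this slide can be stated as lexicographic comparison of violation profiles. The sketch below uses the input from the slide; the coda-deleting candidate and its FAITH violation are assumed for illustration (epenthesis would be another repair).

```python
# OT evaluation as lexicographic minimization of violation vectors.
# Candidates for the input /kat/; the coda-less candidate and its
# FAITH violation (deletion of /t/) are assumed for illustration.
CANDIDATES = {
    "kæt": {"FAITH": 0, "NOCODA": 1},
    "kæ":  {"FAITH": 1, "NOCODA": 0},
}

def optimal(candidates, ranking):
    """Strict domination: compare violations in ranking order."""
    return min(candidates,
               key=lambda c: [candidates[c][k] for k in ranking])

print(optimal(CANDIDATES, ["FAITH", "NOCODA"]))  # kæt (English-style)
print(optimal(CANDIDATES, ["NOCODA", "FAITH"]))  # kæ  (Polynesian-style)
```

Reranking alone flips the winner: that is the factorial typology of the previous slide.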
The ICS Architecture [roadmap diagram repeated; highlighted: Constraint Interaction II: OT]
Optimality Theory
• Diversity of contributions to theoretical linguistics:
– Phonology & phonetics
– Syntax
– Semantics & pragmatics
– … (e.g., the following lectures)
• Now: Can strict domination be explained by connectionism?
Case study: Syllabification in Berber
• Plan: data, then OT grammar → Harmonic Grammar → network
Syllabification in Berber • Dell & Elmedlaoui, 1985: Imdlawn Tashlhit Berber • Syllable nucleus can be any segment • But driven by universal preference for nuclei to be highest-sonority segments
OT Grammar: BrbrOT
• HNUC: A syllable nucleus is sonorous
• ONSET: A syllable has an onset
• Strict domination: ONSET ≫ HNUC
• Prince & Smolensky 1993/2004
Harmonic Grammar: BrbrHG
• HNUC: A syllable nucleus is sonorous. Nucleus of sonority s: Harmony = 2^(s−1), s ∈ {1, 2, …, 8} ~ {t, d, f, z, n, l, i, a}
• ONSET: *VV: Harmony = −2^8
• Theorem. The global Harmony maxima are the correct Berber core syllabifications [of Dell & Elmedlaoui; no sonority plateaux, as in the OT analysis, here & henceforth]
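Since the theorem is stated relative to the serial Dell–Elmedlaoui algorithm, here is a much-simplified sketch of that algorithm over the slide's eight-segment sonority scale. The eligibility and coda-adjunction details, and the input string, are assumptions; the full algorithm has further conditions.

```python
# Simplified serial Dell-Elmedlaoui core syllabification: repeatedly
# make the highest-sonority eligible segment a nucleus (N) and take
# the free segment to its left as onset (O); leftover segments adjoin
# as codas (C). Eligibility and tie-breaking details are assumptions.
SONORITY = {s: i + 1 for i, s in enumerate("tdfznlia")}  # t=1 ... a=8

def dell_elmedlaoui(word):
    status = [None] * len(word)
    free = lambda i: 0 <= i < len(word) and status[i] is None
    while True:
        # Eligible nuclei: free segments with a free onset to their
        # left (onsetless syllables allowed word-initially only).
        elig = [i for i in range(len(word))
                if free(i) and (free(i - 1) or i == 0)]
        if not elig:
            break
        i = max(elig, key=lambda j: (SONORITY[word[j]], -j))  # leftmost wins ties
        status[i] = "N"
        if free(i - 1):
            status[i - 1] = "O"
    for i in range(len(word)):  # remaining segments: codas
        if status[i] is None:
            status[i] = "C"
    return list(zip(word, status))

print(dell_elmedlaoui("tznt"))  # hypothetical input
# -> [('t','N'), ('z','O'), ('n','N'), ('t','C')]
```

The claim of the following slides is that BrbrNet's global Harmony maxima coincide with exactly these serial outputs.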
BrbrNet realizes BrbrHG [network diagram: ONSET and HNUC implemented as connection weights]
BrbrNet's Global Harmony Maximum is the correct parse
• Theorem. For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell–Elmedlaoui algorithm.
• Contrasts with Goldsmith's Dynamic Linear Models (Goldsmith & Larson 1990; Prince 1993)