Symbolic vs Subsymbolic, Connectionism (an Introduction) H. Bowman (CCNCS, Kent)
Overview
• Follow-up to the first symbolic – subsymbolic talk
• Motivation: clarify why (typically) connectionist networks are not compositional
• Introduce connectionism:
  • link to biology
  • activation dynamics
  • learning algorithms
A (Rather Naïve) Reading Model
[Figure: slot-coded ORTHOGRAPHY layer (letter units A.1, B.1, …, Z.1 for SLOT 1, through A.4, B.4, …, Z.4 for slot 4) mapping to a slot-coded PHONOLOGY layer (phoneme units /p/.1, /b/.1, /u/.1, …, /p/.4, /b/.4, /u/.4)]
Compositionality
• Plug constituents in according to rules
• Structure of expressions indicates how they should be interpreted
• Semantic compositionality: "the semantic content of a (molecular) representation is a function of the semantic contents of its syntactic parts, together with its constituent structure" [Fodor & Pylyshyn, 1988]
• Symbolists argue compositionality is a defining characteristic of cognition
Semantic Compositionality in Symbol Systems
• Meanings of items are plugged in as defined by the syntax; M[ X ] denotes the meaning of X:
M[ John loves Jane ] = M[ loves ]( M[ John ], M[ Jane ] )
Semantic Compositionality Continued
• Meanings of atoms are constant across different compositions:
M[ Jane loves John ] = M[ loves ]( M[ Jane ], M[ John ] )
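A minimal Python sketch of this idea (the toy lexicon and tuple-based meanings are illustrative assumptions, not part of the talk): each atom has one fixed meaning, and the meaning of a sentence is a function of those meanings plus the constituent structure.

```python
# Toy compositional semantics: M[atom] is constant wherever the atom occurs,
# and M[subject verb object] = M[verb](M[subject], M[object]).
M = {
    "John": "JOHN",
    "Jane": "JANE",
    "loves": lambda subj, obj: ("LOVES", subj, obj),
}

def meaning(subject, verb, obj):
    """Meaning of a subject-verb-object sentence, built from its parts."""
    return M[verb](M[subject], M[obj])

print(meaning("John", "loves", "Jane"))  # ('LOVES', 'JOHN', 'JANE')
print(meaning("Jane", "loves", "John"))  # ('LOVES', 'JANE', 'JOHN')
```

Swapping John and Jane changes only where their (unchanged) meanings are plugged in; this is exactly the property the slot-coded network later in the talk will lack.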
Rate Coding Hypothesis
• Biological neurons fire spikes (pulses of current)
• In artificial neural networks:
  • nodes reflect populations of biological neurons acting together, i.e. cell assemblies
  • activation reflects the rate of spiking of the underlying biological neurons
Activation in the Classic Artificial Neural Network Model
[Figure: node j integrates its inputs x1, …, xn via weights w1j, …, wnj (weighted sum) to give the net input hj, then applies a sigmoidal function to produce the activation value / output yj]
• Positive weights: excitation; negative weights: inhibition
Sigmoidal Activation Function
• Saturation: unresponsive at high net inputs
• Threshold: unresponsive at low net inputs
• Responsive around a net input of 0
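A minimal sketch of the unit just described (the particular weights, inputs, and function names are illustrative): the net input is the weighted sum of the inputs, and the sigmoid squashes it, saturating at large positive and negative net inputs.

```python
import math

def sigmoid(h):
    """Logistic activation: responsive near h = 0, saturated at the extremes."""
    return 1.0 / (1.0 + math.exp(-h))

def activation(weights, inputs):
    """Output y_j = sigmoid(h_j), where net input h_j = sum_i(w_ij * x_i).
    Positive weights excite node j; negative weights inhibit it."""
    h = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(h)

print(activation([0.8, -0.5, 0.3], [1.0, 1.0, 0.0]))  # responsive region
print(sigmoid(10.0), sigmoid(-10.0))                  # saturation / threshold
```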
Characteristics
• Nodes are homogeneous and essentially dumb
• Input weights characterise what a node represents / detects
• Sophisticated (intelligent?) behaviour emerges from interaction amongst nodes
Learning
• Directed weight adjustment
• Two basic approaches:
  • Hebbian learning: unsupervised; extracts regularities from the environment (sketched below)
  • error-driven learning: supervised; learns an input-to-output mapping
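As a concrete illustration of the unsupervised case, a minimal Hebbian sketch (the learning rate and activity values are illustrative assumptions): weights grow where pre- and post-synaptic activity co-occur, so regularities in the input are extracted without any target signal.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """One Hebbian step: delta w_i = lr * x_i * y, for pre-activities x and post-activity y."""
    return [w + lr * x * post for w, x in zip(weights, pre)]

w = [0.0, 0.0, 0.0]
for _ in range(5):  # inputs 0 and 1 repeatedly co-occur with post-synaptic firing
    w = hebbian_update(w, pre=[1.0, 1.0, 0.0], post=1.0)
print(w)  # weights from co-active inputs strengthen; the silent input's weight stays 0
```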
Example: Simple Feedforward Network
• Use the term PDP (Parallel Distributed Processing) for such models
• Weights initially set randomly
• Trained according to a set of input-to-output patterns
• Error-driven: for each input, weights are adjusted according to the extent to which the output is in error
[Figure: three-layer network with Input, Hidden, and Output layers]
Error-driven Learning
• Can learn any (computable) input-output mapping (modulo local minima)
• Delta rule and back-propagation
• Network learning is completely determined by the patterns presented to it
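A minimal error-driven sketch under illustrative assumptions (a single sigmoidal output unit, a made-up OR pattern set, an arbitrary learning rate; a simplified delta rule rather than full back-propagation): weights start random and are nudged on each pattern in proportion to the error.

```python
import math, random

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

# Input-to-output patterns to be learned (here, logical OR).
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(2)]  # weights initially set randomly
b, lr = random.uniform(-0.5, 0.5), 0.5

for _ in range(2000):
    for x, target in patterns:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = target - y  # extent to which the output is in error
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        b += lr * err

for x, t in patterns:
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    print(x, round(y, 2), "target:", t)
```

Note how the learned weights are completely determined by the presented patterns: nothing else shapes them.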
Example Connectionist Model
• "Jane loves John" is difficult to represent in PDP models
• Word reading as an example: orthography to phonology
• Words of four letters or less
• Need to represent the order of letters; otherwise, e.g., slot and lots would be the same
• Slot coding (sketched in code after the figure below)
A (Rather Naïve) Reading Model
[Figure: slot-coded ORTHOGRAPHY layer (letter units A.1, B.1, …, Z.1 for SLOT 1, through A.4, B.4, …, Z.4 for slot 4) mapping to a slot-coded PHONOLOGY layer (phoneme units /p/.1, /b/.1, /u/.1, …, /p/.4, /b/.4, /u/.4)]
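A minimal sketch of slot coding as used in this model (26 letters × 4 slots; all names are illustrative): each (letter, position) pair gets its own unit, which preserves letter order but means the same letter in different slots activates entirely unrelated units.

```python
import string

LETTERS = string.ascii_lowercase
SLOTS = 4

def slot_code(word):
    """Binary vector of length 26 * 4 with one unit per (letter, slot) pair."""
    vec = [0] * (len(LETTERS) * SLOTS)
    for pos, ch in enumerate(word[:SLOTS]):
        vec[pos * len(LETTERS) + LETTERS.index(ch)] = 1
    return vec

# 'slot' and 'lots' contain exactly the same letters, yet share no active units:
overlap = sum(a & b for a, b in zip(slot_code("slot"), slot_code("lots")))
print(overlap)  # 0
```

This makes the coming compositionality worry concrete: the unit for a in slot 1 is no more related to the unit for a in slot 4 than it is to the unit for z.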
Pronunciation of a as an Example
• Illustration 1: assume a "realistic" pattern set:
  • a is pronounced differently in different positions and with different surrounding letters (context), e.g. mint – pint, both built into the patterns
  • frequency asymmetries: how often a appears at each position throughout the language reflects how effectively it is pronounced at that position
• Strange prediction: if a child has only seen a in positions 1 to 3, it reaches a state in which it can (broadly) pronounce a in positions 1 to 3, but not at all in position 4; that is, it cannot even guess at the pronunciation, i.e. it produces random garbage!
• Labelling is externally imposed: there is no requirement that the label a be interpreted the same way in different slots
• In symbol systems, every occurrence of a is interpreted identically
• Contextual influences can be beneficial, for example:
  • reflecting irregularities, e.g. mint – pint
  • pronouncing non-words, e.g. wug
• Nonetheless, the model is highly non-compositional: there is no sense in which constituent representations are plugged in
  • it can only recognise (and pronounce) a in specific contexts, and not at all in others
  • surely there is a sense in which we learn individual (substitutable) grapheme – phoneme mappings and then plug them in (modulo contextual influences)
• Illustration 2: assume an artificial pattern set in which a is mapped in each position to the same representation
• (Assuming enough training) in a sense, a is similarly represented in all positions
• But:
  • the representations are not actually identical: random initial weight settings imply different (although similar) hidden-layer representations, perhaps glossed over by thresholding at the output
  • still the strange learning prediction: the network can reach states in which it recognises a in some positions, but not at all in others
  • also, the amount of training needed in each position is exorbitant: being able to pronounce a in position i does not help it learn a in position j; it starts from scratch in each position, each of which is different and separately learned
Connectionism & Compositionality
• Principle:
  • with PDP nets, contextual influence is inherent and compositionality the exception
  • with symbol systems, compositionality is inherent and contextual influence the exception
• In some respects neural nets generalise well; in other respects they generalise badly:
  • appropriately: global regularities across patterns are extracted (similar patterns are treated similarly)
  • inappropriately: with slot coding, component representations are not reused
Connectionism & Compositionality
• Alternative connectionist models may do better, but it is not clear that any is truly systematic in the sense of symbolic processing
• Alternative approaches:
  • localist models, e.g. Interactive Activation or Activation Gradient models
  • O'Reilly's spatial invariance model of word reading?
  • Elman nets: recurrence for learning sequences
References
• Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
• Bowers, J. S. (2002). Challenging the widespread assumption that connectionism and distributed representations go hand-in-hand. Cognitive Psychology, 45, 413-445.
• Evans, J. S. B. T. (2003). In two minds: Dual process accounts of reasoning. Trends in Cognitive Sciences, 7(10), 454-459.
• Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.
• Hinton, G. E. (Ed.). (1990). Special issue on connectionist symbol processing. Artificial Intelligence, 46(1-4).
• O'Reilly, R. C., & Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. Cambridge, MA: MIT Press.
• McClelland, J. L. (1992). Can connectionist models discover the structure of natural language? In R. Morelli, W. Miller Brown, D. Anselmi, K. Haberlandt & D. Lloyd (Eds.), Minds, Brains and Computers: Perspectives in Cognitive Science and Artificial Intelligence (pp. 168-189). Norwood, NJ: Ablex.
• McClelland, J. L. (1995). A connectionist perspective on knowledge and development. In J. J. Simon & G. S. Halford (Eds.), Developing Cognitive Competence: New Approaches to Process Modelling (pp. 157-204). Mahwah, NJ: Lawrence Erlbaum.
• Page, M. P. A. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443-512.
• Pinker, S., Ullman, M. T., McClelland, J. L., & Patterson, K. (2002). The past-tense debate (series of opinion articles). Trends in Cognitive Sciences, 6(11), 456-474.