Cognitive Computing 2012 The computer and the mind 4. CONNECTIONISM Professor Mark Bishop
The representational theory of mind
• Cognitive states are relations to mental representations which have content.
• A cognitive state is a state (of mind) denoting knowledge, understanding, beliefs etc.
• Cognitive processes are mental operations on these representations.
(c) Bishop: An introduction to Cognitive Science
Computational theories of mind
[Figure: a mental representation (“Grass is green”) acted on by computations, e.g. +, -, x, / etc.]
• Cognitive states are computational relations to mental representations which have content.
• Cognitive processes (changes in cognitive states) are computational operations on the mental representations.
• Strong computational theories of mind claim that the mental representations are themselves fundamentally computational in character.
• Hence the mind (thoughts, beliefs, intelligence, problem solving etc.) is ‘merely’ a computational machine.
• Computational theories of mind typically come in two flavours:
• The connectionist computational theory of mind (CCTM);
• The digital computational theory of mind (DCTM).
Basic connectionist computational theory of mind (CCTM)
[Figure: a mental representation (“Happiness”) acted on by computations, e.g. +, -, x, / etc.]
• The basic connectionist theory of mind is neutral on exactly what constitutes [connectionist] ‘mental representations’.
• I.e. the connectionist ‘mental representations’ might not be realised ‘computationally’.
• Cognitive states are computational relations to mental representations which have content.
• Under the CCTM the computational architecture and (mental) representations are connectionist.
• Hence for CCTM cognitive processes (changes in cognitive states) are computational operations on these connectionist mental representations.
A ‘non-computational’ connectionist theory of mind
• Conceptually it is also possible to formulate a connectionist non-computational theory of mind where:
• Cognitive states are relations to mental representations which have content.
• But the mental representations might not be ‘computational’ in character; perhaps they are instantiated on a non-computational connectionist architecture AND / OR
• the relation between cognitive state and mental representation is non-computational; or the relationship between one cognitive state and the next is non-computational.
• The term ‘non-computational’ here typically refers to a mode of [information] processing that, in principle, cannot be carried out by a Turing Machine.
The connectionist computational theory of mind
• A form of ‘Strong AI’ which holds that a suitably programmed computer ‘really is a mind’ (it has thoughts, beliefs, intelligence etc.):
• Cognitive states are computational relations to fundamentally computational mental representations which have content defined by their core computational structure.
• Cognitive processes (changes in cognitive states) are computational operations on these computational mental representations.
• The computational architecture and representations are computationally connectionist.
Artificial neural networks
• What is Neural Computing / Connectionism?
• It defines a mode of computing that seeks to include the style of computing used within the brain.
• It is a style of computing based on learning from experience as opposed to classical, tightly specified, algorithmic methods.
• A Definition:
• “Neural computing is the study of networks of adaptable nodes which, through a process of learning from task examples, store experiential knowledge and make it available for use.”
The link between connectionism and associationism
• By considering that:
• the input nodes of an artificial neural network represent data from sensory transducers (the 'sensations');
• the internal (hidden) network nodes encode ideas;
• the inter-node weights indicate strengths between ideas;
• the output nodes define behaviour;
• … then we see a correspondence between connectionism and associationism.
The neuron: the adaptive node of the brain
• Within the brain neurons are often organized into complex regular structures. Input to neurons occurs at points called synapses located on the cell’s dendritic tree.
• Synapses are either excitatory, where activity aids the overall firing of the neuron, or inhibitory, where activity inhibits the firing of the neuron.
• The neuron effectively takes all firing signals into account by summing the synaptic effects and firing if this sum is greater than a firing threshold, T.
• The cell’s output is transmitted along a fibre called the axon. A neuron is said to fire when the axon transmits a burst of pulses at around 100 Hz.
The McCulloch/Pitts cell
• In the MCP model adaptability comes from representing each synaptic junction by a variable weight $W_i$, indicating the degree to which the neuron should react to this particular input.
• By convention positive weights represent excitatory synapses and negative weights inhibitory synapses.
• The neuron firing threshold is represented by a variable $T$.
• In modern MCP cells $T$ is usually clamped to zero and a threshold implemented using a variable bias, $b$.
• A bias is simply a weight connected to an input clamped to [+1].
• In the MCP model the firing of the neuron is represented by the number 1, and no firing by 0.
• Equivalent to a proposition TRUE or FALSE.
• “Thus in Psychology, .. , the fundamental relations are those of two valued logic”, MCP (1943).
• Activity at the ith input to the neuron is represented by the symbol $X_i$ and the effect of the ith synapse by a weight $W_i$.
• Net input at a synapse on the MCP cell is: $X_i \times W_i$
• The MCP cell will fire if: $\sum_i (X_i \times W_i) + b > 0$
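The firing rule of the MCP cell (weighted sum of the inputs plus a bias, compared against zero) can be sketched in a few lines of Python. This is only an illustration: the function name `mcp_fire` and the particular weight and bias values are assumptions chosen for the example, not values taken from the lecture.

```python
def mcp_fire(inputs, weights, bias):
    """Return 1 (fire) if the weighted input sum plus bias exceeds zero, else 0."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net > 0 else 0


# Hypothetical parameters: both synapses excitatory (positive weights).
# A bias of -0.5 plays the role of a threshold T = 0.5; -1.5 of T = 1.5.
OR_WEIGHTS, OR_BIAS = (1.0, 1.0), -0.5
AND_WEIGHTS, AND_BIAS = (1.0, 1.0), -1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        or_out = mcp_fire((x1, x2), OR_WEIGHTS, OR_BIAS)
        and_out = mcp_fire((x1, x2), AND_WEIGHTS, AND_BIAS)
        print(x1, x2, "OR:", or_out, "AND:", and_out)
```

With these assumed values the same cell computes OR or AND depending only on the bias, which illustrates how the bias acts as a movable threshold.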
So, what type of tasks can neural networks do?
• From McCulloch & Pitts (1943), a network of MCP cells can “compute only such numbers as a Turing Machine; second that each of the latter numbers can be computed by such a net”.
• A neural network classifier (above) maps an arbitrary input vector to an (arbitrary) output class.
Vector association
• An associative neural network is one that maps (associates) a given input vector to a particular output vector.
• Associative networks in 'prediction'.
• E.g. given the input vector [age and alcohol consumed], map to the output vector [the subject's response time].
What is a learning rule?
• To enable a neural network to either associate or classify correctly we need to correctly specify all its weights and thresholds.
• In a typical network there may be many thousands of weight and threshold values.
• A neural network learning rule is a procedure for automatically calculating these values.
• Typically there are far too many to calculate by hand.
Hebbian learning
• “When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A’s efficiency as one of the cells firing B, is increased,”
• ... from Hebb, D. (1949), The Organization of Behavior.
• I.e. when two neurons are simultaneously excited then the strength of the connection between them should be increased.
• "The change in weight connecting input $I_i$ and output $O_j$ is proportional (via a learning rate, $\tau$) to the product of their simultaneous activations."
• $\Delta W_{ij} = \tau I_i O_j$
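As a minimal sketch of the Hebbian rule, the weight change for every input/output pair can be written as the learning rate times the product of the two activations. The function name `hebbian_update`, the nested-list weight layout and the learning rate of 0.1 are assumptions made for this example.

```python
def hebbian_update(weights, inputs, outputs, tau=0.1):
    """Return new weights after one Hebbian step: W[i][j] += tau * I[i] * O[j]."""
    return [
        [w_ij + tau * i_act * o_act for w_ij, o_act in zip(row, outputs)]
        for row, i_act in zip(weights, inputs)
    ]


# Two inputs, two outputs; weights[i][j] connects input i to output j.
weights = [[0.0, 0.0], [0.0, 0.0]]

# Only input 0 is active while both outputs are active, so only the
# weights leaving input 0 are strengthened.
weights = hebbian_update(weights, inputs=[1, 0], outputs=[1, 1])
print(weights)
```

Note that the plain rule only ever increases weights between co-active units; practical variants usually add some form of normalisation or decay, which is outside what the slide describes.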
Training sets
• The function that the neural network is to learn is defined by its ‘training set’.
• For example, to learn the logical OR function the training set would consist of four input-output vector pairs defined as follows.

The OR Function
Pat   I/P1   I/P2   O/P
1.    0      0      0
2.    0      1      1
3.    1      0      1
4.    1      1      1
Rosenblatt’s perceptron
• When Rosenblatt first published information on the ‘Perceptron Convergence Procedure’ in 1959, it was seen as a great advance on the work of Hebb.
• The full (‘classical’) perceptron model can be divided into three layers (see opposite):
Perceptron structure
• The First Layer (Sensory or S-Units)
• The first layer, the retina, comprises a regular array of S-Units.
• The Second Layer (Association or A-Units)
• The input to each A-Unit is the weighted sum of the output of a randomly selected set of S-Units. These weights do not change.
• Thus A-Units respond only to particular patterns, extracting specific localized features from the input.
• The Third Layer (Response or R-Units)
• Each R-Unit has a set of variable weighted connections to a set of A-Units. An R-Unit outputs +1 if the sum of its weighted input is greater than a threshold T, -1 otherwise.
• In some perceptron models, an active R-Unit will inhibit all A-Units not in its input set.
The ‘perceptron convergence procedure’
• If the perceptron response is correct, then no change is made in the weights to R-Units.
• If the response of an R-Unit is incorrect then it is necessary to:
• Decrement all active weights and increase the threshold, if the R-Unit fires when it is not meant to.
• Or conversely increment active weights and decrement the threshold, if the R-Unit is not firing when it should.
• The Perceptron Convergence Theorem (Rosenblatt)
• ... states that the above procedure is guaranteed to find a set of weights to perform a specified mapping on a single layer network, if such a set of weights exists!
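The procedure above can be sketched for a single R-Unit as follows. This is an illustrative reading, not Rosenblatt's original formulation: firing is coded as 0/1 rather than +1/-1, the step size of 1.0, the zero starting weights and the names `fires` and `train` are all assumptions made for the example, and the OR function is used as the training set.

```python
OR_TRAINING_SET = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def fires(inputs, weights, threshold):
    """The unit fires when its weighted input sum is greater than the threshold."""
    return sum(x * w for x, w in zip(inputs, weights)) > threshold


def train(training_set, n_inputs, step=1.0, max_epochs=100):
    """Repeat the correction procedure until a full pass makes no errors."""
    weights, threshold = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in training_set:
            fired = fires(inputs, weights, threshold)
            if fired == bool(target):
                continue  # correct response: no change
            errors += 1
            # Fired wrongly: decrement active weights, raise the threshold.
            # Failed to fire: increment active weights, lower the threshold.
            sign = -1.0 if fired else 1.0
            weights = [w + sign * step * x for w, x in zip(weights, inputs)]
            threshold -= sign * step
        if errors == 0:
            return weights, threshold
    raise RuntimeError("no separating weights found within max_epochs")


weights, threshold = train(OR_TRAINING_SET, n_inputs=2)
for inputs, target in OR_TRAINING_SET:
    print(inputs, int(fires(inputs, weights, threshold)), "target:", target)
```

Only "active" weights (those whose input is 1) change, because each update is multiplied by the input value. For a mapping with no solution, such as XOR, the loop would exhaust `max_epochs` and raise the error instead of returning.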
The ‘order’ of a perceptron
• The order of a perceptron is defined as the largest number of inputs to any of its A-Units.
• Perceptrons will only be useful if this 'order' remains constant as the size of the retina is increased.
• Consider a simple problem: the perceptron should fire if there is one or more groups of [2*2] black pixels on the input retina.
• Opposite: a [4x4] blob-detecting perceptron.
• This problem requires that the perceptron has as many A-Units as there are pixels on the retina, less duplications due to edge effects. Each A-Unit covers a [2*2] square and computes the AND of its inputs.
• If all the weights to the R-Unit are unity and the threshold is just lower than unity, then the perceptron will fire if there is a black square anywhere on the retina.
• The order of the problem is thus four, O(4). This order remains constant irrespective of the size of the retina.
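The blob detector described above can be sketched directly: one A-Unit per [2*2] window computing the AND of its four pixels, and an R-Unit with unit weights and a threshold just below one. The function names, the 0/1 pixel coding (1 = black) and the threshold value 0.99 are assumptions made for this example.

```python
def a_unit_outputs(retina):
    """One A-Unit per [2*2] window: its output is the AND of its four pixels."""
    rows, cols = len(retina), len(retina[0])
    return [
        retina[r][c] & retina[r][c + 1] & retina[r + 1][c] & retina[r + 1][c + 1]
        for r in range(rows - 1)
        for c in range(cols - 1)
    ]


def r_unit_fires(retina, threshold=0.99):
    """R-Unit with all weights equal to 1; 0.99 stands in for 'just lower than unity'."""
    return sum(1.0 * a for a in a_unit_outputs(retina)) > threshold


BLOB = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
CHECKERBOARD = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
print(r_unit_fires(BLOB), r_unit_fires(CHECKERBOARD))
```

Every A-Unit reads exactly four pixels whatever the retina size, which is the sense in which the order of this problem stays at four.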
The delta rule: a modern formulation of the perceptron learning procedure
• The modern formulation of the single layer perceptron learning rule for changing weights in a single layer network of MCP cells, following the presentation of input/output training pair $p$, is:
$\Delta_p W_{ij} = \eta (T_{pj} - O_{pj}) I_{pi} = \eta \delta_{pj} I_{pi}$
• $\eta$ (eta) is called the learning rate.
• $(T_{pj} - O_{pj})$ is the error (or delta) term, $\delta_{pj}$, for the jth neuron.
• $I_{pi}$ is the ith element of the input vector, $I_p$.
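A minimal sketch of the delta rule for a single layer of MCP cells follows: each weight changes by the learning rate times the error term of output j times input i. The function names, the learning rate of 0.5, the AND training set and the use of a third input clamped to +1 as the bias are assumptions made for this example.

```python
def outputs(weights, inputs):
    """O_j = 1 if sum_i I_i * W_ij > 0, else 0, for every output cell j."""
    n_out = len(weights[0])
    return [
        1 if sum(inputs[i] * weights[i][j] for i in range(len(inputs))) > 0 else 0
        for j in range(n_out)
    ]


def delta_rule_update(weights, inputs, targets, eta=0.5):
    """Apply W_ij += eta * (T_j - O_j) * I_i for one training pair."""
    deltas = [t - o for t, o in zip(targets, outputs(weights, inputs))]
    return [
        [weights[i][j] + eta * deltas[j] * inputs[i] for j in range(len(deltas))]
        for i in range(len(inputs))
    ]


# AND function; the third input is clamped to +1 so its weight acts as the bias.
AND_SET = [
    ((0, 0, 1), (0,)),
    ((0, 1, 1), (0,)),
    ((1, 0, 1), (0,)),
    ((1, 1, 1), (1,)),
]

weights = [[0.0], [0.0], [0.0]]  # weights[i][j]: input i to output j
for _ in range(20):  # a fixed number of passes, assumed sufficient here
    for inputs, targets in AND_SET:
        weights = delta_rule_update(weights, inputs, targets)

for inputs, targets in AND_SET:
    print(inputs[:2], outputs(weights, inputs), "target:", list(targets))
```

When the output is already correct the error term is zero and the weights are left unchanged, which matches the 'no change if correct' clause of the perceptron convergence procedure.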
Two input MCP cell
• The output function can be represented in two dimensions:
• Using the x-axis for one input;
• The y-axis for the other.
• Examining the MCP equation for two inputs:
• $X_1 W_1 + X_2 W_2 > T$
• The MCP output function can be represented by a line dividing the two dimensional input space into two areas.
• The above equation can be re-written as an equation representing the line dividing the input space into two classes:
• $X_1 W_1 + X_2 W_2 = T$, or equivalently (for $W_2 \neq 0$)
• $X_2 = T / W_2 - X_1 W_1 / W_2$
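As a worked example of the dividing line, the rearranged equation gives a slope of -W1/W2 and an intercept of T/W2 on the X2 axis. The sketch below computes both; the function name `decision_line` and the example values W1 = W2 = 1, T = 0.5 are assumptions chosen for illustration.

```python
def decision_line(w1, w2, threshold):
    """Return (slope, intercept) of the line X2 = T/W2 - X1*W1/W2."""
    if w2 == 0:
        raise ValueError("w2 must be non-zero to express the line as X2 = f(X1)")
    return -w1 / w2, threshold / w2


def fires(x1, x2, w1, w2, threshold):
    """Two-input MCP cell: fire if X1*W1 + X2*W2 > T."""
    return x1 * w1 + x2 * w2 > threshold


slope, intercept = decision_line(1.0, 1.0, 0.5)
print("X2 =", intercept, "+", slope, "* X1")

# For positive W2, points above the line fire and points below it do not.
print(fires(1, 1, 1.0, 1.0, 0.5), fires(0, 0, 1.0, 1.0, 0.5))
```

With these assumed values the line X2 = 0.5 - X1 separates (0,0) from the other three corners of the unit square, which is the OR function.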
Linearly separable problems
• The two input MCP cell can correctly classify any function that can be separated by a straight dividing line in input space.
• This class of problems is defined as ‘Linearly Separable’ problems.
• E.g. the OR/AND functions.
• The MCP threshold parameter performs a simple affine transformation (a translation) on the line dividing the two classes.
Linearly inseparable problems
• There are many problems that cannot be linearly divided in input space.
• Minsky and Papert termed these ‘Hard Problems’.
• The most famous example of this class of problem is the ‘XOR’ problem.
• The two input XOR problem is not linearly separable in two dimensions.
• See figure opposite.
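That XOR is not linearly separable can be checked with a short derivation. The sketch below assumes a two-input MCP cell that fires when $X_1 W_1 + X_2 W_2 > T$, and writes out the condition each of the four XOR patterns would impose on the weights and threshold.

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad 0 \le T \\
(0,1) \mapsto 1 &: \quad W_2 > T \\
(1,0) \mapsto 1 &: \quad W_1 > T \\
(1,1) \mapsto 0 &: \quad W_1 + W_2 \le T
\end{align*}
% Adding the second and third conditions and using T >= 0:
\[
W_1 + W_2 > 2T \ge T ,
\]
% which contradicts the fourth condition W_1 + W_2 <= T.
```

No choice of $W_1$, $W_2$ and $T$ satisfies all four conditions at once, so no single straight line divides the XOR classes.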
To make a problem linearly separable
• To solve the two input XOR problem it needs to be made linearly separable in input space.
• Hence an extra input (dimension) is required.
• Consider an XOR function defined by three inputs (a, b, c), where (c = a AND b).
• Thus embedding the 2 input XOR in a 3 dimensional input space.
• In general a two class, k-input problem can be embedded in a higher n-dimensional hypercube (n > k).
• A two class problem is linearly separable in n dimensions if there exists a hyper-plane to separate the classes.
• cf. the ‘Support Vector Machine’.
• Here we map from an input space (where data are not linearly separable) to a sufficiently large feature space, where classes are linearly separable.
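The embedding can be checked with a single MCP cell over the three inputs (a, b, c) with c = a AND b. The weights (1, 1, -2) and threshold 0.5 used below are one assumed solution chosen for illustration; many other separating hyper-planes exist.

```python
def mcp_fire(inputs, weights, threshold):
    """Fire (1) if the weighted input sum is greater than the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0


def xor_via_embedding(a, b):
    """Compute XOR with one MCP cell after adding the extra input c = a AND b."""
    c = a & b
    # Assumed weights: a and b excite the cell, c inhibits it strongly enough
    # to cancel both when a and b are active together.
    return mcp_fire((a, b, c), weights=(1.0, 1.0, -2.0), threshold=0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_via_embedding(a, b))
```

The extra input does not add information (it is computed from a and b); it only moves the point (1, 1) out of the plane of the other three so that a hyper-plane can cut it off.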
Hard problems
• In their book ‘Perceptrons’, Minsky & Papert showed that there were several simple image processing tasks that could not be performed by Single Layer Perceptrons (SLPs) of fixed order.
• All these problems are easy to compute using ‘conventional’ algorithmic methods.
Connectedness
• A Diameter Limited Perceptron is one where the inputs to each A-Unit must fall within a receptive field of size D.
• Clearly only (b) and (c) are connected, hence the perceptron should fire only on (b) and (c).
• The A-Units can be divided into three groups: those on the left, the middle and the right of the image.
• Clearly for images (a) & (c) it is only the left group that can tell the difference, hence there must be higher weights activated by the left A-Units in image (c) than in image (a).
• Clearly for images (a) & (b) it is only the right group that can tell the difference, hence there must be higher weights activated by the right A-Units on (b) than on (a).
• However the above two requirements give (d) higher activation than (b) and (c), which implies that if a threshold is found that can classify (b) & (c) as connected, then it will incorrectly classify (d)!
Multi-layer Perceptrons
• Solutions to Minsky & Papert’s Hard problems arose with the development of learning rules for multi-layer perceptrons.
• The most famous of these is called ‘Back [error] Propagation’ and was initially developed by the control engineer Paul Werbos and published in the Appendix to his PhD thesis in 1974, but was ignored for many years.
• Paul J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, 1974.
• Back propagation was independently rediscovered by LeCun and published (in French) in 1985.
• Y. LeCun: Une procédure d'apprentissage pour réseau a seuil asymmetrique (a Learning Scheme for Asymmetric Threshold Networks), Proceedings of Cognitiva 85, 599-604, Paris, France, 1985.
• However the rule gained international renown with the publication of Rumelhart & McClelland’s ‘Parallel Distributed Processing’ texts in 1986 and they are the authors most strongly associated with it.
• Rumelhart, D.E., J.L. McClelland and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, Cambridge, MA: MIT Press.