Explanation and Simulation in Cognitive Science • Simulation and computational modeling • Symbolic models • Connectionist models • Comparing symbolism and connectionism • Hybrid architectures • Cognitive architectures
Simulation and Computational Modeling • With detailed and explicit cognitive theories, we can implement the theory as a computational model • And then execute the model to: • Simulate cognitive capacity • Derive predictions from the theory • The predictions can then be compared to empirical data
Questions • What kinds of theories are amenable to simulation? • What techniques work for simulation? • Is simulating the mind different from simulating the weather?
The Mind & the Weather • The mind may just be a complex dynamic system, but it isn’t amenable to generic simulation techniques: • The relation between theory and implementation is indirect: theories tend to be rather abstract • The relation between simulation results and empirical data is indirect: simulations tend to be incomplete • The need to simulate helps make theories more concrete • But “improvement” of the simulation must be theory-driven, not just an attempt to capture the data
Symbolic Models • High-level functions (e.g., problem solving, reasoning, language) appear to involve explicit symbol manipulation • Example: Chess and shopping seem to involve representation of aspects of the world and systematic manipulation of those representations
Central Assumptions • Mental representations exist • Representations are structured • Representations are semantically interpretable
What’s in a representation? • Representations must consist of symbols • Symbols must have parts • Parts must have independent meanings • Those meanings must contribute to the meanings of the symbols which contain them • e.g., “34” contains “3” and “4”, parts which have independent meanings • the meaning of “34” is a function of the meaning of “3” in the tens position and “4” in the units position
In favor of structured mental representations • Productivity • It is through structuring that thought is productive (a finite number of elements, an infinite number of possible combinations) • Systematicity • If you can think “John loves Mary”, you can think “Mary loves John” • Compositionality • The meaning of “John loves Mary” is a function of its parts, and their modes of combination • Rationality • If you know “A and B” is true, then you can infer that A is true (Fodor & Pylyshyn, 1988)
What do you do with them? • Suppose we accept that there are symbolic representations • How can they be manipulated? …by a computing machine • Any such approach has three components • A representational system • A processing strategy • A set of predefined machine operations
Automata Theory • Identifies a family of increasingly powerful computing machines • Finite state automata • Push-down automata • Turing machines
Automata, in brief (Figure 2.2 in Green et al., Chapter 2) • This FSA takes as input a sequence of on and off messages, and accepts any sequence ending with an “on” • A PDA adds a stack: an infinite-capacity, limited-access memory, so that what a machine does depends on input, current state, plus the memory
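A minimal sketch of the FSA just described, in Python (the state names are invented for illustration, not taken from Green et al.):

```python
# A two-state FSA that accepts any on/off sequence ending with "on".

def accepts(sequence):
    state = "reject"                    # start state: the empty sequence is rejected
    for message in sequence:
        state = "accept" if message == "on" else "reject"
    return state == "accept"

print(accepts(["off", "on", "on"]))     # True
print(accepts(["on", "off"]))           # False
```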
A Turing machine changes this memory to allow any location to be accessed at any time. And the state transition function specifies read/write instructions, as well as which state to move to next. • Any effective procedure can be implemented on an appropriately programmed Turing machine • And Universal Turing machines can emulate any Turing machine, via a description on the tape of the machine and its inputs • Hence, philosophical disputes: • Is the brain Turing powerful? • Does machine design matter or not?
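To make the tape-plus-transition-function idea concrete, here is a minimal sketch of a Turing machine simulator; the dictionary encoding and the bit-flipping example machine are invented for illustration:

```python
# The transition table maps (state, symbol) to (symbol to write, head move,
# next state). '_' marks a blank cell; the tape is a sparse dict, so any
# location can be accessed at any time.

def run(tape, transitions, state="start"):
    tape = dict(enumerate(tape))
    head = 0
    while state != "halt":
        symbol = tape.get(head, "_")
        write, move, state = transitions[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape) if tape[i] != "_")

flip = {                                # a machine that inverts a binary string
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run("0110", flip))                # -> 1001
```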
More practical architectures • Von Neumann machines: • Strictly less powerful than Turing machines (finite memory) • Distinguished area of memory for stored programs • Makes them conceptually easier to use than TMs • A special memory location points to the next instruction on each processing cycle: fetch the instruction, move the pointer to the next instruction, execute the current instruction
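The fetch-execute cycle in the last bullet can be sketched in a few lines; the instruction set below is hypothetical:

```python
# Program and data share one memory; a program counter (pc) points to the
# next instruction on each cycle.

memory = [
    ("LOAD", 5),            # put the constant 5 in the accumulator
    ("ADD", 7),             # add the constant 7
    ("PRINT", None),        # print the accumulator
    ("HALT", None),
]

pc, acc = 0, 0
while True:
    op, arg = memory[pc]    # fetch the instruction the pointer indicates
    pc += 1                 # move the pointer to the next instruction
    if op == "LOAD":        # execute the current instruction
        acc = arg
    elif op == "ADD":
        acc += arg
    elif op == "PRINT":
        print(acc)          # -> 12
    elif op == "HALT":
        break
```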
Production Systems • Introduced by Newell & Simon (1972) • Cyclic processor with two main memory structures • Long-term memory with rules (~productions) • Working memory with a symbolic representation of the current system state • Example: IF goal(sweeten(X)) AND available(sugar) THEN action(add(sugar, X)) AND retract(goal(sweeten(X)))
Recognize phase (pattern matching) • Find all rules in LTM that match elements in WM • Act phase (conflict resolution) • Choose one matching rule, execute it, update WM and (possibly) perform an action • Complex sequences of behavior can thus result • The power of the pattern matcher can be varied, allowing different uses of WM • The power of conflict resolution will influence behavior given multiple matches (e.g., prefer the most specific match?) • This works well for problem solving. Would it work for pole-balancing?
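A minimal sketch of the recognize-act cycle, using the sweetening rule from the previous slide; the string-based matching and helper names are invented, and far cruder than a real production system’s pattern matcher:

```python
working_memory = {"goal(sweeten(tea))", "available(sugar)"}

def sweeten_rule(wm):
    """IF goal(sweeten(X)) AND available(sugar) THEN add(sugar, X), retract the goal."""
    for fact in wm:
        if fact.startswith("goal(sweeten(") and "available(sugar)" in wm:
            x = fact[len("goal(sweeten("):-2]       # crude binding of X
            return {"add": {f"added(sugar, {x})"}, "retract": {fact}}
    return None

rules = [sweeten_rule]

while True:
    matches = [m for rule in rules if (m := rule(working_memory))]  # recognize
    if not matches:
        break
    chosen = matches[0]                  # trivial conflict resolution: first match
    working_memory |= chosen["add"]      # act: update working memory
    working_memory -= chosen["retract"]

print(working_memory)                    # {'available(sugar)', 'added(sugar, tea)'}
```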
Connectionist Models • The basic assumption • There are many processors connected together, and operating simultaneously • Processors: units, nodes, artificial neurons
A connectionist network is… • A set of nodes, connected in some fashion • Nodes have varying activation levels • Nodes interact via the flow of activation along the connections • Connections are usually directed (one-way flow), and weighted (strength and nature of interaction; positive weight = excitatory; negative = inhibitory) • A node’s activation will be computed from the weighted sum of its inputs
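The activation rule in the last bullet, as a minimal sketch (the logistic squashing function and all numbers are illustrative choices, not the only options):

```python
import math

def activation(inputs, weights):
    net = sum(i * w for i, w in zip(inputs, weights))   # weighted sum of inputs
    return 1 / (1 + math.exp(-net))                     # squash into (0, 1)

inputs = [1.0, 0.0, 1.0]
weights = [0.8, -0.4, 0.3]      # positive = excitatory, negative = inhibitory
print(activation(inputs, weights))                      # ~0.75
```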
Local vs. Distributed Representation • Parallel Distributed Processing is a (the?) major branch of connectionism • In principle, a connectionist node could have an interpretable meaning • E.g., active when ‘red’ input, or ‘grandmother’, or whatever • However, an individual PDP node will not have such an interpretable meaning • Activation over whole set of nodes corresponds to ‘red’ • Individual node participates in many such representations
PDP • PDP systems lack systematicity and compositionality • Three main types of networks: • Associative • Feed-forward • Recurrent
Associative • To recognize and reconstruct patterns • Present an activation pattern to a subset of units • Let the network ‘settle’ into a stable activation pattern (a reconstruction of a previously learned state)
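A minimal sketch of settling, Hopfield-style, with hand-set weights storing a single ±1 pattern (the pattern and network size are invented):

```python
stored = [1, -1, 1, -1]
n = len(stored)
W = [[stored[i] * stored[j] if i != j else 0 for j in range(n)]
     for i in range(n)]          # weights encode the stored pattern

state = [1, -1, -1, -1]          # presented pattern: one unit corrupted
for _ in range(5):               # settle via repeated updates
    for i in range(n):
        net = sum(W[i][j] * state[j] for j in range(n))
        state[i] = 1 if net >= 0 else -1

print(state)                     # -> [1, -1, 1, -1]: the stored pattern, reconstructed
```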
Feedforward • Not for reconstruction, but for mapping from one domain to another • Nodes are organized into layers • Activation spreads through the layers in sequence • A given layer can be thought of as an “activation vector” • Simplest case: • Input layer (stimulus) • Output layer (response) • Two-layer networks are very restricted in power; intermediate (hidden) layers provide most of the additional computational power needed.
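To illustrate the power a hidden layer adds, here is a minimal sketch of a feedforward pass with hand-set (invented) weights computing XOR, a mapping no two-layer network of threshold units can represent:

```python
def step(x):                          # threshold unit
    return 1 if x >= 0 else 0

def forward(x1, x2):
    h1 = step(x1 + x2 - 0.5)          # hidden unit: x1 OR x2
    h2 = step(x1 + x2 - 1.5)          # hidden unit: x1 AND x2
    return step(h1 - 2 * h2 - 0.5)    # output: OR but not AND, i.e., XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", forward(a, b))  # 0, 1, 1, 0
```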
Recurrent • Feedforward nets compute mappings given the current input only. Recurrent networks allow the mapping to take previous input into account. • Jordan (1986) and Elman (1990) introduced networks with: • Feedback links from the output or hidden layers to context units, and • Feedforward links from the context units to the hidden units • A Jordan network’s output depends on the current input and the previous output • An Elman network’s output depends on the current input and the whole of the previous input history
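A minimal sketch of the Elman-style step, shrunk to one input, one hidden, and one context unit (weights invented, no learning): the context unit holds a copy of the previous hidden activation, so each output reflects the input history:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

W_in, W_ctx = 1.2, 0.9
context = 0.0                   # context unit starts empty

for x in [1.0, 0.0, 0.0]:       # a short input sequence
    hidden = sigmoid(W_in * x + W_ctx * context)
    context = hidden            # copy the hidden state back for the next step
    print(round(hidden, 3))     # later steps still reflect the first input
```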
Key Points about PDP • It’s not just that a net can recognize a pattern or perform a mapping • It’s the fact that it can learn to do so, on the basis of limited data • And the way that networks respond to damage is crucial
Learning • Present network with series of training patterns • Adjust the weights on connections so that the patterns are encoded in the weights • Most training algorithms perform small adjustments to the weights per trial, but require many presentations of the training set to reach a reasonable degree of performance • There are many different learning algorithms
Learning (contd.) • Associative nets support Hebbian learning rule: • Adjust weight of connection by amount proportional to the correlation in activity of corresponding nodes • So if both active, increase weight; if both inactive, increase weight; if they differ, decrease weight • Important because this is biologically plausible…and very effective
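The rule in code, as a minimal sketch with ±1 activations so that agreement (both active, or both inactive) raises the weight and disagreement lowers it; the learning rate and trial sequence are invented:

```python
eta = 0.1                       # learning rate
w = 0.0                         # weight between node i and node j

for a_i, a_j in [(1, 1), (-1, -1), (1, -1), (1, 1)]:
    w += eta * a_i * a_j        # correlated -> +eta, anti-correlated -> -eta
    print(round(w, 2))          # 0.1, 0.2, 0.1, 0.2
```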
Learning (contd.) • Feedforward and recurrent nets often exploit the backpropagation-of-error rule • Actual output is compared to expected output • The difference is computed and propagated back toward the input, layer by layer, driving weight adjustments • Note: unlike Hebb, this is supervised learning
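A minimal sketch of one backpropagation update in a tiny 2-1-1 network (sigmoid units, squared error; all weights, data, and the learning rate are invented), showing the compare / propagate-back / adjust steps:

```python
import math

def sig(x):
    return 1 / (1 + math.exp(-x))

x, target = [1.0, 0.0], 1.0
w_h = [0.5, -0.3]               # input -> hidden weights
w_o, eta = 0.8, 0.5             # hidden -> output weight, learning rate

# Forward pass
h = sig(sum(xi * wi for xi, wi in zip(x, w_h)))
y = sig(w_o * h)

# Backward pass: output error, then error propagated to the hidden layer
delta_o = (target - y) * y * (1 - y)    # supervised: requires the expected output
delta_h = delta_o * w_o * h * (1 - h)

# Weight adjustments
w_o += eta * delta_o * h
w_h = [wi + eta * delta_h * xi for wi, xi in zip(w_h, x)]
print(round(y, 3), round(w_o, 3))       # output before update, adjusted weight
```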
Psychological Relevance • Given a network of fixed size, if there are too few units to encode the training set, then interference occurs • This is suboptimal, but better than nothing, since at least approximate answers are provided • And this is the flipside of generalization, which provides output for unseen input • e.g., weep → wept; bid → bid
Damage • Either remove a proportion of connections • Or introduce random noise into activation propagation • And behavior can simulate that of people with various forms of neurological damage • “Graceful degradation”: impairment, but residual function
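Both lesioning methods are easy to sketch against a small weight matrix (the sizes, removal probability, and noise level are invented):

```python
import random

random.seed(1)
weights = [[0.5, -0.2], [0.3, 0.7]]

# Method 1: remove a proportion (~50%) of connections
lesioned = [[w if random.random() > 0.5 else 0.0 for w in row]
            for row in weights]

# Method 2: inject random noise into activation propagation
def noisy_net_input(inputs, row):
    return sum(i * w for i, w in zip(inputs, row)) + random.gauss(0, 0.1)

print(lesioned)
print(noisy_net_input([1.0, 1.0], weights[0]))
```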
Example of Damage • Hinton & Shallice (1991), Plaut & Shallice (1993) on deep dyslexia: • Visual errors (‘cat’ read as ‘cot’) • Semantic errors (‘cat’ read as ‘dog’) • Networks constructed for the orthography-to-semantics mapping were lesioned in various ways, producing behavior similar to that of human patients
Symbolic Networks • Though distributed representations have proved very important, some researchers prefer localist approaches • Semantic networks: • Frequently used in AI-based approaches, and in cognitive approaches which focus on conceptual knowledge • One node per concept; typed links between concepts • Inference: link-following
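A minimal sketch of a semantic network with typed links and inference by link-following (the concepts and link types are invented):

```python
network = {
    "canary": {"isa": "bird", "can": "sing"},
    "bird":   {"isa": "animal", "has": "wings"},
    "animal": {"can": "move"},
}

def inherits(concept, link, value):
    """Follow 'isa' links upward until the property is found or the chain ends."""
    while concept is not None:
        if network.get(concept, {}).get(link) == value:
            return True
        concept = network.get(concept, {}).get("isa")
    return False

print(inherits("canary", "has", "wings"))   # True, via canary -isa-> bird
```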
Production systems with spreading activation • Anderson’s work (ACT, ACT*, ACT-R) • Symbolic networks with continuous activation values • ACT-R never removes working memory elements; activation instead decays over time • Productions chosen on basis of (co-) activation
Interactive Activation Networks • Essentially, localist connectionist networks • Featuring self-excitatory and lateral inhibitory links, which ensure that there’s always a winner in a competition (e.g., McClelland & Rumelhart’s model of letter perception) • Appropriate combinations of levels, with feedback loops between them, allow modeling of complex data-driven and expectation-driven behavior
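A minimal sketch of the winner-guaranteeing dynamics: each unit excites itself and inhibits its competitors (all parameters invented; real interactive activation models add rest levels and decay):

```python
acts = [0.55, 0.50, 0.45]       # three competing units, e.g., letter hypotheses
for _ in range(30):
    total = sum(acts)
    acts = [max(0.0, min(1.0, a + 0.2 * a - 0.2 * (total - a)))
            for a in acts]      # self-excitation minus lateral inhibition
print([round(a, 2) for a in acts])   # -> [1.0, 0.0, 0.0]: a single winner
```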
Comparing Symbolism & Connectionism • As is so often the case in science, the two approaches were initially presented as exclusive alternatives
Connectionist: • Interference • Generalization • Graceful degradation • Symbolists complain: • Connectionists don’t capture structured information • Network computation is opaque • Networks are “merely” implementation-level
Symbolic: • Productive • Systematic • Compositional • Connectionists complain: • Symbolists don’t relate assumed structures to the brain • They relate them to von Neumann machines
Connectionists can claim: • Complex rule-oriented behavior *emerges* from interaction of subsymbolic behavior • So symbolic models describe, but do not explain
Symbolists can claim: • Though PDP models can learn implicit rules, the learning mechanisms are usually not neurally plausible after all • Performance is highly dependent on exact choice of architecture
Hybrid Architectures • But really, the truth is that different tasks demand different technologies • Hybrid approaches explicitly assume: • Neither connectionist nor symbolic approach is flawed • Their techniques are compatible
Two main hybrid options: • Physically hybrid models: • Contain subsystems of both types • Issues: interfacing, modularity (e.g., use Interactive Activation Network to integrate results) • Non-physically hybrid models • Subsystems of only one type, but described two ways • Issue: levels of description (e.g., connectionist production systems)
Cognitive Architectures • Most modeling is aimed at specific processes or tasks • But it has been argued that: • Most real tasks involve many cognitive processes • Most cognitive processes are used in many tasks • Hence, we need unified theories of cognition
Examples • ACT-R (Anderson) • Soar (Newell) • Both based on production system technology • Task-specific knowledge coded into the productions • Single processing mechanism, single learning mechanism
Like computer architectures, cognitive architectures tend to make some tasks easy, at the price of making others hard • Unlike computer architectures, cognitive architectures must include learning mechanisms • But note that the unified approaches sacrifice genuine task-appropriateness and perhaps also biological plausibility
A Cognitive Architecture is: • A fixed arrangement of particular functional components • A processing strategy