Bayesian Connections: An Approach to Modeling Aspects of the Reading Process

David A. Medler, Center for the Neural Basis of Cognition, Carnegie Mellon University

Bayesian Connections • The Bayesian Approach to Psychology • How do we represent the world? • Bayesian Connectionist Framework • Bayesian Generative Networks • Learning letters • How does context affect learning? • Empirical and Simulation Results • Symmetric Diffusion Networks • The Ambiguity Advantage/Disadvantage • Closing Remarks

Representing the World • Problem: How do we form meaningful internal representations, P(H), given our observations of the external world, P(D)? [Figure: the external world, P(D), mapped onto internal representations, P(H)]

Bayesian Theory • For a given hypothesis, H, and observed data, D, the posterior probability of H given D is computed as: $P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$, where • P(H) = prior probability of the hypothesis, H • P(D) = probability of the data, D • P(D | H) = probability of D given H
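A minimal sketch of Bayes' rule, P(H | D) = P(D | H) P(H) / P(D), over a discrete set of hypotheses, where P(D) is obtained by summing P(D | H) P(H) over all hypotheses. The O/Q glyph example and its probabilities are hypothetical illustrations, not values from the study.

```python
def posterior(prior, likelihood):
    """Return P(H | D) for every hypothesis H.

    prior[h]      = P(H = h)
    likelihood[h] = P(D | H = h) for the observed data D
    """
    evidence = sum(prior[h] * likelihood[h] for h in prior)  # P(D), by marginalizing over H
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Hypothetical example: is an ambiguous glyph the letter "O" or "Q"?
prior = {"O": 0.9, "Q": 0.1}        # assumed P(H)
likelihood = {"O": 0.2, "Q": 0.8}   # assumed P(D | H) for the observed glyph
print(posterior(prior, likelihood))
```

Here P(D) = 0.26, so the posterior is about 0.69 for "O" and 0.31 for "Q": the strong prior for "O" outweighs a likelihood that favours "Q".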

Bayesian Connectionism [Figure: layered network with a Representation Layer, P(H), a Mediating Layer, and a Surface Layer, P(D)]

It was 20 years ago today... • “An Interactive Activation Model of Context Effects in Letter Perception”, James L. McClelland & David E. Rumelhart (1981; 1982) • Word superiority effect: words > pseudowords > nonwords • The model accounted for the time course of perceptual identification.

Interactive Activation Model [Figure: Word Level, Letter Level, and Feature Level]

20 Years Later... • The Interactive Activation (IA) Model has been influential. • It has many strengths, but 20 years of research have also exposed weaknesses. • Its internal representations are hard-coded: the Interactive Activation Model does not learn!

Bayesian Generative Networks • The initial work is an extension of the Bayesian Generative Network framework of Lewicki & Sejnowski (1997). • It is an unsupervised learning paradigm for multilayered architectures. • We simplified the network equations, added sparse coding constraints, and included a “supervised” component.

Bayesian Generative Networks [Figure: layered network with a Representation Layer, P(H), a Mediating Layer, and a Surface Layer, P(D)]
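The generative direction of such a layered network can be sketched as top-down (ancestral) sampling: draw representation units from their prior, then the mediating layer, then the surface layer. This sketch assumes binary units with a logistic transfer function (a sigmoid belief network); the framework of Lewicki & Sejnowski may use a different transfer function, and all weights and layer sizes below are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_layer(weights, biases, parent, rng):
    """Unit i turns on with probability sigmoid(b_i + sum_j w_ij * parent_j)."""
    states = []
    for row, b in zip(weights, biases):
        drive = b + sum(w * s for w, s in zip(row, parent))
        states.append(1 if rng.random() < sigmoid(drive) else 0)
    return states


def generate(top_bias, w_mid, b_mid, w_surf, b_surf, rng):
    """Ancestral sample: representation -> mediating -> surface."""
    rep = [1 if rng.random() < sigmoid(b) else 0 for b in top_bias]  # sample from the prior P(H)
    mid = sample_layer(w_mid, b_mid, rep, rng)
    surf = sample_layer(w_surf, b_surf, mid, rng)  # one generated surface pattern
    return rep, mid, surf


# Usage with hypothetical layer sizes and random weights
rng = random.Random(0)
w_mid = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(18)]
w_surf = [[rng.gauss(0, 1) for _ in range(18)] for _ in range(48)]
rep, mid, surf = generate([-1.0] * 16, w_mid, [0.0] * 18, w_surf, [0.0] * 48, rng)
print(sum(rep), sum(mid), len(surf))
```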

Sparse Coding Constraints • We modified the basic framework to include “sparse coding” constraints. • These act as a Bayesian prior that constrains the types of representations learned. • Sparse coding encourages the network to represent any given input pattern with relatively few active units.
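One simple way to express a sparse-coding prior is an independent Bernoulli prior with a small "on" probability for each binary representation unit; the slides do not give the exact form used, so `p_active` and this functional form are assumptions. Under such a prior, every additional active unit lowers log P(h), so codes with fewer active units are preferred.

```python
import math


def sparse_log_prior(h, p_active=0.1):
    """log P(h) under independent Bernoulli(p_active) priors on binary units.

    With p_active < 0.5, each extra active unit lowers the log prior,
    so representations using fewer active units are favoured.
    """
    log_on, log_off = math.log(p_active), math.log(1.0 - p_active)
    return sum(log_on if s else log_off for s in h)


dense = [1, 1, 1, 1, 0, 0]
sparse = [1, 0, 0, 0, 0, 0]
print(sparse_log_prior(dense) < sparse_log_prior(sparse))  # True
```

In a learning rule, this log prior would be added to the log-likelihood of the data, trading reconstruction quality against the number of active units.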

Step 1: Learning the Alphabet • The first stage of the IA model is the mapping from features to letters. • We use the Rumelhart & Siple (1974) character features.

Network Learning • 16 surface units (corresponding to 16 line segments) • 30 representation units • Trained for 50 epochs (evaluated at epochs 1, 10, 25, and 50) • Evaluated: the generative capability of the network and the internal representations formed
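A compact training sketch with the sizes listed above (16 surface units, 30 representation units, snapshots at epochs 1, 10, 25, and 50). It swaps in a standard technique rather than reproducing the author's equations: a two-layer sigmoid belief network whose representation units are sampled from their posterior by Gibbs sampling, followed by a stochastic gradient step on log P(v, h). The learning rate, number of sweeps, and toy patterns are assumptions, and the real Rumelhart & Siple letter codes are not reproduced here.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_bern(v, a):
    """log P(v | drive a) for a logistic unit: log sigmoid(a) if v else log sigmoid(-a)."""
    z = a if v else -a
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


class SigmoidBeliefNet:
    """Binary representation units h generate binary surface units v."""

    def __init__(self, n_surface=16, n_rep=30, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_rep)] for _ in range(n_surface)]
        self.b = [0.0] * n_surface  # surface-unit biases
        self.c = [0.0] * n_rep      # representation-unit biases (the prior P(H))

    def drives(self, h):
        return [bi + sum(w * s for w, s in zip(row, h)) for bi, row in zip(self.b, self.W)]

    def gibbs_sweep(self, v, h):
        """Resample every h_j from P(h_j | v, other h) in place."""
        a = self.drives(h)
        for j in range(len(h)):
            a0 = [ai - row[j] * h[j] for ai, row in zip(a, self.W)]  # drives with h_j = 0
            a1 = [x + row[j] for x, row in zip(a0, self.W)]           # drives with h_j = 1
            log_odds = self.c[j] + sum(log_bern(vi, x1) - log_bern(vi, x0)
                                       for vi, x0, x1 in zip(v, a0, a1))
            h[j] = 1 if self.rng.random() < sigmoid(log_odds) else 0
            a = a1 if h[j] else a0
        return h

    def train_step(self, v, lr=0.05, sweeps=3):
        """Sample h from the posterior, then take a gradient step on log P(v, h)."""
        h = [1 if self.rng.random() < sigmoid(cj) else 0 for cj in self.c]
        for _ in range(sweeps):
            self.gibbs_sweep(v, h)
        a = self.drives(h)
        for i, vi in enumerate(v):
            err = vi - sigmoid(a[i])
            self.b[i] += lr * err
            for j, hj in enumerate(h):
                self.W[i][j] += lr * err * hj
        for j, hj in enumerate(h):
            self.c[j] += lr * (hj - sigmoid(self.c[j]))
        return h


def train(patterns, epochs=50, snapshot_at=(1, 10, 25, 50), n_rep=30, seed=0):
    """Train on binary patterns and keep weight snapshots at the listed epochs."""
    net = SigmoidBeliefNet(n_surface=len(patterns[0]), n_rep=n_rep, seed=seed)
    snapshots = {}
    for epoch in range(1, epochs + 1):
        for v in patterns:
            net.train_step(v)
        if epoch in snapshot_at:
            snapshots[epoch] = [row[:] for row in net.W]
    return net, snapshots
```

Keeping weight snapshots per epoch mirrors the slide's evaluation at several points during training, so the weight structure can be compared across epochs.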

Generating the Alphabet

Interpreting Weight Structure

Network Weights [Figure: weights of representation units 1 to 30 at epochs 1, 10, 25, and 50, without and with sparse coding]

What We Have Learned • In the unsupervised framework, the Bayesian Generative Network is able to learn the alphabet. • Its representations are not necessarily the same as those of the IA model; they are: • distributed (not localist) • redundant (features are coded several times) • Having learned the letters, can we now learn words?

Step 2: Learning Words • The second stage of the IA model is the mapping from letters to words. • The IA model is able to account for the “word superiority” effect using orthographic information only. • We are interested in how the Bayesian framework accounts for the development of the word superiority effect. • We look at how participants learn context.

Experimental Motivation [Figure: Reicher-Wheeler trial displays with a fixation cross (+), a word (READ), pseudoword (GLUR), or nonword (KQZW) target, and single-letter probe alternatives such as -E--, -O--, ---R, ---P, --S-, --Z-] • Our motivation for the current experiments is the word-superiority effect. • Specifically, we draw inspiration from the Reicher-Wheeler paradigm.

The Task • The current set of studies was designed to simulate how the word superiority effect may develop. Specifically, we were interested in: • the learning of novel, letter-like stimuli • whether stimuli were learned in parts or as wholes • the effects of context on learning. • Consequently, we created an artificial environment in which we tightly controlled context.

Experimental Design: Training [Figure: context patterns A and B, each with positions p1, p2, and p3, built from characters a to l, with labels o1 and o2] • The Reicher-Wheeler task is based on the discrimination between two characters. • We wanted a similar task in which context would interact with a character pair.

Experimental Design: Testing • Total of 16 stimuli • Task: detect change • Testing: 288 stimuli – 96 Familiar stimuli (AAA or BBB), e.g., abc, ghi, aec, gkl, dbc, jhi, aec, gki – 96 Crossed stimuli (BAA or ABB), e.g., dec, jki, abf, ghl, jec, gkf, dbf, jhl – 96 Novel stimuli (CAA or CBB), e.g., aef, gkl, def, jkl, aer, gnl

Stimuli • Characters were constructed from the Rumelhart & Siple (RS) features. • Each character had six line segments, subject to the following constraints: • characters were continuous • no two segments formed a straight line • no character was a mirror image or a rotation of another. [Figure: the stimulus characters arranged by pattern (A, B), position (p1 to p3), and labels o1 and o2]

Initial Simulations [Figure: network with 16 representation units, P(H), 18 mediating units, and 48 surface units, P(D), spanning Character 1, Character 2, and Character 3] • Performance was measured by computing a “differentiation value” based on the difference between the generated surface layer representation ($G_i$) and the target representation ($T_i$).
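Assuming the 48 surface units are three consecutive 16-feature blocks, one per character position (as the Character 1 to 3 labels suggest), a three-character stimulus can be laid out on the surface layer as below. The feature codes are hypothetical placeholders, not the actual RS codes.

```python
def encode_stimulus(chars, feature_codes, n_features=16):
    """Concatenate one n_features-unit block per character position."""
    surface = []
    for ch in chars:
        code = feature_codes[ch]
        if len(code) != n_features:
            raise ValueError(f"{ch!r} must have {n_features} features")
        surface.extend(code)
    return surface


# Hypothetical 16-segment codes for three characters
codes = {"a": [1] * 6 + [0] * 10, "b": [0] * 10 + [1] * 6, "c": [1, 0] * 8}
print(len(encode_stimulus("abc", codes)))  # 48
```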

Initial Simulation Results

Simulation Conclusions • Regardless of the network architecture, all simulations showed a (slight) difference between the familiar and crossed stimuli. • No simulation performed well on the novel stimuli in comparison to the other stimuli. • These results are somewhat counter to what we expected. • Is the model broken? • How do participants perform on this task?

Stimulus Presentation [Figure: trial timeline with successive display intervals of 50 ms, 500 ms, 250 ms, 200 ms, 250 ms, and 200 ms]

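Data Analysis • Response categories for “Detect change?”: when the stimuli differ, “Yes” is a Hit and “No” is a Miss; when the stimuli are the same, “Yes” is a False Alarm (FA) and “No” is a Correct Rejection (CR). • Each participant’s reaction time and proportion of hits and correct rejections were recorded. • To correct for potential response biases, the scores were converted to d’ scores using: $d' = \Phi^{-1}(\text{Hit}) + \Phi^{-1}(\text{CR})$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function (equivalently, $d' = z(\text{Hit}) - z(\text{FA})$).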
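A minimal sketch of the d’ computation from hit and correct-rejection proportions, d’ = z(Hit) + z(CR), where z is the inverse standard normal CDF. The optional 1/(2N) clipping of rates equal to 0 or 1 is a common convention added here as an assumption; the slides do not say how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate, cr_rate, n_signal=None, n_noise=None):
    """Sensitivity d' = z(Hit) + z(CR), which equals z(Hit) - z(FA)."""
    def clip(p, n):
        # Rates of exactly 0 or 1 give infinite z; the 1/(2N) rule keeps them finite.
        if n is None:
            return p
        return min(max(p, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf
    return z(clip(hit_rate, n_signal)) + z(clip(cr_rate, n_noise))


print(round(d_prime(0.84, 0.84), 2))
```

Because CR = 1 − FA and z(1 − p) = −z(p), adding z(CR) is the same as subtracting z(FA), so a participant who says “Yes” on every trial (Hit = 1, CR = 0) gets no credit for sensitivity.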

Experiment 1: One Novel • 4 Participants, 10 days each • 1440 trials per day: • 288 test trials intermixed with 1152 training trials. • Three conditions: • Familiar (AAA or BBB) • Crossed (BAA or ABB) • Novel (CAA or CBB)
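The daily session structure (288 test trials, 96 per condition, intermixed with 1152 training trials for 1440 trials in total) can be sketched as a shuffled trial list. Uniform random intermixing is an assumption; the slides do not specify how test and training trials were ordered.

```python
import random
from collections import Counter

CONDITIONS = ("familiar", "crossed", "novel")  # AAA/BBB, BAA/ABB, CAA/CBB


def make_day_schedule(n_test_per_condition=96, n_training=1152, seed=None):
    """One day's trial list: test trials randomly intermixed with training trials."""
    rng = random.Random(seed)
    trials = [("test", cond) for cond in CONDITIONS for _ in range(n_test_per_condition)]
    trials += [("train", None)] * n_training
    rng.shuffle(trials)
    return trials


day = make_day_schedule(seed=1)
print(len(day))  # 1440 = 288 test + 1152 training trials
print(Counter(cond for kind, cond in day if kind == "test"))
```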

d’ Scores

Do They Report a Change?

Reaction Times

Experiment Conclusions • Although there is a context effect, it is not as large as we expected, nor as stable. • There are no significant differences in reaction times for any of the conditions. • Participants do not perform well in the Novel condition • this is due to a tendency to respond “Change” to all novel stimuli

Re-Simulation of Task • The network was trained on the same data set as the participants. • The network learned on all training and testing trials. • We wanted a comparable measure of network performance. • We used a variant of the Kullback-Leibler divergence measure:
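The specific Kullback-Leibler variant used is not given on the slides. As a hedged stand-in, this sketch computes the standard KL divergence between two generated surface representations treated as independent Bernoulli units, symmetrizes it (one possible variant, an assumption), and applies a hypothetical threshold to decide whether the network “reports a change”.

```python
import math


def bernoulli_kl(p, q, eps=1e-9):
    """KL(p || q) summed over independent binary units with on-probabilities p and q."""
    total = 0.0
    for pi, qi in zip(p, q):
        pi = min(max(pi, eps), 1 - eps)  # avoid log(0)
        qi = min(max(qi, eps), 1 - eps)
        total += pi * math.log(pi / qi) + (1 - pi) * math.log((1 - pi) / (1 - qi))
    return total


def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p): order-independent comparison of two representations."""
    return bernoulli_kl(p, q) + bernoulli_kl(q, p)


def report_change(first, second, threshold=1.0):
    """Hypothetical decision rule: report "change" when the two generations diverge."""
    return symmetric_kl(first, second) > threshold


print(report_change([0.9, 0.1], [0.88, 0.12]), report_change([0.9, 0.1], [0.1, 0.9]))
```

Under a rule of this kind, two poorly formed (noisy) generations tend to diverge from each other, which pushes the decision toward “change”.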

Simulation: Difference Measure

Simulation: Report Change?

Internal Representations [Figure: representation units 1 to 18 on training “days” 1, 6, and 10] • Looking at the internal representations formed by the network gives an idea of why it behaves as it does.

Simulation Conclusions • The Bayesian Generative Network qualitatively matched the performance of the participants. • Furthermore, analysis of the internal structure of the network offers an explanation for the participants’ behaviour. • The network failed to learn to represent novel items. • Thus, if the first generated representation is garbage and the second generated representation is garbage, then the comparison will also be garbage → “change”.

Assessing Representations • The models predicted that participants in the one novel condition would fail to learn to represent the novel items. • Unfortunately, we can’t open up a person to see what their internal representation is. • We can, however, ask them. • Specifically, we can test their recognition of “novel” items following training and compare these to truly new items.

Experiment 2 • 10 participants • Participants were trained on the same data as in Experiment 1 but were run for only 2 days. • At the conclusion of training, participants were given a “new/old” task in which they saw the 12 old training items, the 6 old novel items, and 12 new items. • Participants saw a single character and judged it “old” or “new”.

Experiment 2: Results • Participants were about 70% correct at detecting “Old” items. • Participants were no better at recognizing old “Novel” items than truly “New” items.

Learning Context • The Bayesian Generative Network is able to learn higher order information such as which characters appear in which positions. • It is able to both simulate and explain the performance of participants trained on a contextual learning task. • It is able to predict new findings! • Can we expand the model?

Symmetric Diffusion Network • Symmetric Diffusion Networks (SDNs) are a class of networks that explicitly embody many of the implicit assumptions made by the Bayesian Generative Network. • SDNs can be viewed as a more general form of the Bayesian Generative Network.

Symmetric Diffusion Network [Figure: Representation Layer, P(H); Mediating Layer; Surface Layer, P(D)]

Symmetric Diffusion Network [Figure: Representation Layer, P(H); Mediating Layer; Surface Layer, P(D); supervised learning]

Symmetric Diffusion Network [Figure: Representation Layer, P(H); Mediating Layer; Surface Layer, P(D); unsupervised learning]
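SDNs treat unit activations as continuous stochastic (diffusion) processes coupled by symmetric weights. The sketch below simulates such a process with the Euler-Maruyama method; the drift term (−x + W f(x) + b with a logistic f), the step size, and the noise level are assumptions for illustration, not the exact SDN equations. Clamping a subset of units loosely corresponds to fixing layers to data, for example the surface layer alone (unsupervised) or the surface and representation layers together (supervised).

```python
import math
import random


def logistic(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def settle(W, b, x0, clamped=None, steps=200, dt=0.05, noise=0.1, seed=0):
    """Euler-Maruyama integration of dx = (-x + W f(x) + b) dt + noise dB.

    W must be symmetric. `clamped` maps unit index -> fixed value
    (e.g. surface units held at a data pattern).
    """
    n = len(x0)
    if any(W[i][j] != W[j][i] for i in range(n) for j in range(n)):
        raise ValueError("W must be symmetric")
    rng = random.Random(seed)
    clamped = clamped or {}
    x = [clamped.get(i, v) for i, v in enumerate(x0)]
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        f = [logistic(v) for v in x]
        new_x = []
        for i in range(n):
            if i in clamped:
                new_x.append(clamped[i])
                continue
            drift = -x[i] + sum(W[i][j] * f[j] for j in range(n)) + b[i]
            new_x.append(x[i] + drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0))
        x = new_x
    return x


W = [[0.0, 0.8, 0.0], [0.8, 0.0, -0.5], [0.0, -0.5, 0.0]]  # hypothetical symmetric weights
print(settle(W, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], clamped={0: 1.0}))
```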
