Probabilistic Models of Cortical Computation Rajesh P. N. Rao Dept. of Computer Sci. and Engineering & Neurobio. and Behavior Program University of Washington Seattle, WA Lab website: http://neural.cs.washington.edu November, 2004 Funding: Sloan Foundation, Packard Foundation, ONR, and NSF.
Why Consider Probabilistic Models? Computational Reasons • Sensory measurements are typically ambiguous • E.g., the projection from 3D to 2D in vision • Biological sensors and processing elements are noisy • An animal’s knowledge of the world is usually incomplete There thus appears to be a need to represent, learn, and reason about probabilities
Example 1: Ambiguity of Stimuli (Figure: an object viewed by the eye and the retinal image it produces) Is it an oval-shaped or a circular object?
Bayesian Model: The Likelihood Function Likelihood = P(I | Slant, Aspect ratio) Retinal Image I (From Geisler & Kersten, 2002)
Bayesian Model: The Posterior Posterior = k × Likelihood × Prior (k = normalization constant) (From Geisler & Kersten, 2002)
Example 2: Noise and Incomplete Knowledge What is this image depicting?
Bayesian Model Prior probability P(θ) over interpretations θ: street, dog, …, Okinawa beach Likelihood P(I | θ) Posterior probability P(θ | I) = P(I | θ)P(θ)/P(I) Input image sample → ??? (Bayesian decision)
Bayesian Model with “Top-Down” Bias Prior probability P(θ) biased toward “dog”: street, dog, …, Okinawa beach Likelihood P(I | θ) Posterior probability P(θ | I) = P(I | θ)P(θ)/P(I) Input image sample → “Dog” (Bayesian decision)
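The decision logic on these two slides is simple enough to run. A minimal Python sketch, with made-up likelihood and prior values chosen only to illustrate how a top-down bias can change the Bayesian decision:

```python
import numpy as np

interpretations = ["street", "dog", "Okinawa beach"]

# Made-up numbers: an ambiguous image that weakly favors "Okinawa beach".
likelihood = np.array([0.30, 0.33, 0.37])    # P(I | theta)
flat_prior = np.array([1/3, 1/3, 1/3])       # no top-down bias
biased_prior = np.array([0.15, 0.70, 0.15])  # top-down bias toward "dog"

def posterior(likelihood, prior):
    """P(theta | I) = P(I | theta) P(theta) / P(I), with P(I) as the normalizer."""
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

for prior in (flat_prior, biased_prior):
    post = posterior(likelihood, prior)
    print(post.round(3), "->", interpretations[np.argmax(post)])  # MAP (Bayesian) decision
```

With the flat prior the decision goes to “Okinawa beach”; the top-down prior flips it to “dog”, mirroring the two slides.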
Psychophysical Evidence for Bayesian Perception • Motion from cast shadows (Kersten et al., 1996) • Surface perception based on texture (Knill, 1998) • Inferring 3D shape from 2D images (Mamassian et al., 2002) • Color perception (Bloj et al., 1999) • Cue combination for depth perception (Jacobs, 2002) • Motion illusions (Weiss et al., 2002) • Motor Control (Körding and Wolpert, 2004)
Other Results: Contextual Modulation in V1 (Zipser et al., 1996)
Attentional Modulation in V2 and V4 (Reynolds et al., 1999)
Decision Neurons in Areas LIP and FEF (Roitman and Shadlen, 2002)
Can a network of neurons perform Bayesian inference? Rev. Thomas Bayes (1702-1761) • How is prior knowledge about the world (prior probabilities and likelihoods) stored in a network? • How are posterior probabilities of states computed?
Generative Models for Bayesian Inference • Fundamental Idea: Inputs received by an organism are caused by external “states” of the world (hidden “causes”) • Goal: Estimate the probability of these causes (or states or “interpretations”) based on the inputs received thus far
Linear Generative Model • Spatial generative model: I(t) = U r(t) + n(t) • r(t) = representation vector, n = zero-mean Gaussian white noise with covariance Σ • Temporal dynamics for time-varying processes: r(t) = V r(t-1) + m(t-1) • V = transition matrix, m = zero-mean Gaussian white noise with covariance Σm • Goal: Find the optimal representation vector r(t) given the inputs I(t), I(t-1), …, I(1)
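To make the model concrete, a minimal Python sketch that samples from this generative process; the dimensions, the decaying transition matrix, and the noise covariances are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 16                       # sizes of r(t) and I(t), chosen arbitrarily
U = rng.standard_normal((d, k))    # generative matrix
V = 0.9 * np.eye(k)                # transition matrix (slowly decaying states)
Sigma   = 0.10 * np.eye(d)         # covariance of the measurement noise n
Sigma_m = 0.05 * np.eye(k)         # covariance of the process noise m

r = np.zeros(k)
images = []
for t in range(100):
    r = V @ r + rng.multivariate_normal(np.zeros(k), Sigma_m)  # r(t) = V r(t-1) + m(t-1)
    I = U @ r + rng.multivariate_normal(np.zeros(d), Sigma)    # I(t) = U r(t) + n(t)
    images.append(I)
```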
Optimization Functions • Find the optimal r(t) by minimizing the prediction errors for all t: E = (I − Ur)ᵀ(I − Ur) + (r − r̄)ᵀ(r − r̄) • r̄ = mean of r before measurement of I • Generalize to the weighted least-squares function: E = (I − Ur)ᵀΣ⁻¹(I − Ur) + (r − r̄)ᵀM⁻¹(r − r̄) • M = covariance of r before measurement of I
Minimizing E = Maximizing Posterior Probability • Minimizing E is equivalent to maximizing log P(r | I), which in turn is equivalent to maximizing the posterior probability P(r | I)
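To spell out the step the slide compresses, a short derivation under the Gaussian assumptions of the generative model, with E the weighted least-squares function from the previous slide:

```latex
\begin{align*}
\log P(\mathbf{r}\mid\mathbf{I})
 &= \log P(\mathbf{I}\mid\mathbf{r}) + \log P(\mathbf{r}) - \log P(\mathbf{I}) \\
 &= -\tfrac{1}{2}(\mathbf{I}-U\mathbf{r})^{\mathsf T}\Sigma^{-1}(\mathbf{I}-U\mathbf{r})
    -\tfrac{1}{2}(\mathbf{r}-\bar{\mathbf{r}})^{\mathsf T}M^{-1}(\mathbf{r}-\bar{\mathbf{r}})
    + \mathrm{const} \\
 &= -\tfrac{1}{2}E + \mathrm{const},
\end{align*}
```

so maximizing the posterior over r is exactly minimizing E.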
Optimal Estimation and Kalman Filtering • Setting dE/dr = 0 and solving for the optimal r yields the Kalman filter: r̂(t) = r̄(t) + K(t)[I(t) − U r̄(t)], where r̄(t) = V r̂(t−1) is the prediction • K(t) = “Kalman gain” matrix = N(t)UᵀΣ⁻¹ • N(t) = covariance of r after measurement of I(t) = (UᵀΣ⁻¹U + M(t)⁻¹)⁻¹ • M(t) = covariance of r before measurement of I(t) = V N(t−1)Vᵀ + Σm
A Simplified Kalman Filter • If Σ is diagonal and equal to σ²I, then K(t) = (N(t)/σ²)Uᵀ = G(t)Uᵀ • The Kalman filter equation is of the form: New Estimate = Prediction + Gain × Prediction Error • Uᵀ = feedforward matrix • U = feedback matrix • V = recurrent matrix (lateral connections)
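A minimal Python sketch of one filtering step, transcribing the update equations above (it reuses U, V, Sigma, Sigma_m from the generative-model sketch; this is a direct NumPy rendering, not code from the talk):

```python
import numpy as np

def kalman_step(r_hat, N_prev, I, U, V, Sigma, Sigma_m):
    """New Estimate = Prediction + Gain x Prediction Error."""
    r_bar = V @ r_hat                       # prediction of r before seeing I(t)
    M = V @ N_prev @ V.T + Sigma_m          # covariance before the measurement
    Sigma_inv = np.linalg.inv(Sigma)
    N = np.linalg.inv(U.T @ Sigma_inv @ U + np.linalg.inv(M))  # covariance after
    K = N @ U.T @ Sigma_inv                 # Kalman gain
    r_new = r_bar + K @ (I - U @ r_bar)     # correct the prediction by the weighted error
    return r_new, N
```

In predictive-coding terms, U @ r_bar is the feedback prediction and I - U @ r_bar is the feedforward prediction error.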
Neural Implementation via Predictive Coding (Rao & Ballard, 1997,1999; Rao, 1999) Predictive Coding Model: Feedback = Prediction Feedforward = Prediction Error
Clues from Cortical Anatomy (Figure: connections between a lower cortical area and a higher cortical area)
Hierarchical Organization of the Visual Cortex (Figure: visual areas arranged in a hierarchy from lower to higher)
Hierarchical Generative Model (Rao & Ballard, 1999) • Original generative model: I = Ur + n • Hierarchical generalization: r = Uh rh + nh • rh = representation at a higher level • With temporal dynamics: r(t) = V r(t−1) + Uh rh(t−1) + m(t−1) • Kalman filter equations can be derived for each level • Yields a hierarchical model for predictive coding (Diagram: rh → r → I)
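A minimal Python sketch of sampling from the two-level hierarchy; all sizes and noise scales are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
kh, k, d = 2, 4, 16                  # sizes of rh, r, and I (arbitrary)
U_h = rng.standard_normal((k, kh))   # higher-level generative matrix
U   = rng.standard_normal((d, k))    # lower-level generative matrix

r_h = rng.standard_normal(kh)                   # higher-level cause
r   = U_h @ r_h + 0.1 * rng.standard_normal(k)  # r = Uh rh + nh
I   = U @ r    + 0.1 * rng.standard_normal(d)   # I = U r + n
```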
Hierarchical Predictive Coding Model (Rao & Ballard, 1997, 1999) (Diagram: stacked predictive coding modules; each level sends the prediction Uh rh to the level below and receives the prediction error, down to the input image I)
The Predictive Coding Hypothesis • Feedback connections from higher areas convey predictions of expected activity in lower areas • Feedforward connections convey the errors between actual and predicted responses Model prediction: since feedforward connections to higher areas originate from layers 2/3, the responses of layer 2/3 neurons should be interpretable as prediction errors
Results from the Classic Studies of Hubel and Wiesel (1960s)
Contextual Modulation in Visual Cortex (Zipser et al., 1996)
Why Does Endstopping Occur in the Model? Orientation-Dependent Correlations in Natural Images
Support for Predictive Coding from an Imaging Study (Murray et al., 2002)
Predictive Coding in the Retina (Figure: on-center off-surround and off-center on-surround receptive fields; from Nicholls et al., 1992) The response of a retinal ganglion cell can be interpreted as the difference (error) between center pixel values and their prediction based on surrounding pixels (Srinivasan et al., 1982)
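A minimal Python sketch of this interpretation; using the mean of the 8 surrounding pixels as the prediction is an illustrative simplification of the actual retinal surround weighting:

```python
import numpy as np

def center_surround_error(image, i, j):
    """Error = center pixel minus its prediction from the 8 surrounding pixels."""
    patch = image[i-1:i+2, j-1:j+2].astype(float)
    center = patch[1, 1]
    surround_prediction = (patch.sum() - center) / 8.0
    return center - surround_prediction   # on-center off-surround style response
```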
Predictive Coding in the LGN (Figure: temporal receptive field of an LGN X-cell and LGN cell responses; from Dan et al., 1996) The response of an LGN cell can be interpreted as the difference (error) between current pixel values and their prediction based on past pixel values
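The temporal analogue, again as a minimal sketch; the exponentially decaying weighting of the past is an illustrative choice, not the measured LGN kernel:

```python
import numpy as np

def temporal_error(signal, t, n_past=5, decay=0.6):
    """Error = current value minus a weighted prediction from the last n_past values."""
    weights = decay ** np.arange(1, n_past + 1)   # most recent past weighted most
    weights /= weights.sum()
    prediction = float(weights @ signal[t - n_past:t][::-1])
    return signal[t] - prediction
```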
Summary for Part I • Computational and experimental studies point to the need for probabilistic models of brain function • Probabilistic models typically rely on generative models of sensory (and motor) processes • We examined a simple linear generative model and its hierarchical generalization • Bayesian inference via Kalman filtering • Neural implementation allows Hierarchical Predictive Coding • Feedback connections convey predictions • Feedforward connections convey errors in prediction • Hierarchical predictive coding explains endstopping and other contextual surround effects based on natural image statistics
Break Questions to ponder: • Can we go beyond linear generative models and Gaussian distributions? • Can a neural population encode an entire probability distribution rather than simply the mean or mode?
Generative Models II: Graphical Models • Graphical models depict the generative process as a graph • Nodes denote random variables (states), e.g., Earthquake, Burglar, Alarm, Radio • Edges denote dependencies • Example: if states are continuous, the linear generative model I = Ur + n corresponds to the two-node graph r → I
Continuous versus Discrete States • Continuous and unimodal, e.g., normal N(θ; μ, Σ) • Continuous and multimodal • Discrete approximation: M discrete states θ1, …, θi, …, θM, with a probability assigned to each
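A minimal Python sketch of the discrete approximation: a continuous density evaluated on a grid of M states and renormalized into a probability vector (the Gaussian and its parameters are made-up examples):

```python
import numpy as np

M = 20
theta = np.linspace(-4.0, 4.0, M)             # M discrete states theta_1 ... theta_M
density = np.exp(-0.5 * (theta - 1.0) ** 2)   # unnormalized N(theta; mu=1, sigma=1)
p = density / density.sum()                   # P(theta = theta_i); sums to 1
```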
The Belief Propagation Algorithm • If states are discrete, the probabilities of the random variables (e.g., Earthquake, Burglar, Alarm, Radio) can be calculated through “belief propagation” (Pearl, 1988): • Each node j sends a “message” (a probability density) to every neighbor i • The message to neighbor i depends on the messages received from all other neighbors
An Example: Hidden Markov Models (HMMs) • A simple but powerful graphical model for temporal data (graphical model: a chain of states θt-2 → θt-1 → θt, each state emitting an input It-2, It-1, It) • The observed world can be in one of M states θ1, θ2, …, θM • The state θt at time step t depends only on the previous state θt-1 and is given by the transition probabilities P(θt = θi | θt-1 = θj) (or P(θi | θj) for convenience) • The input It at time t is given by P(It | θt = θj)
Inference in HMMs • The posterior probability of state θi given the inputs up to time t can be computed recursively: P(θt = θi | It, …, I1) ∝ P(It | θt = θi) Σj P(θt = θi | θt-1 = θj) P(θt-1 = θj | It-1, …, I1) • The first factor is the likelihood of θi at time t; the sum over j is the prediction for θi at time t
Equivalence to Belief Propagation for HMMs • The recursive computation above is equivalent to on-line (“forward”) belief propagation through time on the HMM’s chain graph
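A minimal Python sketch of this forward recursion for discrete observations; the transition matrix T, emission matrix Em, and the observation sequence are made-up examples:

```python
import numpy as np

def forward_step(posterior_prev, obs, T, Em):
    """One step of on-line ("forward") belief propagation for an HMM."""
    prediction = T @ posterior_prev    # sum_j P(theta_i | theta_j) P(theta_j | past inputs)
    unnorm = Em[:, obs] * prediction   # multiply by the likelihood of theta_i at time t
    return unnorm / unnorm.sum()       # posterior P(theta_t = theta_i | inputs up to t)

# Example with M = 2 states and binary observations:
T  = np.array([[0.9, 0.2],    # T[i, j] = P(theta_t = theta_i | theta_t-1 = theta_j)
               [0.1, 0.8]])
Em = np.array([[0.8, 0.2],    # Em[i, s] = P(I_t = s | theta_i)
               [0.3, 0.7]])
posterior = np.array([0.5, 0.5])
for obs in [0, 0, 1, 1]:
    posterior = forward_step(posterior, obs, T, Em)
```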
Recurrent Network Model • A network with feedforward synaptic weights W for the input I and recurrent synaptic weights R • Leaky integrator equation for the output firing rate v: τ dv/dt = −v (decay) + W I (input) + R v (feedback)
Discrete Implementation • In discrete time, the new activity is the input plus a recurrent contribution from the prior activity: v(t) = W I(t) + R v(t−1)
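A minimal Python sketch of how such a discrete recurrent update could carry out the HMM forward computation from the earlier slides. The log-domain mapping used here (v as log posterior, the input term as the log likelihood, the recurrent term as the log of the prediction) is an assumed correspondence in the spirit of the model, not its exact equations:

```python
import numpy as np

def recurrent_step(v_prev, log_lik, T):
    """New activity = input (log likelihood) + recurrent term from prior activity."""
    feedback = np.log(T @ np.exp(v_prev))   # log of the prediction from v(t-1)
    v = log_lik + feedback                  # input + recurrent contribution
    return v - np.log(np.sum(np.exp(v)))    # normalization (divisive inhibition)

# exp(v) then tracks the same posterior as forward_step above:
T = np.array([[0.9, 0.2], [0.1, 0.8]])
v = np.log(np.array([0.5, 0.5]))
for log_lik in (np.log([0.8, 0.3]), np.log([0.2, 0.7])):
    v = recurrent_step(v, log_lik, T)
```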