
Presentation Transcript


  1. Probabilistic Models of Cortical Computation. Rajesh P. N. Rao, Dept. of Computer Sci. and Engineering & Neurobio. and Behavior Program, University of Washington, Seattle, WA. Lab website: http://neural.cs.washington.edu. November 2004. Funding: Sloan Foundation, Packard Foundation, ONR, and NSF

  2. Why Consider Probabilistic Models? Computational Reasons • Sensory measurements are typically ambiguous (e.g. the projection from 3D to 2D in vision) • Biological sensors and processing elements are noisy • An animal's knowledge of the world is usually incomplete. There appears to be a need to represent, learn, and reason about probabilities

  3. Example 1: Ambiguity of Stimuli [Figure: an object viewed by the eye and its retinal image] Is it an oval-shaped or a circular object?

  4. Bayesian Model: The Likelihood Function Likelihood = P(I | slant, aspect ratio), where I is the retinal image (From Geisler & Kersten, 2002)

  5. Bayesian Model: The Posterior Posterior = Likelihood × Prior × k (k = normalization constant) (From Geisler & Kersten, 2002)

  6. Example 2: Noise and Incomplete Knowledge What is this image depicting?

  7. Bayesian Model Prior probability P(θ) over interpretations θ: street, dog, …, Okinawa, beach. Likelihood P(I | θ) of the input image sample. Posterior probability P(θ | I) = P(I | θ) P(θ) / P(I). Decision: ??? (Bayesian decision)

  8. Bayesian Model with “Top-Down” Bias Prior probability P(θ) over street, dog, …, Okinawa, beach, now biased toward “dog”. Likelihood P(I | θ) of the input image sample. Posterior probability P(θ | I) = P(I | θ) P(θ) / P(I). Decision: “Dog” (Bayesian decision). A numerical sketch of this computation follows below.
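For readers following along, here is a minimal numerical sketch of the Bayesian decision on slides 7-8. The scene categories come from the slide; all probability values are invented purely for illustration.

```python
import numpy as np

scenes = ["street", "dog", "Okinawa", "beach"]
prior = np.array([0.25, 0.25, 0.25, 0.25])        # P(theta): flat prior
likelihood = np.array([0.10, 0.30, 0.15, 0.35])   # P(I | theta) for a noisy image

# Bayes' rule: P(theta | I) = P(I | theta) P(theta) / P(I)
posterior = likelihood * prior
posterior /= posterior.sum()                      # dividing by P(I) normalizes
print("Flat-prior decision:", scenes[np.argmax(posterior)])     # close call

# Slide 8's "top-down" bias: a prior skewed toward "dog" tips the decision.
biased_prior = np.array([0.10, 0.60, 0.10, 0.20])
biased_posterior = likelihood * biased_prior
biased_posterior /= biased_posterior.sum()
print("Biased decision:", scenes[np.argmax(biased_posterior)])  # now "dog"
```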

  9. Psychophysical Evidence for Bayesian Perception • Motion from cast shadows (Kersten et al., 1996) • Surface perception based on texture (Knill, 1998) • Inferring 3D shape from 2D images (Mamassian et al., 2002) • Color perception (Bloj et al., 1999) • Cue combination for depth perception (Jacobs, 2002) • Motion illusions (Weiss et al., 2002) • Motor Control (Körding and Wolpert, 2004)

  10. Other Results: Contextual Modulation in V1 (Zipser et al., 1996)

  11. Attentional Modulation in V2 and V4 (Reynolds et al., 1999)

  12. Decision Neurons in Areas LIP and FEF (Roitman and Shadlen, 2002)

  13. Can a network of neurons perform Bayesian inference? Rev. Thomas Bayes (1702-1761) • How is prior knowledge about the world (prior probabilities and likelihoods) stored in a network? • How are posterior probabilities of states computed?

  14. Generative Models for Bayesian Inference • Fundamental Idea: Inputs received by an organism are caused by external “states” of the world (hidden “causes”) • Goal: Estimate the probability of these causes (or states or “interpretations”) based on the inputs received thus far

  15. Example: Linear Generative Models

  16. Linear Generative Model • Spatial generative model: I(t) = U r(t) + n(t) • r(t) = representation vector, n = zero-mean Gaussian white noise with covariance Σ • Temporal dynamics for time-varying processes: r(t) = V r(t-1) + m(t-1) • V = transition matrix, m = zero-mean Gaussian white noise with covariance Σ_m • Goal: Find the optimal representation vector r(t) given inputs I(t), I(t-1), …, I(1)
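A short sketch of sampling from this generative model may help make the notation concrete. All dimensions and parameter values below are arbitrary illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels, n_causes = 16, 4
U = rng.standard_normal((n_pixels, n_causes))   # generative weights
V = 0.9 * np.eye(n_causes)                      # transition matrix
Sigma = 0.1 * np.eye(n_pixels)                  # covariance of sensor noise n
Sigma_m = 0.01 * np.eye(n_causes)               # covariance of process noise m

# Sample a short sequence:  r(t) = V r(t-1) + m(t-1),  I(t) = U r(t) + n(t)
r = rng.standard_normal(n_causes)
for t in range(5):
    r = V @ r + rng.multivariate_normal(np.zeros(n_causes), Sigma_m)
    I = U @ r + rng.multivariate_normal(np.zeros(n_pixels), Sigma)
    print(f"t={t}: I(t)[:3] =", I[:3].round(2))
```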

  17. Optimization Functions • Find the optimal r(t) by minimizing the squared prediction errors for all t: E = (I - Ur)^T (I - Ur) + (r - r̄)^T (r - r̄) • r̄ = mean of r before measurement of I • Generalize to the weighted least-squares function: E = (I - Ur)^T Σ^-1 (I - Ur) + (r - r̄)^T M^-1 (r - r̄) • M = covariance of r before measurement of I

  18. Minimizing E = Maximizing Posterior Probability • Under the Gaussian noise assumptions, E = -log P(r|I) + constant, so minimizing E is equivalent to maximizing log P(r|I), which is equivalent to maximizing the posterior probability P(r|I)

  19. Optimal Estimation and Kalman Filtering • Setting dE/dr = 0 and solving for the optimal r yields the Kalman filter: r̂(t) = r̄(t) + K(t)(I(t) - U r̄(t)), where r̄(t) = V r̂(t-1) is the prediction • K(t) = “Kalman gain” matrix = N(t) U^T Σ^-1 • N(t) = covariance of r after measurement of I(t) = (U^T Σ^-1 U + M(t)^-1)^-1 • M(t) = V N(t-1) V^T + Σ_m
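The update equations on this slide translate almost line for line into code. Below is a minimal sketch of one filter step under the slide's definitions; the variable names (r_hat, N_prev, and so on) are mine.

```python
import numpy as np

def kalman_step(r_hat, N_prev, I, U, V, Sigma, Sigma_m):
    """One step of the filter defined by the quantities on this slide."""
    r_bar = V @ r_hat                                # prediction (mean before I(t))
    M = V @ N_prev @ V.T + Sigma_m                   # covariance before measurement
    Sigma_inv = np.linalg.inv(Sigma)
    N = np.linalg.inv(U.T @ Sigma_inv @ U + np.linalg.inv(M))  # covariance after
    K = N @ U.T @ Sigma_inv                          # Kalman gain
    r_new = r_bar + K @ (I - U @ r_bar)              # prediction + gain * error
    return r_new, N
```

Iterating kalman_step over I(1), …, I(t) gives the optimal estimate at each time step; slide 20's simplification corresponds to Sigma being a multiple of the identity.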

  20. A Simplified Kalman Filter • If Σ is diagonal with equal entries σ², K(t) = (N(t)/σ²) U^T = G(t) U^T • The Kalman filter equation is of the form: New Estimate = Prediction + Gain × Prediction Error • U^T = feedforward matrix • U = feedback matrix • V = recurrent matrix (lateral connections)

  21. Neural Implementation via Predictive Coding (Rao & Ballard, 1997, 1999; Rao, 1999) Predictive coding model: Feedback = Prediction; Feedforward = Prediction Error

  22. Clues from Cortical Anatomy [Figure: connections between a higher cortical area and a lower cortical area]

  23. Hierarchical Organization of the Visual Cortex [Figure: hierarchy of visual cortical areas, from lower to higher]

  24. Hierarchical Generative Model (Rao & Ballard, 1999) • Original generative model: I = U r + n • Hierarchical generalization: r = U_h r_h + n_h • r_h = representation at a higher level • With temporal dynamics: r(t) = V r(t-1) + U_h r_h(t-1) + m(t-1) • Can derive Kalman filter equations for each level • Yields a hierarchical model for predictive coding [Figure: r_h generates r, which generates I]
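To make the hierarchy concrete, here is a toy top-down sampling pass through the two-level model; the sizes and noise levels are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A small higher-level code r_h generates the lower-level code r,
# which in turn generates the image I.
n_pixels, n_r, n_rh = 16, 8, 3
U  = rng.standard_normal((n_pixels, n_r))   # level-1 generative weights
Uh = rng.standard_normal((n_r, n_rh))       # level-2 generative weights

# Top-down generation:  r = U_h r_h + n_h,  I = U r + n
rh = rng.standard_normal(n_rh)
r  = Uh @ rh + 0.1 * rng.standard_normal(n_r)
I  = U @ r + 0.1 * rng.standard_normal(n_pixels)
```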

  25. Hierarchical Predictive Coding Model (Rao & Ballard, 1997, 1999) [Figure: two-level predictive coding network; the higher level sends the prediction U_h r_h to the lower level, which receives the inputs I]

  26. The Predictive Coding Hypothesis • Feedback connections from higher areas convey predictions of expected activity in lower areas • Feedforward connections convey the errors between actual and predicted responses. Model prediction: since feedforward connections to higher areas originate from layer 2+3, the responses of layer 2+3 neurons should be interpretable as prediction errors

  27. Results from the Classic Studies of Hubel and Wiesel (1960s)

  28. “Endstopping” in Cortical Neurons

  29. Contextual Modulation in Visual Cortex (Zipser et al., 1996)

  30. Example Network for Predictive Coding

  31. Natural Images used for Training

  32. Synaptic Weights after Learning

  33. Endstopping as a Predictive Error Signal

  34. Comparison with Layer 2+3 Cortical Neuron

  35. Why Does Endstopping Occur in the Model? Orientation-Dependent Correlations in Natural Images

  36. Other Contextual Effects in the Model

  37. Support for Predictive Coding from an Imaging Study (Murray et al., 2002)

  38. Predictive Coding in the Retina (From: Nicholls et al., 1992) [Figure: receptive fields, on-center off-surround and off-center on-surround] The response of a retinal ganglion cell can be interpreted as the difference (error) between center pixel values and their prediction based on surrounding pixels (Srinivasan et al., 1982); a toy sketch follows below
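Here is a toy version of the Srinivasan et al. (1982) interpretation, with the prediction taken as the mean of the 8 surrounding pixels. This uniform surround weighting is a simplifying assumption for illustration; the actual retinal weighting differs.

```python
import numpy as np

def center_surround_error(image, i, j):
    """Difference between a pixel and its prediction from the 8 surrounding
    pixels: an "on-center, off-surround" error signal."""
    patch = image[i-1:i+2, j-1:j+2]
    surround_mean = (patch.sum() - image[i, j]) / 8.0  # prediction from surround
    return image[i, j] - surround_mean                 # prediction error

img = np.arange(25, dtype=float).reshape(5, 5)
print(center_surround_error(img, 2, 2))  # 0.0: a smooth ramp is well predicted
```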

  39. Predictive Coding in the LGN (From: Dan et al., 1996) [Figure: temporal receptive field of an LGN X-cell; LGN cell responses] The response of an LGN cell can be interpreted as the difference (error) between current pixel values and their prediction based on past pixel values

  40. Summary for Part I • Computational and experimental studies point to the need for probabilistic models of brain function • Probabilistic models typically rely on generative models of sensory (and motor) processes • We examined a simple linear generative model and its hierarchical generalization • Bayesian inference via Kalman filtering • Neural implementation allows Hierarchical Predictive Coding • Feedback connections convey predictions • Feedforward connections convey errors in prediction • Hierarchical predictive coding explains endstopping and other contextual surround effects based on natural image statistics

  41. Break. Questions to ponder: Can we go beyond linear generative models and Gaussian distributions? Can a neural population encode an entire probability distribution rather than simply the mean or mode?

  42. Generative Models II: Graphical Models • Graphical models depict the generative process as a graph • Nodes denote random variables (states) • Edges denote dependencies [Figure: example graph with nodes Earthquake, Burglar, Alarm, Radio] • If states are continuous, this includes the linear generative model: I = U r + n [Figure: node r with an edge to node I]

  43. Continuous versus Discrete States • Continuous states: unimodal densities, e.g. the normal N(θ; μ, Σ), or multimodal densities • Discrete approximation: a set of M discrete states θ_1, …, θ_i, …, θ_M
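For concreteness, a discrete approximation of this kind can be built by evaluating the continuous density on a grid and renormalizing; M and the grid range below are arbitrary choices.

```python
import numpy as np

M = 10
grid = np.linspace(-4.0, 4.0, M)          # state values theta_1 .. theta_M
mu, sigma = 0.0, 1.0                      # example: normal N(theta; mu, sigma)
p = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)
p /= p.sum()                              # discrete probabilities sum to 1
```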

  44. The Belief Propagation Algorithm [Figure: message passing on the Earthquake/Burglar/Alarm/Radio graph] • If states are discrete, probabilities of random variables can be calculated through “belief propagation” (Pearl, 1988): • Each node j sends a “message” (a probability density) to every neighbor i • The message to neighbor i depends on the messages received from all other neighbors

  45. t-2 t-1 t State Input Graphical Model for a HMM It-2 It-1 It An Example: Hidden Markov Models (HMMs) • A Simple but Powerful Graphical Model for Temporal Data: • Observed world can be in one of M states1, 2, …, M • The state tat time step t depends only on previous state t-1and is given by the probabilities: P(t = i | t-1 = j ) (or for convenience) • The input It at time t is given by P(It | t = j )

  46. Inference in HMMs [Figure: HMM graphical model as on the previous slide] On-line posterior: P(θ_t = θ_i | I_t, …, I_1) ∝ P(I_t | θ_t = θ_i) Σ_j P(θ_i | θ_j) P(θ_{t-1} = θ_j | I_{t-1}, …, I_1) • P(I_t | θ_t = θ_i): likelihood of θ_i at time t • Σ_j P(θ_i | θ_j) P(θ_{t-1} = θ_j | I_{t-1}, …, I_1): prediction for θ_i at time t
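These two quantities combine into the standard forward update, which (as the next slide notes) is also the on-line belief propagation message. A minimal sketch, with a made-up two-state example:

```python
import numpy as np

def forward_step(belief_prev, obs_lik, T):
    """One step of on-line ("forward") inference in an HMM.
    belief_prev[j] = P(theta_{t-1} = j | I_1 .. I_{t-1}),
    obs_lik[i]     = P(I_t | theta_t = i),
    T[i, j]        = P(theta_t = i | theta_{t-1} = j)."""
    prediction = T @ belief_prev     # sum_j P(i|j) * belief_prev[j]
    belief = obs_lik * prediction    # multiply prediction by likelihood
    return belief / belief.sum()     # normalize to get the posterior

T = np.array([[0.9, 0.2],
              [0.1, 0.8]])           # columns sum to 1
belief = np.array([0.5, 0.5])
for obs_lik in [np.array([0.8, 0.1]), np.array([0.7, 0.2])]:
    belief = forward_step(belief, obs_lik, T)
print(belief.round(3))               # posterior concentrates on state 0
```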

  47. Equivalence to Belief Propagation for HMMs [Figure: HMM graphical model] The filtering computation on the previous slide is equivalent to on-line (“forward”) belief propagation through time

  48. Can a network of neurons perform this computation?

  49. Recurrent Network Model [Figure: input I drives the network through feedforward synaptic weights; R = recurrent weights; v = output] Leaky integrator equation for the output firing rate v: dv/dt = -v + (feedforward input term) + R v, i.e. a decay term, an input term, and a recurrent feedback term

  50. Discrete Implementation [Figure: input I and recurrent weights R] Discrete-time update: new activity = (feedforward input term) combined with (prior activity propagated through the recurrent weights R); one reading of this update is sketched below
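One way to relate this update to the HMM filtering step above is to assume, purely as a sketch, that the network's activities represent log probabilities: multiplying the likelihood by the prediction then becomes the addition of a feedforward input term and a recurrent term. The function below is this illustrative reading, not the talk's exact equations; T stands in for the recurrent weights R.

```python
import numpy as np

def recurrent_step(v_prev, log_lik, T):
    """v_prev: log posterior over states at t-1; log_lik: log P(I_t | theta_i),
    assumed here to arrive as the feedforward input; T[i, j] = P(theta_i | theta_j),
    playing the role of the recurrent weights R."""
    prediction = T @ np.exp(v_prev)     # sum_j P(i|j) P(j | past inputs)
    v = log_lik + np.log(prediction)    # new activity = input + prior activity
    return v - np.logaddexp.reduce(v)   # normalization (cf. divisive inhibition)
```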
