Probabilistic Models of Cortical Computation Rajesh P. N. Rao Dept. of Computer Sci. and Engineering & Neurobio. and Behavior Program University of Washington Seattle, WA Lab website: http://neural.cs.washington.edu November, 2004 Funding: Sloan Foundation, Packard Foundation, ONR, and NSF.
Why Consider Probabilistic Models? Computational Reasons • Sensory measurements are typically ambiguous • E.g., the projection from 3D to 2D in vision • Biological sensors and processing elements are noisy • An animal’s knowledge of the world is usually incomplete There thus appears to be a need to represent, learn, and reason about probabilities
Example 1: Ambiguity of Stimuli (Figure: an object viewed by the eye and the retinal image it produces) Is it an oval-shaped or a circular object?
Bayesian Model: The Likelihood Function Likelihood = P(I | Slant, Aspect ratio) Retinal Image I (From Geisler & Kersten, 2002)
Bayesian Model: The Posterior Posterior = k × Likelihood × Prior (k = normalization constant) (From Geisler & Kersten, 2002)
Example 2: Noise and Incomplete Knowledge What is this image depicting?
Bayesian Model Prior probability P(θ) over interpretations θ: street, dog, …, Okinawa beach Likelihood P(I | θ) Posterior probability P(θ | I) = P(I | θ)P(θ)/P(I) Input image sample → ??? (Bayesian decision)
Bayesian Model with “Top-Down” Bias Prior probability P(θ) biased toward “dog”: street, dog, …, Okinawa beach Likelihood P(I | θ) Posterior probability P(θ | I) = P(I | θ)P(θ)/P(I) Input image sample → “Dog” (Bayesian decision)
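The decision logic on these two slides is simple enough to run. A minimal Python sketch, with made-up likelihood and prior values chosen only to illustrate how a top-down bias can change the Bayesian decision:

```python
import numpy as np

interpretations = ["street", "dog", "Okinawa beach"]

# Made-up numbers: an ambiguous image that weakly favors "Okinawa beach".
likelihood = np.array([0.30, 0.33, 0.37])    # P(I | theta)
flat_prior = np.array([1/3, 1/3, 1/3])       # no top-down bias
biased_prior = np.array([0.15, 0.70, 0.15])  # top-down bias toward "dog"

def posterior(likelihood, prior):
    """P(theta | I) = P(I | theta) P(theta) / P(I), with P(I) as the normalizer."""
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

for prior in (flat_prior, biased_prior):
    post = posterior(likelihood, prior)
    print(post.round(3), "->", interpretations[np.argmax(post)])  # MAP (Bayesian) decision
```

With the flat prior the decision goes to “Okinawa beach”; the top-down prior flips it to “dog”, mirroring the two slides.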
Psychophysical Evidence for Bayesian Perception • Motion from cast shadows (Kersten et al., 1996) • Surface perception based on texture (Knill, 1998) • Inferring 3D shape from 2D images (Mamassian et al., 2002) • Color perception (Bloj et al., 1999) • Cue combination for depth perception (Jacobs, 2002) • Motion illusions (Weiss et al., 2002) • Motor Control (Körding and Wolpert, 2004)
Other Results: Contextual Modulation in V1 (Zipser et al., 1996)
Attentional Modulation in V2 and V4 (Reynolds et al., 1999)
Decision Neurons in Areas LIP and FEF (Roitman and Shadlen, 2002)
Can a network of neurons perform Bayesian inference? Rev. Thomas Bayes (1702-1761) • How is prior knowledge about the world (prior probabilities and likelihoods) stored in a network? • How are posterior probabilities of states computed?
Generative Models for Bayesian Inference • Fundamental Idea: Inputs received by an organism are caused by external “states” of the world (hidden “causes”) • Goal: Estimate the probability of these causes (or states or “interpretations”) based on the inputs received thus far
Linear Generative Model • Spatial generative model: I(t) = U r(t) + n(t) • r(t) = representation vector, n = zero-mean Gaussian white noise with covariance Σ • Temporal dynamics for time-varying processes: r(t) = V r(t-1) + m(t-1) • V = transition matrix, m = zero-mean Gaussian white noise with covariance Σm • Goal: Find the optimal representation vector r(t) given the inputs I(t), I(t-1), …, I(1)
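To make the model concrete, a minimal Python sketch that samples from this generative process; the dimensions, the decaying transition matrix, and the noise covariances are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 16                       # sizes of r(t) and I(t), chosen arbitrarily
U = rng.standard_normal((d, k))    # generative matrix
V = 0.9 * np.eye(k)                # transition matrix (slowly decaying states)
Sigma   = 0.10 * np.eye(d)         # covariance of the measurement noise n
Sigma_m = 0.05 * np.eye(k)         # covariance of the process noise m

r = np.zeros(k)
images = []
for t in range(100):
    r = V @ r + rng.multivariate_normal(np.zeros(k), Sigma_m)  # r(t) = V r(t-1) + m(t-1)
    I = U @ r + rng.multivariate_normal(np.zeros(d), Sigma)    # I(t) = U r(t) + n(t)
    images.append(I)
```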
Optimization Functions • Find the optimal r(t) by minimizing the prediction errors for all t: E = (I − Ur)ᵀ(I − Ur) + (r − r̄)ᵀ(r − r̄) • r̄ = mean of r before measurement of I • Generalize to the weighted least-squares function: E = (I − Ur)ᵀΣ⁻¹(I − Ur) + (r − r̄)ᵀM⁻¹(r − r̄) • M = covariance of r before measurement of I
Minimizing E = Maximizing Posterior Probability • Minimizing E is equivalent to maximizing log P(r | I), which in turn is equivalent to maximizing the posterior probability P(r | I)
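To spell out the step the slide compresses, a short derivation under the Gaussian assumptions of the generative model, with E the weighted least-squares function from the previous slide:

```latex
\begin{align*}
\log P(\mathbf{r}\mid\mathbf{I})
 &= \log P(\mathbf{I}\mid\mathbf{r}) + \log P(\mathbf{r}) - \log P(\mathbf{I}) \\
 &= -\tfrac{1}{2}(\mathbf{I}-U\mathbf{r})^{\mathsf T}\Sigma^{-1}(\mathbf{I}-U\mathbf{r})
    -\tfrac{1}{2}(\mathbf{r}-\bar{\mathbf{r}})^{\mathsf T}M^{-1}(\mathbf{r}-\bar{\mathbf{r}})
    + \mathrm{const} \\
 &= -\tfrac{1}{2}E + \mathrm{const},
\end{align*}
```

so maximizing the posterior over r is exactly minimizing E.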
Optimal Estimation and Kalman Filtering • Setting dE/dr = 0 and solving for the optimal r yields the Kalman filter: r̂(t) = r̄(t) + K(t)[I(t) − U r̄(t)], where r̄(t) = V r̂(t−1) is the prediction • K(t) = “Kalman gain” matrix = N(t)UᵀΣ⁻¹ • N(t) = covariance of r after measurement of I(t) = (UᵀΣ⁻¹U + M(t)⁻¹)⁻¹ • M(t) = covariance of r before measurement of I(t) = V N(t−1)Vᵀ + Σm
A Simplified Kalman Filter • If Σ is diagonal and equal to σ²I, then K(t) = (N(t)/σ²)Uᵀ = G(t)Uᵀ • The Kalman filter equation is of the form: New Estimate = Prediction + Gain × Prediction Error • Uᵀ = feedforward matrix • U = feedback matrix • V = recurrent matrix (lateral connections)
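A minimal Python sketch of one filtering step, transcribing the update equations above (it reuses U, V, Sigma, Sigma_m from the generative-model sketch; this is a direct NumPy rendering, not code from the talk):

```python
import numpy as np

def kalman_step(r_hat, N_prev, I, U, V, Sigma, Sigma_m):
    """New Estimate = Prediction + Gain x Prediction Error."""
    r_bar = V @ r_hat                       # prediction of r before seeing I(t)
    M = V @ N_prev @ V.T + Sigma_m          # covariance before the measurement
    Sigma_inv = np.linalg.inv(Sigma)
    N = np.linalg.inv(U.T @ Sigma_inv @ U + np.linalg.inv(M))  # covariance after
    K = N @ U.T @ Sigma_inv                 # Kalman gain
    r_new = r_bar + K @ (I - U @ r_bar)     # correct the prediction by the weighted error
    return r_new, N
```

In predictive-coding terms, U @ r_bar is the feedback prediction and I - U @ r_bar is the feedforward prediction error.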
Neural Implementation via Predictive Coding (Rao & Ballard, 1997,1999; Rao, 1999) Predictive Coding Model: Feedback = Prediction Feedforward = Prediction Error
Clues from Cortical Anatomy (Figure: connections between a lower cortical area and a higher cortical area)
Hierarchical Organization of the Visual Cortex (Figure: visual areas arranged in a hierarchy from lower to higher)
Hierarchical Generative Model (Rao & Ballard, 1999) • Original generative model: I = Ur + n • Hierarchical generalization: r = Uh rh + nh • rh = representation at a higher level • With temporal dynamics: r(t) = V r(t−1) + Uh rh(t−1) + m(t−1) • Kalman filter equations can be derived for each level • Yields a hierarchical model for predictive coding (Diagram: rh → r → I)
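A minimal Python sketch of sampling from the two-level hierarchy; all sizes and noise scales are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
kh, k, d = 2, 4, 16                  # sizes of rh, r, and I (arbitrary)
U_h = rng.standard_normal((k, kh))   # higher-level generative matrix
U   = rng.standard_normal((d, k))    # lower-level generative matrix

r_h = rng.standard_normal(kh)                   # higher-level cause
r   = U_h @ r_h + 0.1 * rng.standard_normal(k)  # r = Uh rh + nh
I   = U @ r    + 0.1 * rng.standard_normal(d)   # I = U r + n
```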
Hierarchical Predictive Coding Model (Rao & Ballard, 1997, 1999) (Diagram: stacked predictive coding modules; each level sends the prediction Uh rh to the level below and receives the prediction error, down to the input image I)
The Predictive Coding Hypothesis • Feedback connections from higher areas convey predictions of expected activity in lower areas • Feedforward connections convey the errors between actual and predicted responses Model prediction: since feedforward connections to higher areas originate from layers 2/3, the responses of layer 2/3 neurons should be interpretable as prediction errors
Results from the Classic Studies of Hubel and Wiesel (1960s)
Contextual Modulation in Visual Cortex (Zipser et al., 1996)
Why Does Endstopping Occur in the Model? Orientation-Dependent Correlations in Natural Images
Support for Predictive Coding from an Imaging Study (Murray et al., 2002)
Predictive Coding in the Retina (Figure: on-center off-surround and off-center on-surround receptive fields; from Nicholls et al., 1992) The response of a retinal ganglion cell can be interpreted as the difference (error) between center pixel values and their prediction based on surrounding pixels (Srinivasan et al., 1982)
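A minimal Python sketch of this interpretation; using the mean of the 8 surrounding pixels as the prediction is an illustrative simplification of the actual retinal surround weighting:

```python
import numpy as np

def center_surround_error(image, i, j):
    """Error = center pixel minus its prediction from the 8 surrounding pixels."""
    patch = image[i-1:i+2, j-1:j+2].astype(float)
    center = patch[1, 1]
    surround_prediction = (patch.sum() - center) / 8.0
    return center - surround_prediction   # on-center off-surround style response
```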
Predictive Coding in the LGN (Figure: temporal receptive field of an LGN X-cell and LGN cell responses; from Dan et al., 1996) The response of an LGN cell can be interpreted as the difference (error) between current pixel values and their prediction based on past pixel values
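The temporal analogue, again as a minimal sketch; the exponentially decaying weighting of the past is an illustrative choice, not the measured LGN kernel:

```python
import numpy as np

def temporal_error(signal, t, n_past=5, decay=0.6):
    """Error = current value minus a weighted prediction from the last n_past values."""
    weights = decay ** np.arange(1, n_past + 1)   # most recent past weighted most
    weights /= weights.sum()
    prediction = float(weights @ signal[t - n_past:t][::-1])
    return signal[t] - prediction
```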
Summary for Part I • Computational and experimental studies point to the need for probabilistic models of brain function • Probabilistic models typically rely on generative models of sensory (and motor) processes • We examined a simple linear generative model and its hierarchical generalization • Bayesian inference via Kalman filtering • Neural implementation allows Hierarchical Predictive Coding • Feedback connections convey predictions • Feedforward connections convey errors in prediction • Hierarchical predictive coding explains endstopping and other contextual surround effects based on natural image statistics
Break Questions to ponder: • Can we go beyond linear generative models and Gaussian distributions? • Can a neural population encode an entire probability distribution rather than simply the mean or mode?
Generative Models II: Graphical Models • Graphical models depict the generative process as a graph • Nodes denote random variables (states), e.g., Earthquake, Burglar, Alarm, Radio • Edges denote dependencies • Example: if states are continuous, the linear generative model I = Ur + n corresponds to the two-node graph r → I
Continuous versus Discrete States • Continuous and unimodal, e.g., normal N(θ; μ, Σ) • Continuous and multimodal • Discrete approximation: M discrete states θ1, …, θi, …, θM, with a probability assigned to each
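A minimal Python sketch of the discrete approximation: a continuous density evaluated on a grid of M states and renormalized into a probability vector (the Gaussian and its parameters are made-up examples):

```python
import numpy as np

M = 20
theta = np.linspace(-4.0, 4.0, M)             # M discrete states theta_1 ... theta_M
density = np.exp(-0.5 * (theta - 1.0) ** 2)   # unnormalized N(theta; mu=1, sigma=1)
p = density / density.sum()                   # P(theta = theta_i); sums to 1
```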
The Belief Propagation Algorithm • If states are discrete, the probabilities of the random variables (e.g., Earthquake, Burglar, Alarm, Radio) can be calculated through “belief propagation” (Pearl, 1988): • Each node j sends a “message” (a probability density) to every neighbor i • The message to neighbor i depends on the messages received from all other neighbors
An Example: Hidden Markov Models (HMMs) • A simple but powerful graphical model for temporal data (graphical model: a chain of states θt-2 → θt-1 → θt, each state emitting an input It-2, It-1, It) • The observed world can be in one of M states θ1, θ2, …, θM • The state θt at time step t depends only on the previous state θt-1 and is given by the transition probabilities P(θt = θi | θt-1 = θj) (or P(θi | θj) for convenience) • The input It at time t is given by P(It | θt = θj)
Inference in HMMs • The posterior probability of state θi given the inputs up to time t can be computed recursively: P(θt = θi | It, …, I1) ∝ P(It | θt = θi) Σj P(θt = θi | θt-1 = θj) P(θt-1 = θj | It-1, …, I1) • The first factor is the likelihood of θi at time t; the sum over j is the prediction for θi at time t
Equivalence to Belief Propagation for HMMs • The recursive computation above is equivalent to on-line (“forward”) belief propagation through time on the HMM’s chain graph
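A minimal Python sketch of this forward recursion for discrete observations; the transition matrix T, emission matrix Em, and the observation sequence are made-up examples:

```python
import numpy as np

def forward_step(posterior_prev, obs, T, Em):
    """One step of on-line ("forward") belief propagation for an HMM."""
    prediction = T @ posterior_prev    # sum_j P(theta_i | theta_j) P(theta_j | past inputs)
    unnorm = Em[:, obs] * prediction   # multiply by the likelihood of theta_i at time t
    return unnorm / unnorm.sum()       # posterior P(theta_t = theta_i | inputs up to t)

# Example with M = 2 states and binary observations:
T  = np.array([[0.9, 0.2],    # T[i, j] = P(theta_t = theta_i | theta_t-1 = theta_j)
               [0.1, 0.8]])
Em = np.array([[0.8, 0.2],    # Em[i, s] = P(I_t = s | theta_i)
               [0.3, 0.7]])
posterior = np.array([0.5, 0.5])
for obs in [0, 0, 1, 1]:
    posterior = forward_step(posterior, obs, T, Em)
```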
Recurrent Network Model • A network with feedforward synaptic weights W for the input I and recurrent synaptic weights R • Leaky integrator equation for the output firing rate v: τ dv/dt = −v (decay) + W I (input) + R v (feedback)
Discrete Implementation • In discrete time, the new activity is the input plus a recurrent contribution from the prior activity: v(t) = W I(t) + R v(t−1)
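A minimal Python sketch of how such a discrete recurrent update could carry out the HMM forward computation from the earlier slides. The log-domain mapping used here (v as log posterior, the input term as the log likelihood, the recurrent term as the log of the prediction) is an assumed correspondence in the spirit of the model, not its exact equations:

```python
import numpy as np

def recurrent_step(v_prev, log_lik, T):
    """New activity = input (log likelihood) + recurrent term from prior activity."""
    feedback = np.log(T @ np.exp(v_prev))   # log of the prediction from v(t-1)
    v = log_lik + feedback                  # input + recurrent contribution
    return v - np.log(np.sum(np.exp(v)))    # normalization (divisive inhibition)

# exp(v) then tracks the same posterior as forward_step above:
T = np.array([[0.9, 0.2], [0.1, 0.8]])
v = np.log(np.array([0.5, 0.5]))
for log_lik in (np.log([0.8, 0.3]), np.log([0.2, 0.7])):
    v = recurrent_step(v, log_lik, T)
```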