470 likes | 714 Views
Complex Biological Systems Department of Mathematical Sciences Center for Computational Biology Montana State University. Neural Coding Through The Ages. February 1, 2002 Albert E. Parker. Outline. Introduction The Problem Approaches in Neural Encoding
E N D
Complex Biological Systems Department of Mathematical Sciences Center for Computational Biology Montana State University Neural Coding Through The Ages February 1, 2002 Albert E. Parker
Outline • Introduction • The Problem • Approaches in Neural Encoding • Spike Count Coding (Adrian and Zotterman 1926) • Poisson Model (Fatt and Katz 1952) • Wiener/Volterra series (1930, 1958) • Approaches in Neural Decoding • Linear Methods • Linear Reconstruction (Reike et al 1997) • Vector Method (Georgopoulos et al 1983) • Optimal Linear Estimator (Abbot and Salinas 1994) • Gaussian Model (de Ruyter van Steveninck and Bialek 1988) • Metric Space (Victor and Purpura 1996)
We want to understand the neural code. We seek an answer to the question: How does neural activity represent information about environmental stimuli? “The little fly sitting in the fly’s brain trying to fly the fly”
encoding decoding Looking for the dictionary to the neural code … stimulus X() response Y(t)
… but the dictionary is not deterministic! Given a stimulus, an experimenter observes many different neural responses (spike trains): X() Yi(t)| X() i = 1, 2, 3, 4
… but the dictionary is not deterministic! Given a stimulus, an experimenter observes many different neural responses (spike trains): X() Yi(t)| X() i = 1, 2, 3, 4 Neural coding is stochastic!!
Similarly, neural decoding is stochastic: Y(t) Xi()|Y(t) i = 1, 2, … , 9
encoder:P(Y(t) |X()) environmental stimuli Y(t) X() neural responses decoder: P(X()|Y(t)) Probability Framework
Information Theory tells us that if the relationship between X and Y can be modeled as an optimalcommunication channel, then … P(X(),Y(t)) Areas of high probability A coding scheme needs to be stochastic on a fine scale and almost deterministic on a large scale. neural responses Y X environmental stimuli
How to determine a coding scheme? There are 2 methodologies in this search: encoding (determining P(Y|X)) decoding (determining P(X|Y)) As we search for a coding scheme, we proceed in the spirit of John Tukey: It is better to be approximately right than exactly wrong.
Spike Count Coding (Adrian and Zotterman 1926) Encoding as a non-linear process: the response tuning curve Response Amplitude Stimulus Amplitude • Hanging weights from a muscle ( Adrian 1926) • Stimulus Amplitude: mass in grams • Moving a pattern across the visual field of a blowfly (de Ruyter van Steveninck and Bialek, 1988) • Stimulus Amplitude: average velocity (omms/s) in 200ms window • In Spike Count Coding, response amplitude is the (spikes count)/time
Spike Count Coding P(Y |X) spikes/time Stimulus Amplitude An experimenter repeats each stimulus many times to estimate the encoder P(Y | X).
Stimulus Amplitude Spike Count Coding P(Y |X) spikes/time P(Y) P(X) And now you can get P(X | Y) from Bayes rule
Spike Count Coding the directional tuning curve (Miller, Jacobs and Theunissen 1991) The response tuning curves for the 4 interneurons in the cricket cercal sensory system. The preferred directions are orthogonal to each other.
Cons • Counting spikes/time neglects the temporal pattern of the spikes of the neural response, which • Potentially decreases the information conveyed • Known short behavioral decision times imply that many neurons make use of just a few spikes • Sensory systems respond to stimulus attributes that are very complex. That is, the space of possible stimuli for some systems is a very large (infinite dimensional) space. Hence, it is not feasible to present all possible stimuli in experiment. Pros • Some neural systems do seem to encode certain stimulus attributes by (number spikes)/time. • Can work well if the stimulus space is small (e.g. when coding direction in cricket cercal sensory system). Abbot, 2001
Poisson Model(Fatt and Katz 1952) P-type afferent responses in the electric fish to transdermal potential stimulus Electrical multi-site stimulation of chicken retina in vitro: (Xu, Payne and Nelson 1996) These (normalized) histograms give the probability per unit time of firing given that x(t) occurred: r[t |X()=x()] (Stett et al., 2000)
Poisson Model If we assume that the spikes are independent from each other given a stimulus X(), then we can model P(Y | X) as an inhomogenuous (dependent on time) Poisson process: If Y is the (spike count)/time, then P(Y | X()) = Poisson( r[t |X()] dt) If Y(t) is a spike train, then P(Y(t) |X()) = Poisson-like(r[t |X()])
Poisson Model When is the Poisson model a good one to try? Examine the mean and variance of the spike counts/time given each stimulus. For a Poisson process, they should be equal: Abbot, 2001
Pros • Have an explicit form of P(Y|X) • A Poisson process is a basic, well studied process Cons • Counting spikes/time neglects the temporal pattern of the spikes of the neural response. • Assuming that spikes are independent neglects the refractory period of a neuron. This period must be ‘small’ compared to mean ISI in order for Poisson model to be appropriate. To deal with this: • Berry and Meister (1998) have proposed a Poisson model that includes the refractory period. • Emory Brown (2001) uses a mixture of Gamma and inverse Gaussian models. • The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiments to estimate r(t | X()). (Stett et al., 2000)
Wiener / Volterra Series (1930, 1958) • The Taylor series for a function y = g(x): y(x) = y(x0) + y’(x0 ) (x - x0) + ½ y’’ (x0)(x - x0)2 + … = f0 + f1(x - x0) + f2(x - x0)2 + … • The Volterra series is the analog of a Taylor series for a functional Y(t) = G[X(t)]: Y(t) = f0 + d1 f1(1)X(t - 1) + d1d2 f2(1, 2)X(t - 1) X(t - 2) + … How to compute { fi } ?? • Wiener reformulated the Volterra series in a way so that the new coefficient functions or kernels could be measured from experiment. • There are theorems that assure that this series with sufficiently many terms provides a complete description for a broad class of systems. • The first Wiener kernel is proportional to the cross correlation of stimulus and response: f1 = <XY>/SX This is proportional to the spike triggered average when Y(t) is a spike train.
Wiener / Volterra Series Constructing the neural response (here, Y is the firing rate) from the first Wiener kernel … Reike et al. 1997: recordings from H1 of the fly Actual response Predicted response seems to be able to capture slow modulations in the firing fate.
Slice of a fly brain Pros • Computing the first Wiener kernel is inexpensive. • Not much data is required. Cons • Although it is theoretically possible to compute many terms in the Wiener series, practical low order approximations, of just f0and f1for example, don’t work well in practice (i.e. coding is NOT linear). • Wiener series is for a continuous function Y(t). This is fine when Y(t) = r[t |X()], the firing rate. But how do we construct a series to model the discrete spiking of neurons? • The Wiener series gives a specific Y(t) | X(). What is P(Y | X)? In principle, one can do a lot of repeated experiments to estimate P(Y | X). In practice, the preparation dies on you before enough data is collected.
Neural DecodingWhy Decoding? • Encoding looks non-linear in many systems. Maybe decoding is linear and hence easier. • It is conceivably easier to estimate P(X|Y) over an ensemble of responses {Y}, since {Y} live in a much smaller space than the {X}.
Linear Reconstruction Method(Reike et al 1997) • Consider a linear Volterra approximation (X Y): X(t)= K1()Y(t - )d =i K1(t – ti ) if we represent the discrete spike train as Y(t) = i (t – ti ), where the ith spike occurs at ti. • How to determine K1 ? Minimize the mean squared error: minK()|Xobserved(t) - i K(t – ti )|2dtX Fourier transform of average stimulus surrounding a spike Power spectrum of the spike train
Linear Reconstruction Method Reconstructing the stimulus with the linear filterK1: Reike et al. 1997: recordings from H1 of the fly Actual stimulus Predicted stimulus
Pros • It’s cheap. • The temporal pattern of the spikes is considered. • Even though encoding is non-linear (and hence the failure of the Wiener linear approximation), decoding for some neurons seems linear. Cons • Only one neuron is modeled. • No explicit form of P(X|Y)
Other Linear MethodsFor populations of neurons Assumptions: 1. Yi = number of spikes from neuron i in a time window. 2. X(t) is randomly chosen and continuously varying. • Vector Method (Georgopoulos et al 1983) X(t) = i Yi Ci where Ciis the preferred stimulus for neuron i. • Optimal Linear Estimator (OLE) (Abbot and Salinas 1994) X(t) = i Yi Di where Diis chosenso that dt |Xobserved(t) - i Yi Di |2 Y X is minimized so that Di = j Qij-1 Lj where Lj is center of mass of the tuning curve for cell i Qijis the correlation of Yi and Yj
Other Linear Methods (Abbot and Salinas 1994) cricket cercal sensory system Difference between the stimulus reconstructed by the Vector and OLE methods and the true stimulus presented. Evidence suggests that the cricket can code direction with an accuracy of up to 5 degrees. This data suggests that these algorithms decode direction as well as the cricket does in this experiment.
Pros • Vector Method • It’s cheap • This method is ideal when the tuning curve is a (half) cosine. • Has small error if Ci are orthogonal • OLE • Has smallest average MSE of all linear methods over a population of neurons Cons • Vector Method • It is not always obvious what the preferred stimulus Ci is for generic stimuli. • Does not work well if the Ci are not uniformly distributed (orthogonal) • Requires a lot of neurons in practice • Counting spikes/time neglects the temporal pattern of the spikes of the neural response Y(t). • No explicit form of P(X|Y)!
Gaussian Model (de Ruyter van Steveninck and Bialek 1988) • In experiment, let X(t) be a randomly chosen, continuously varying (GWN) • Approximate P(X|Y) with a Gaussian with meanX|Y and covarianceX|Y • computed from data. meanX|Y Reike et al. 1997 recordings from H1 of the fly Y(t)
Pros • The temporal pattern of the spikes is considered. • We have an explicit form for P(X|Y). Why should P(X|Y) be • Gaussian? This choice is justified by Jaynes maximum entropy • principle: of all models that satisfy a given set of constraints, • choose the one that maximizes the entropy. For a fixed mean and • covariance, the Gaussian is the maximum entropy model. • Cons • An inordinate amount of data is required • to obtain good estimates of covarianceX|Y=y • over all observed y(t). • One way to deal with the problem of • not having enough data is to cluster the • responses together and estimate a • gaussian model for each response • cluster.
Metric Space Approach (Victor and Purpura 1996) We desire a rigorous decoding method that … • Estimates P(X|Y). • Takes the temporal structure of the spikes of the neural responses Y(t) into account. • Deals with the insufficient data problem by clustering the responses. Assumptions: • The stimuli, X1, X2, … , XC , must be repeated multiple times. • There are a total of T neural responses: Y1 , …, YT . Abbot, 2001
Metric Space Approach The Method: 1. Given two spike trains, Yi and Yj , the distance between them is defined by the metricD[q](Yi , Yj), the minimum cost required to transform Yi into Yj via a path of elementary steps: a . adding or deleting a spike (cost = 1) b. shifting a spike in time by t(cost = q ·|t|) • 1/q is a measure of the temporal precision of the metric. • D[q=0](Yi , Yj) is just the difference in the number of spikes between the spike trains Yi and Yj. Decoding based on this metric is just counting spikes. • D[q=](Yi , Yj) gives infinitesimally precise timing of the spikes. Yi Yj
Metric Space Approach The Method: 1. Given two spike trains, Yi and Yj , the distance between them is defined by the metricD[q](Yi , Yj), the minimum cost required to transform Yi into Yj via a path of elementary steps: a . adding or deleting a spike (cost = 1) b. shifting a spike in time by t(cost = q ·|t|) • 1/q is a measure of the temporal precision of the metric. • D[q=0](Yi , Yj) is just the difference in the number of spikes between the spike trains Yi and Yj. Decoding based on this metric is just counting spikes. • D[q=](Yi , Yj) gives infinitesimally precise timing of the spikes. Yi Yj
Metric Space Approach • Let r1, r2, … rC be response classes.
X r Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. r1 r2 r3 r4 r5 N: the classification matrix X1 X2 X3 X4 X5
Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. • Suppose that Y1 was elicited by X1.Assign Y1to response class r3if • D[q](Y1 , Y)z Yelicited by X_ 3 1/z is the minimum over all Xk for k = 1, … , C. r1 r2 r3 r4 r5 N: the classification matrix X1 X2 X3 X4 X5
Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. • Suppose that Y1 was elicited by X1.Assign Y1to response class r3if • D[q](Y1 , Y)z Yelicited by X_ 3 1/z is the minimum over all Xk for k = 1, … , C. • 4. Increment N1, 3 by 1. r1 r2 r3 r4 r5 N: the classification matrix X1 X2 X3 X4 X5
Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. • Suppose that Yi was elicited by X.Assign Yito response class rif • D[q](Yi , Y)z Yelicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. • 4. Increment N, by 1. • 5. Repeat steps 3 and 4 for Yi for i = 1, … , T. r1 r2 r3 r4 r5 N: the classification matrix X1 X2 X3 X4 X5
Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. • Suppose that Yi was elicited by X.Assign Yito response class rif • D[q](Yi , Y)z Yelicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. • 4. Increment N, by 1. • 5. Repeat steps 3 and 4 for Yi for i = 1, … , T. • After repeating the process for all Y elicited by X1 … X1 was presented 20 times eliciting 20 neural responses r1 r2 r3 r4 r5 N: the classification matrix X1 X2 X3 X4 X5
Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. • Suppose that Yi was elicited by X.Assign Yito response class rif • D[q](Yi , Y)z Yelicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. • 4. Increment N, by 1. • 5. Repeat steps 3 and 4 for Yi for i = 1, … , T. • Then for all Y elicited by X2 … r1 r2 r3 r4 r5 X2 was presented 20 times eliciting 20 neural responses N: the classification matrix X1 X2 X3 X4 X5
X r Metric Space Approach • Let r1, r2, … rC be response classes. Let N be the classification matrix. • Suppose that Yi was elicited by X.Assign Yito response class rif • D[q](Yi , Y)z Yelicited by X_ 1/z is the minimum over all Xk for k = 1, … , C. • 4. Increment N, by 1. • 5. Repeat steps 3 and 4 for Yi for i = 1, … , T. • Until we have repeated the process for all Y, including the ones elicited by XC. r1 r2 r3 r4 r5 In this example, the T=100 responses (there were 20 neural responses elicited by each stimulus) were quantized or clustered into 5 classes. N: the classification matrix X1 X2 X3 X4 X5
Metric Space Approach Note that by normalizing the columns of the matrix N, we get the decoder P(X|r). Decode a neural response Y(t) by looking up its response class r in the normalized matrix N: r1 r2 r3 r4 r5 P(X | r) X1 X2 X3 X4 X5
Pros • The responses are clustered together. • P(X|r) estimates P(X|Y). • Considers the temporal pattern of the spikes. • Minimizing the cost function D[q] is intuitively a nice way to quantify jitter in the spike trains. In information theory, this type of cost function is called a distortion function. • What to choose for q and z ? The values thatmaximize the transmitted information from stimulus to response. Cons • D[q] imposes our assumptions of what is important in the structure of spike trains (namely that shifts and spike insertions/deletions are important). • The space of possible stimuli for some systems is a very large space, so it is not possible to present all possible stimuli in experiment.
Tune in next week, on Friday at the CBS seminar, to see how our method deals with these issues. So we’re looking for a decoding algorithm that … • Produces an estimate of P(X|Y) as well as of X()|Y(t). • Considers the temporal structure of the spike trains Y(t). • Makes no assumptions about the linearity of decoding. • Does not require that all stimuli be presented. That is, X(t) ought to be randomly chosen and continuously varying (such as a GWN stimulus). • Considers a population of neurons. • Deals with the problem of never having enough data by clustering the neural responses.