
Information Theory in Neuroscience



  1. Information Theory in Neuroscience • Noise, probability and information theory • MSc Neuroscience • Prof. Jan Schnupp • jan.schnupp@dpag.ox.ac.uk

  2. Neural Responses are Noisy • Recordings from cat A1 in response to recordings of sheep and frog sounds. • Seventeen identical repetitions of the same stimulus do not produce seventeen identical spike patterns. • How much information does an individual response convey about the stimulus? [Figure: raster plots of A1 responses; data file cats\9920\zoo50.src]

  3. Joint and Marginal Probabilities • A plausible hypothetical example: a table of joint probabilities p(s,r) for stimulus and response, with the marginal probabilities p(s) and p(r) obtained by summing across the rows and columns of the table.

  4. Joint Probabilities and Independence • Let s be "stimulus present" and r be "neuron responds". • p(s,r) = p(r,s) is the probability that the stimulus is present and the neuron responds (the joint probability). • p(s|r) is the probability that the stimulus was present given that the neuron responded (a conditional probability). • Note: p(s|r) = p(s,r)/p(r). • If r and s are independent, then p(s,r) = p(s) • p(r). • Therefore, if r and s are independent, p(s|r) = p(s): knowing that the neuron responded does not change my view of how likely it is that there was a stimulus, i.e. the response carries no information about the stimulus.
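
A minimal numerical sketch of these relationships in Python (the joint probability table below is invented purely for illustration; it is not the table from the lecture):

```python
import numpy as np

# Invented joint distribution p(s, r): rows = stimulus (absent, present),
# columns = response (no spike, spike). Numbers are illustrative only.
p_sr = np.array([[0.60, 0.10],
                 [0.05, 0.25]])

p_s = p_sr.sum(axis=1)   # marginal p(s): sum over responses
p_r = p_sr.sum(axis=0)   # marginal p(r): sum over stimuli

# Conditional probability p(s | r) = p(s, r) / p(r); each column sums to 1
p_s_given_r = p_sr / p_r

# Independence check: does p(s, r) = p(s) * p(r) hold for every cell?
print("p(s|r) =\n", p_s_given_r)
print("independent:", np.allclose(p_sr, np.outer(p_s, p_r)))
```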

  5. What is Information? • If I tell you something you already know, I give you no (new) information. • If I tell you something that you could easily have guessed, I give you only a little information. • The less likely a message, the more "surprising" it is: surprise = 1/p. • The information content of a message is the logarithm of its "surprise": I = log2(1/p) = -log2(p). • Examples: • "A is the first letter of the alphabet": p = 1, I = -log2(1) = 0 bits. • "I flipped a coin and it came up heads": p = 0.5, I = -log2(0.5) = 1 bit. • "His phone number is 928 399": if each of 10^7 possible numbers is equally likely, p = 10^-7 and I = log2(10^7) ≈ 23.25 bits.
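
The "surprise" calculation is a one-line helper in Python; the slide's three example messages come out as shown in the comments:

```python
import math

def information(p):
    """Information content (in bits) of a message with probability p."""
    return math.log2(1 / p)

print(information(1.0))    # 0 bits: "A is the first letter of the alphabet"
print(information(0.5))    # 1 bit:  a fair coin came up heads
print(information(1e-7))   # ~23.25 bits: one of 10^7 equally likely phone numbers
```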

  6. "Entropy", S(s) or H(s) • Measures the "uncertainty" about a message s. • Equal to the average information content of messages from a particular source. • Note that, to calculate entropy, the statistical properties of the source must be known, i.e. one must know which values s can take and how likely each value is (p(s)). • Entropy of flipping a fair coin: S = -(½ • log2(½) + ½ • log2(½)) = -2 • ½ • (-1) = 1 bit. • Convention: 0 • log(0) = 0. • Entropy of flipping a trick coin with "heads" on both sides: S = -(1 • log2(1) + 0 • log2(0)) = -(0 + 0) = 0 bits. • Entropy of rolling a fair die: S = -6 • 1/6 • log2(1/6) = log2(6) ≈ 2.585 bits.
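
A small helper that implements this definition (using the 0 • log(0) = 0 convention), with the slide's three examples as checks:

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution, with 0*log(0) treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin:  1 bit
print(entropy([1.0, 0.0]))   # trick coin: 0 bits
print(entropy([1/6] * 6))    # fair die:   log2(6) ≈ 2.585 bits
```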

  7. If two random processes are statistically independent, their entropies add • In this example: S(coin1, coin2) = -4 • ¼ • log2(¼) = 2 bits = S(coin1) + S(coin2).

  8. If two processes are not independent, their joint entropy is less than the sum of the individual entropies • In this example, the two coins are linked so that their outcomes are 100% correlated. • S(s) = S(r) = 1, so S(s) + S(r) = 2, • but S(s,r) = -2 • ½ • log2(½) = 1 bit.
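
A quick numerical check of slides 7 and 8, treating the joint outcome of the two coins as a single four-valued variable:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two fair coins, independent: four equally likely joint outcomes (HH, HT, TH, TT)
print(entropy([0.25] * 4))            # 2.0 bits = S(coin1) + S(coin2)

# Two coins linked so their outcomes always match: only HH or TT ever occur
print(entropy([0.5, 0.0, 0.0, 0.5]))  # 1.0 bit < S(s) + S(r) = 2 bits
```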

  9. "Mutual Information" I(r,s) • Also sometimes called the "transmitted information" T(r;s). • Equal to the sum of the individual entropies minus the joint entropy: I(r,s) = S(r) + S(s) - S(r,s). • Equivalently, I(r,s) = Σr,s p(r,s) • log2( p(r,s) / (p(r) • p(s)) ). • Measures how much the uncertainty about one random variable is reduced if the value of the other random variable is known.

  10. Traffic Light Example: Swiss Drivers • Here: I(Red,Stop) = ½ • log2(½ / (½ • ½)) + 0 + ½ • log2(½ / (½ • ½)) + 0 = log2(2) = 1 bit.

  11. Traffic Light Example: Egyptian Drivers • Note: in this case p(Stop) = 0.25, so the entropy of the driver's behaviour (Stop vs. Go) is ≈ 0.8113 < 1. • Here: I(Red,Stop) = 0.2 • log2(0.2 / (0.25 • 0.5)) + 0.3 • log2(0.3 / (0.75 • 0.5)) + 0.05 • log2(0.05 / (0.25 • 0.5)) + 0.45 • log2(0.45 / (0.75 • 0.5)) ≈ 0.09 bits.
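
A sketch of the mutual-information formula applied to both traffic-light tables. The joint probabilities are the ones read off the slides; the helper function is my own, not from the lecture:

```python
import numpy as np

def mutual_information(p_joint):
    """I(r;s) = sum over r,s of p(r,s) * log2( p(r,s) / (p(r) p(s)) ), in bits."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_x = p_joint.sum(axis=1, keepdims=True)
    p_y = p_joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p_joint * np.log2(p_joint / (p_x * p_y))
    return np.nansum(terms)   # nansum applies the 0*log(0) = 0 convention

# Rows = light (red, green), columns = behaviour (stop, go)
swiss    = [[0.50, 0.00],
            [0.00, 0.50]]
egyptian = [[0.20, 0.30],
            [0.05, 0.45]]

print(mutual_information(swiss))      # 1.0 bit
print(mutual_information(egyptian))   # ~0.09 bits
```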

  12. Hypothetical Example • Non-monotonic (quadratic) relationship between stimulus and response. • No (linear, first-order) correlation between stimulus and response. • Nevertheless, the response is informative about the stimulus: e.g. a large response implies a mid-level stimulus. • Correlation is zero, but mutual information is large (see the sketch below).
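
A sketch of this point with invented numbers: an inverted-U (quadratic) stimulus-response relation has essentially zero linear correlation but large mutual information. The discrete_mi helper is my own plug-in estimator, not code from the lecture:

```python
import numpy as np
from collections import Counter

def discrete_mi(x, y):
    """Plug-in mutual information estimate (in bits) for two discrete-valued samples."""
    n = len(x)
    p_xy, p_x, p_y = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log2((c / n) / ((p_x[xi] / n) * (p_y[yi] / n)))
               for (xi, yi), c in p_xy.items())

rng = np.random.default_rng(0)
s = rng.integers(-5, 6, size=100_000)   # stimulus levels -5 .. +5, equally likely
r = 25 - s**2                           # inverted-U response: largest for mid-level stimuli

print(np.corrcoef(s, r)[0, 1])          # ~0: no linear correlation
print(discrete_mi(s, r))                # ~2.55 bits: the response is highly informative
```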

  13. Estimating Information in Spike Counts • Example: data from Mrsic-Flogel et al., Nature Neurosci (2003). • Spatial receptive fields of A1 neurons were mapped out using "virtual acoustic space" stimuli. Left panel: the diameter of the dots is proportional to the spike count. • Space was carved up into 24 "sectors" (right panel), each presented equally often, so p(s) = 1/24 and S(s) = log2(24) ≈ 4.585 bits. • The question is: what is the mutual information between spike count and sector of space?

  14. Estimating Information in Spike Counts - continued • We use the relative frequencies (how often did we observe 0, 1, 2, … spikes when the stimulus was in sector 1, 2, 3, …) as estimates of the joint probabilities p(s,r). • p(s) is fixed by the experimenter, and p(r) is estimated from the responses pooled over all sectors. • These values are then plugged into the mutual-information formula, giving I(s,r) = 0.7019 bits for this neuron. [Figure: estimated joint distribution p(sector, count).]
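
A sketch of this plug-in procedure. The function name and the small count table are mine and purely illustrative; they stand in for the real sector-by-count table from the experiment:

```python
import numpy as np

def spike_count_information(count_table):
    """
    count_table[i, c]: number of trials on which stimulus sector i evoked c spikes.
    Returns the plug-in estimate of I(sector; spike count) in bits.
    """
    p_sr = count_table / count_table.sum()    # joint probability estimate p(s, r)
    p_s = p_sr.sum(axis=1, keepdims=True)     # stimulus probabilities (set by experimenter)
    p_r = p_sr.sum(axis=0, keepdims=True)     # response probabilities, pooled over sectors
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p_sr * np.log2(p_sr / (p_s * p_r))
    return np.nansum(terms)

# Toy example: 3 sectors, spike counts 0-3, 20 trials per sector (invented numbers)
counts = np.array([[12, 6, 2, 0],
                   [ 4, 8, 6, 2],
                   [ 1, 3, 8, 8]], dtype=float)
print(spike_count_information(counts))   # ≈ 0.36 bits for this made-up table
```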

  15. Difficulties with Estimating Mutual Information: Bias! • To calculate transmitted information, we use observed frequencies as estimates of the true underlying probabilities. • However, to estimate probabilities accurately (particularly those of rare events), one needs a lot of data. • Inaccuracies in the estimates of p(s,r) tend to lead to overestimates of the information content. • Example: here, responses were randomly re-assigned to stimulus classes. The randomisation should have produced statistical independence and hence zero information; nevertheless, a value of I(s,r) = 0.1281 bits was obtained. [Figure: shuffled joint distribution p(sector, count).]
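
A sketch of the bias: simulate a neuron whose spike count is genuinely unrelated to the stimulus sector, estimate the mutual information from the observed frequencies, and note that the estimate comes out well above zero (all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# 24 sectors, 20 trials each, Poisson spike counts with the same mean everywhere,
# so the true mutual information between sector and count is exactly 0.
sectors = np.repeat(np.arange(24), 20)
spikes = rng.poisson(2.0, size=sectors.size)

# Plug-in estimate of I(sector; count) from the observed frequencies
table = np.zeros((24, spikes.max() + 1))
for sec, c in zip(sectors, spikes):
    table[sec, c] += 1
p_sr = table / table.sum()
p_s = p_sr.sum(axis=1, keepdims=True)
p_r = p_sr.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    mi = np.nansum(p_sr * np.log2(p_sr / (p_s * p_r)))

print(mi)   # well above zero (typically a few tenths of a bit here): this is the bias
```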

  16. Estimating Information in Spike Patterns: the Eskandar, Richmond & Optican (1992) Experiment • Monkeys were trained to perform a delayed non-match-to-target task with a set of Walsh patterns. • Neural responses in area TE of inferotemporal (IT) cortex were recorded while the monkeys performed the task.

  17. IT Responses • Example of responses recorded by Eskandar et al. • Different Walsh patterns produced different response patterns as well as different spike counts.

  18. Principal Component Analysis of Response Patterns • PCA makes it possible to summarise complex response shapes with relatively few numbers (the "coefficients" of the first few principal components). [Figure: IT neuron response patterns, principal components, and PCA coefficients.]
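
A minimal sketch of this idea using a plain SVD-based PCA; the random toy "responses" stand in for real PSTH-like response patterns:

```python
import numpy as np

def pca_coefficients(responses, n_components=3):
    """
    responses: (n_trials, n_time_bins) array of response patterns.
    Returns each trial's coefficients on the first n_components principal components,
    computed from an SVD of the mean-subtracted data.
    """
    centred = responses - responses.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]       # principal components (time courses)
    return centred @ components.T        # coefficients: (n_trials, n_components)

# Toy data: 100 trials, 50 time bins of random "spike density" (illustrative only)
rng = np.random.default_rng(2)
responses = rng.poisson(1.0, size=(100, 50)).astype(float)
coeffs = pca_coefficients(responses)
print(coeffs.shape)   # (100, 3): each response pattern summarised by just 3 numbers
```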

  19. Eskandar et al. Results • Spike count plus the first three PCA coefficients (T3, gray bars) transmit about 30% more information about stimulus identity ("pattern") than spike count alone (TS, white bars). • Most of the IT response is attributable to stimulus identity (which Walsh pattern?); only a little is attributable to task "context" (sample, match or non-match stimulus).

  20. Rat "Barrel" Cortex • Rat S1 has a large "barrel field" in which the vibrissae (whiskers) are represented.

  21. Spike Latency Coding in Rat Somatosensory Cortex • Panzeri et al. (Neuron 2001, 29: 769-777) recorded from the D2 barrel and stimulated the D2 whisker as well as the surrounding whiskers (response PSTHs shown on the right). • While spike counts were not very informative about which whisker had been stimulated, response latency carried large amounts of information.
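
A toy illustration of this count-versus-latency contrast. The latency and count distributions are invented for illustration and are not the Panzeri et al. data; the plug-in estimator from the slide-12 sketch is redefined here so the snippet runs on its own:

```python
import numpy as np
from collections import Counter

def discrete_mi(x, y):
    """Plug-in mutual information estimate (in bits) for two discrete-valued samples."""
    n = len(x)
    p_xy, p_x, p_y = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log2((c / n) / ((p_x[xi] / n) * (p_y[yi] / n)))
               for (xi, yi), c in p_xy.items())

rng = np.random.default_rng(3)
n_trials = 5000

# Two whiskers evoke similar spike COUNTS but different first-spike LATENCIES.
whisker = rng.integers(0, 2, size=n_trials)            # 0 = principal, 1 = surround
counts = rng.poisson(3.0, size=n_trials)               # same count statistics for both
latency = np.where(whisker == 0,
                   rng.normal(8.0, 1.0, n_trials),     # principal whisker: early spikes
                   rng.normal(15.0, 2.0, n_trials))    # surround whisker: later spikes
latency_bin = np.round(latency).astype(int)            # discretise into 1 ms bins

print(discrete_mi(whisker, counts))        # ~0 bits (apart from estimation bias)
print(discrete_mi(whisker, latency_bin))   # close to 1 bit: latency identifies the whisker
```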

  22. Applications of Information Theory in Neuroscience: Some Further Examples • Tovee et al. (J Neurophysiol 1993) found that the first 50 ms or so of the response of "face cells" in monkey inferotemporal cortex contain most of the information carried by the entire response pattern. • Machens et al. (J Neurosci 2001) found that grasshopper auditory neurons transmit information about sound stimuli with highest efficiency when the properties of those stimuli match the time scales and amplitude distributions of natural songs. • Mrsic-Flogel et al. (Nature Neurosci 2003) found that responses of A1 neurons in adult ferrets carry more information about the spatial location of a sound stimulus than do responses of infant neurons. • Li et al. (Nature Neurosci 2004) found that the mutual information between visual stimuli and V1 responses can depend on the task the animal is performing (attention?).

  23. Information Theory in Neuroscience: a Summary • Transmitted information measures how much the uncertainty about one random variable can be reduced by observing another. • Two random variables are "mutually informative" if and only if they are not statistically independent (p(x,y) ≠ p(x) • p(y)). • However, information measures are agnostic about how the information should best be decoded, or indeed about how much (if any) of the information contained in a spike train can be decoded and used by the brain. • Information theory treats neurons merely as "transmission channels" and assumes that the receiver (i.e. "higher" brain structures) knows the possible states of the source and their probabilities. • Real neurons have to be encoders and decoders as much as they are transmission channels. • The information content of a spike train is hard to measure accurately, but at least rough (and potentially useful) estimates can sometimes be obtained.

  24. Further Reading • Trappenberg, T. P. (2002). Fundamentals of Computational Neuroscience. Oxford University Press, Oxford. • Rolls, E. T., and Treves, A. (1998). Neural Networks and Brain Function. Oxford University Press, Oxford (Appendix 2). • Rieke, F. (1997). Spikes: Exploring the Neural Code. MIT Press, Cambridge, Mass.; London. • Eskandar, E. N., Richmond, B. J., and Optican, L. M. (1992). Role of inferior temporal neurons in visual memory. I. Temporal encoding of information about visual images, recalled images, and behavioral context. J Neurophysiol 68: 1277-1295. • Furukawa, S., and Middlebrooks, J. C. (2002). Cortical representation of auditory space: information-bearing features of spike patterns. J Neurophysiol 87: 1749-1762. • Panzeri, S., Petersen, R. S., Schultz, S. R., Lebedev, M., and Diamond, M. E. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron 29: 769-777.
