Introduction: statistical and machine learning based approaches to neurobiology. Shin Ishii, Nara Institute of Science and Technology.
Mathematical Fundamentals: Maximum Likelihood and Bayesian Inferences
Coin tossing
• Tossing a skewed coin: how often does the head appear for this coin?
• Observed data: Head, Tail, Tail, Tail, Tail. One head comes up in five tosses (note: each trial is independent).
• Parameter θ: probability that the head appears in an individual trial.
• Likelihood: the probability of this particular sequence, one head in five tosses, is P(data|θ) = θ(1 − θ)^4.
Likelihood function
• Likelihood: an evaluation of the observed data, viewed as a function of the parameter θ: L(θ) = P(data|θ) = θ(1 − θ)^4.
• Compare the likelihood of one parameter value with that of another: which parameter is better for explaining the observed data? What is the most likely parameter, and how do we determine it?
• It seems natural to set θ according to the frequency of coming up heads, i.e., θ = 1/5. Really?
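As a rough numerical illustration of evaluating the likelihood function (the candidate parameter values and the grid search below are my own choices, not from the lecture), the following Python sketch computes L(θ) for the observed sequence of one head and four tails:

```python
# A minimal sketch: the likelihood of the observed coin-toss data
# (1 head, 4 tails) as a function of the head-probability parameter theta.
import numpy as np

data = [1, 0, 0, 0, 0]          # 1 = head, 0 = tail (Head, Tail, Tail, Tail, Tail)

def likelihood(theta, data):
    """P(data | theta) for independent Bernoulli tosses."""
    heads = sum(data)
    tails = len(data) - heads
    return theta ** heads * (1.0 - theta) ** tails

# Compare a few illustrative candidate parameters, then find the maximizer on a fine grid.
for theta in (0.2, 0.5):
    print(f"L({theta}) = {likelihood(theta, data):.4f}")

grid = np.linspace(0.0, 1.0, 1001)
print("argmax over grid:", grid[np.argmax([likelihood(t, data) for t in grid])])
```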
Kullback-Leibler (KL) divergence
• A measure of the difference between two probability distributions P and Q:
  KL(P||Q) = Σ_x P(x) log( P(x) / Q(x) )
• With it, we can measure the difference as an objective, numerical value.
• Note: the KL divergence is not a metric (in particular, it is not symmetric).
Minimize KL divergence
• Random events are drawn from the real distribution P(x); we observe a data set D = {x_1, ..., x_N}.
• Using the observed data, we want to estimate the true distribution P(x) with a trial distribution Q(x|θ).
• The smaller the KL divergence KL(P||Q), the better the estimate.
Minimize KL divergence
• KL divergence between the two distributions:
  KL(P||Q) = Σ_x P(x) log P(x) − Σ_x P(x) log Q(x|θ)
• The first term is a constant, independent of the parameter θ.
• To minimize the KL divergence, we therefore only have to maximize the second term, Σ_x P(x) log Q(x|θ), with respect to the parameter θ.
Likelihood and KL divergence
• The second term is approximated by the sample mean over the data set D = {x_1, ..., x_N}:
  Σ_x P(x) log Q(x|θ) ≈ (1/N) Σ_n log Q(x_n|θ),
  which is the (average) log likelihood.
• Therefore the two are the same: minimizing the KL divergence = maximizing the likelihood.
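A small numerical check of this equivalence (a sketch of my own; the Bernoulli "true" distribution with head probability 0.3 is assumed purely for the demonstration):

```python
# For data drawn from a true Bernoulli distribution, the theta that maximizes
# the average log likelihood agrees (up to sampling noise) with the theta
# that minimizes KL(true || model).
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3                                  # assumed true head probability
samples = rng.random(10_000) < p_true         # draws from the true distribution

thetas = np.linspace(0.01, 0.99, 99)          # candidate model parameters

def avg_log_lik(theta):
    """Average log likelihood of the samples under a Bernoulli(theta) model."""
    return np.mean(np.where(samples, np.log(theta), np.log(1.0 - theta)))

def kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

best_by_lik = thetas[np.argmax([avg_log_lik(t) for t in thetas])]
best_by_kl = thetas[np.argmin([kl(p_true, t) for t in thetas])]
print("argmax of average log likelihood:", best_by_lik)   # close to 0.3
print("argmin of KL(true || model)     :", best_by_kl)    # 0.3
```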
Maximum Likelihood (ML) estimation
• Maximum likelihood (ML) estimate: the parameter that maximizes the likelihood, θ_ML = argmax_θ L(θ).
• What is the most likely parameter in the coin tossing (Head, Tail, Tail, Tail, Tail)?
• Maximization condition: d/dθ [θ(1 − θ)^4] = 0, which gives θ_ML = 1/5.
• Same as intuition.
Property of the ML estimate (R. Fisher, 1890-1962)
• As the number of observations N increases, the squared error of the estimate decreases, on the order of 1/N.
• The ML estimate is asymptotically unbiased: if an infinite number of observations could be obtained, the ML estimate would become the real parameter.
• But that is infeasible. What happens when only a limited number of observations have been obtained from the real environment?
Problem with ML estimation
• Is it really a skewed coin? It may just happen that four consecutive tails came up; it may be detrimental to assume the parameter is θ = 1/5.
• Five more tosses... see, the record of all ten tosses now shows five heads and five tails: the four consecutive tails occurred by chance. The ML estimate overfits the first observations.
• Consider an extreme case: if the data consist of one head in a single toss, the ML estimate gives θ = 100%. Not reasonable.
• How can we avoid this overfitting?
Bayesian approach
• Bayes theorem: Posterior ∝ Likelihood × Prior
  (a posteriori information = information obtained from data + a priori information).
• Prior distribution: we have no information about the possibly skewed coin, so we assume the parameter is distributed around θ = 0.5, with some spread.
Bayesian approach
• Bayes theorem: Posterior ∝ Likelihood × Prior
  (a posteriori information = information obtained from data + a priori information).
• Likelihood function (observed data): one head and four tails. Hmm... it may be a skewed coin, but we had better consider other possibilities.
Bayesian approach
• Bayes theorem: Posterior ∝ Likelihood × Prior
  (a posteriori information = information obtained from data + a priori information).
• Posterior distribution: the parameter is distributed mainly over an intermediate range of values; variance (uncertainty) remains.
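A minimal sketch of this update for the coin example. The slides do not specify the form of the prior; a conjugate Beta(a, b) prior centered at 0.5 is assumed here purely for illustration, which makes the posterior another Beta distribution with simple closed-form moments:

```python
# Bayesian update for the skewed-coin example under an assumed Beta prior.
import math

a, b = 5.0, 5.0                  # assumed prior: Beta(5, 5), peaked around 0.5
heads, tails = 1, 4              # observed data: one head, four tails

a_post, b_post = a + heads, b + tails        # conjugate update: Beta(a + H, b + T)
post_mean = a_post / (a_post + b_post)
post_var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))

print("prior mean     :", a / (a + b))                 # 0.5
print("ML estimate    :", heads / (heads + tails))     # 0.2
print("posterior mean :", post_mean)                   # 0.375, between the two
print("posterior std  :", math.sqrt(post_var))         # remaining uncertainty
```

The posterior mean lands between the prior mean and the ML estimate, and the posterior standard deviation makes the remaining uncertainty explicit.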
Property of Bayesian inference
• Bayesian view: probability represents the uncertainty of random events (a subjective value).
• Frequentist (R. Fisher, 1890-1962): "That can't be! The prior distribution introduces a subjective distortion into the estimation. The estimation process must be objective with respect to the obtained data."
• Bayesian (T. Bayes, 1702-1761): "No problem. The uncertainty of random events (subjective probability) depends on the amount of information obtained from data and on prior knowledge of the events."
Application of Bayesian approaches
• Data obtained from the real world are often sparse, high-dimensional, and contain unobservable variables.
• Bayesian methods are well suited to such data.
• Example applications: bioinformatics, user support systems (Bayesian networks).
A neural decoding problem: how does the brain work?
• Sensory information is represented in sequences of spikes.
• When the same stimulus is repeatedly presented, spike occurrence varies between trials.
• An indirect approach is to reconstruct the stimuli from the observed spike trains.
Bayesian application to a neural code
• Observation: the spike train; prior knowledge: the stimulus (its distribution).
• Possible algorithms for stimulus reconstruction (estimation): from the spike train only (maximum likelihood estimation), or from the spike train and the stimulus distribution (Bayes estimation).
• Note: we focus on whether spike trains include stimulus information, NOT on whether the algorithm is the one actually used in the brain.
‘Observation’ depends on ‘Prior’ (Bialek et al., Science, 1991)
• Schematic: a time-varying stimulus enters the neural system (a black box), which emits a spike train; an estimation algorithm then reconstructs the estimated stimulus over time.
‘Observation’ depends on ‘Prior’
• The same schematic in distributional form: a stimulus distribution over s enters the neural system (black box), producing observations x; the estimation algorithm yields an estimated stimulus distribution over s.
Simple example of signal estimation
• Incoming signal: s; noise: η; particular value of the observation: x.
• Observation = Signal + Noise: x = s + η.
• Goal: estimate the stimulus s from the observation x.
Simple example of signal estimation
• If the probability of observing a particular x given signal s depends only on the noise η, and the noise η is supposed to be chosen from a Gaussian (variance σ_η^2):
  P(x|s) ∝ exp( −(x − s)^2 / (2σ_η^2) )
• If the signal s is supposed to be chosen from a Gaussian (variance σ_s^2):
  P(s) ∝ exp( −s^2 / (2σ_s^2) )
• So the posterior is:
  P(s|x) ∝ P(x|s) P(s) ∝ exp( −(x − s)^2 / (2σ_η^2) − s^2 / (2σ_s^2) )
Simple example of signal estimation
• Bayes theorem: Posterior ∝ Likelihood (observation) × Prior knowledge.
• Maximum likelihood estimation: maximize P(x|s), which gives the estimate s_ML = x.
• Bayes estimation: maximize the posterior P(s|x), which gives s_Bayes = x / (1 + 1/SNR), where SNR = σ_s^2 / σ_η^2.
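A numerical illustration of the two estimators under the Gaussian assumptions above (the specific variances are my own choices for the demo): the ML estimate is the raw observation, while the Bayes (MAP) estimate shrinks it toward the prior mean by the factor 1 / (1 + 1/SNR), which lowers the mean squared error.

```python
# ML vs. Bayes (MAP) estimation of a zero-mean Gaussian signal in Gaussian noise.
import numpy as np

sigma_s = 2.0          # assumed signal standard deviation
sigma_n = 1.0          # assumed noise standard deviation
snr = sigma_s**2 / sigma_n**2

rng = np.random.default_rng(1)
s = rng.normal(0.0, sigma_s, 100_000)        # true signals
x = s + rng.normal(0.0, sigma_n, s.shape)    # observations = signal + noise

s_ml = x                                     # maximum likelihood estimate
s_map = x / (1.0 + 1.0 / snr)                # Bayes (MAP) estimate

print("SNR                 :", snr)
print("mean sq. error (ML) :", np.mean((s_ml - s) ** 2))   # about sigma_n^2 = 1.0
print("mean sq. error (MAP):", np.mean((s_map - s) ** 2))  # smaller, about 0.8
```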
Signal estimation in the fly (Bialek et al., Science, 1991)
• Calliphora erythrocephala, movement-sensitive neuron H1; stimulus: a Gaussian visual motion stimulus over time.
• Visually guided flight; behavioral time scale about 30 ms; H1 firing rate: 100-200 spikes/s.
• Behavioral decisions are therefore based on only a few spikes.
Signal estimation in the fly (Bialek et al., Science, 1991)
• Bayes theorem: Posterior ∝ Likelihood (observation) × Prior knowledge.
• The stimulus is encoded by the neuron into a spike train (the observation); the estimated stimulus is the one that maximizes the posterior.
• However, the likelihood cannot be measured directly.
Kernel reconstruction and least squares
• The estimated stimulus still cannot be calculated, because the posterior cannot be defined explicitly.
• Alternative calculation as the next step: reconstruct the stimulus by placing a kernel at each spike time, and choose the kernel in this function which minimizes the squared error between the reconstruction and the true stimulus.
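A rough sketch of the kernel-reconstruction idea on toy data (this is my own simplified version, not the fly data or the exact procedure of Bialek et al.): the stimulus is reconstructed as a sum of a kernel placed at each spike time, and the kernel is fitted by least squares.

```python
# Least-squares fit of a reconstruction kernel: s_hat(t) = sum_i K(t - t_i).
import numpy as np

rng = np.random.default_rng(2)
T, L = 2000, 40                       # number of time bins, kernel length (assumed)

stimulus = np.convolve(rng.normal(size=T), np.ones(20) / 20, mode="same")  # smooth toy stimulus
rate = np.clip(5.0 * stimulus + 1.0, 0.0, None)                            # toy encoder
spikes = (rng.random(T) < 0.2 * rate).astype(float)                        # binary spike train

# Design matrix: column j holds the spike train shifted by lag j, so that
# stimulus(t) ~ sum_j K(j) * spikes(t - j), a discrete version of the kernel sum.
X = np.column_stack([np.roll(spikes, j) for j in range(L)])
kernel, *_ = np.linalg.lstsq(X, stimulus, rcond=None)                      # least-squares fit
reconstruction = X @ kernel

err = np.mean((reconstruction - stimulus) ** 2) / np.var(stimulus)
print("normalized squared error of the reconstruction:", err)
```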
Signal estimation in the fly (Bialek et al., Science, 1991)
• Result (figure): the stimulus, the estimated stimulus obtained with the fitted kernel, and the kernel itself.
Case of a mammal: O'Keefe's place cells
• Rat hippocampal CA1 cells (Lever et al., Nature, 2002).
• Each place cell shows high activity when the rat is located at a specific position.
• Hippocampal CA1 cells are known to represent the animal's position in a familiar field.
Incremental Bayes estimation (Brown et al., J. Neurosci., 1998)
• Case of a mammal: each place cell shows high activity when the rat is located at a specific position (Lever et al., Nature, 2002).
• Question: can one estimate the rat's position in the field from the firing patterns of rat hippocampal place cells?
Sequential Bayes estimation from spike trains (Brown et al., J. Neurosci., 1998)
• Bayes theorem: Posterior ∝ Likelihood (observation) × Prior knowledge.
• Observation: the spike trains of the place cells; prior: the estimate of the rat's position carried over from the history of activities.
• The rat's position at each time can thus be estimated by integrating the recent place-cell activities with the position estimate from the history of activities.
Incremental Bayes estimation from spike trains (Brown et al., J. Neurosci., 1998)
• Schematic over time: observation = the spike trains of the place cells; prior = the rat's position estimated at the previous step.
Incremental Bayes estimation from spike trains (Brown et al., J. Neurosci., 1998)
• The observation probability is a function of the firing rate of the cells, which depends on the rat's position and on the theta rhythm.
• The firing rate of a place cell depends on:
  • a position component (receptive field)
  • a theta phase component
Inhomogeneous Poisson process for spike trains (Brown et al., J. Neurosci., 1998)
• The firing rate of a place cell depends on its preferred position (receptive field) and its preferred phase in the theta rhythm.
• Position component: an asymmetric Gaussian centered on the preferred position.
• Theta phase component: a cosine modulation around the preferred phase.
• Instantaneous firing rate: the product of the position and theta-phase components.
• The parameters were determined by maximum likelihood.
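A schematic place-cell rate model in the spirit of this slide, with the instantaneous rate as the product of a Gaussian position tuning curve and a cosine theta-phase modulation. All parameter values are made up for illustration; the original model uses an asymmetric Gaussian and fits every parameter by maximum likelihood.

```python
# Toy inhomogeneous Poisson rate model for a single place cell.
import numpy as np

# assumed (illustrative) parameters
mu = np.array([0.3, 0.7])      # preferred position (receptive-field center), in metres
sigma = 0.1                    # receptive-field width
rate_max = 20.0                # peak firing rate, spikes/s
theta_freq = 8.0               # theta rhythm frequency, Hz
beta, phi = 0.5, 0.0           # depth and preferred phase of the theta modulation

def firing_rate(position, t):
    """Instantaneous rate lambda(t) = position component * theta-phase component."""
    pos_component = rate_max * np.exp(-np.sum((position - mu) ** 2) / (2 * sigma**2))
    theta_component = np.exp(beta * np.cos(2 * np.pi * theta_freq * t - phi))
    return pos_component * theta_component

print(firing_rate(np.array([0.3, 0.7]), t=0.0))   # high: at the field center
print(firing_rate(np.array([0.8, 0.2]), t=0.0))   # low: far from the field center
```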
Position estimation from spike trains (Brown et al., J. Neurosci., 1998)
• Assumption: the path of the rat may be approximated as a zero-mean two-dimensional Gaussian random walk, whose parameters were also estimated by ML.
• Finally, the estimation procedure is as follows:
  • Encoding stage: estimate the place-field, theta-phase, and random-walk parameters.
  • Decoding stage: estimate the rat's position by the incremental Bayes method at each spike event, under the Gaussian random-walk assumption.
Bayes estimation from spike trains (Brown et al., J. Neurosci., 1998)
• Figure: the real rat's trajectory alongside the EKF-style estimate and its variance; spikes arrive at irregular times.
• Calculation of the posterior distribution is done at discontinuous time steps, namely whenever a spike occurs as a new observation.
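A highly simplified, one-dimensional sketch of this kind of recursive decoder (my own toy version): the position prior is propagated by a Gaussian random walk at each step, and the posterior is updated with a Poisson likelihood from the place-cell rates. Unlike the original, which uses a Gaussian (EKF-style) approximation and updates at spike events, this sketch uses a brute-force grid and updates every time bin with its spike count; all numbers are invented for illustration.

```python
# Grid-based incremental Bayes decoding of position from place-cell spikes.
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 200)        # candidate positions on a 1-D track
dt = 0.01                                # time step, s
sigma_walk = 0.02                        # random-walk std per step (assumed)

centers = np.linspace(0.1, 0.9, 9)       # place-field centers of 9 hypothetical cells
def rates(x):
    """Firing rates (spikes/s) of all cells when the animal is at position x."""
    return 15.0 * np.exp(-(x - centers) ** 2 / (2 * 0.05**2))

# Simulate a run from 0.2 to 0.8 and the resulting spike counts per time bin.
true_path = np.linspace(0.2, 0.8, 300)
spike_counts = rng.poisson(rates(true_path[:, None]) * dt)      # shape (300, 9)

# Random-walk transition kernel and expected spike counts on the position grid.
transition = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * sigma_walk**2))
transition /= transition.sum(axis=0, keepdims=True)
lam = rates(grid[:, None]) * dt                                  # shape (200, 9)

posterior = np.ones_like(grid) / grid.size                       # flat initial prior
for counts in spike_counts:
    prior = transition @ posterior                               # prediction step
    log_lik = np.sum(counts * np.log(lam + 1e-12) - lam, axis=1) # Poisson log likelihood
    posterior = prior * np.exp(log_lik - log_lik.max())          # update step
    posterior /= posterior.sum()

print("true final position     :", true_path[-1])
print("estimated final position:", grid[np.argmax(posterior)])
```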
Position estimation from spike trains (1) (Brown et al., J. Neurosci., 1998)
• Figure: the animal's true position compared with three estimates.
• Maximum correlation: correlation between the model activity and the observed firing pattern.
• Maximum likelihood: maximize the likelihood of the observed firing pattern.
• Bayes estimation: Posterior = Likelihood × Prior.
Position estimation from spike trains (2) (Brown et al., J. Neurosci., 1998)
• The same comparison: maximum correlation, maximum likelihood, and Bayes estimation.
• The ML and maximum-correlation methods ignore the history of neural activities, but the incremental Bayes method incorporates it as a prior.
Information transmission in neural systems
• Encoding: environmental stimulus → spike train (neural response); decoding: spike train → stimulus.
• How does a spike train code information about the corresponding stimuli?
• How efficient is the information transmission?
• Which kind of coding is optimal?
Information transmission: generalized view
• Shannon's communication system (Shannon, 1948): information source → message → transmitter (encoder) → signal → channel (with a noise source) → received signal → receiver (decoder) → message → destination.
• The transmitted signal and the received signal are the observable quantities.
Neural coding is a stochastic process
• Neuronal responses to a given stimulus are not deterministic but stochastic, and the stimulus underlying each observed spike train is likewise probabilistic.
Shannon's information
• The smallest unit of information is the "bit".
• 1 bit = the amount of information needed to choose between two equally likely outcomes (e.g., tossing a fair coin).
• Properties:
  • Information from independent events is additive over the constituent events.
  • If we already know the outcome, there is no information.
Shannon's information
• Property 1, independent events: P(x, y) = P(x)P(y) implies I(x, y) = I(x) + I(y).
• Property 2, certain events: P(x) = 1 implies I(x) = 0.
• Both properties are satisfied by defining the information as I(x) = −log2 P(x).
E.g., tossing a coin
• Tossing a fair coin: P(Head) = P(Tail) = 0.5.
• Observing either outcome gives I = −log2 0.5 = 1 bit.
E.g., tossing a coin
• Tossing a horribly skewed coin: one outcome is much more probable than the other.
• Observing the ordinary event carries low information, but observing the rare event is highly informative.
E.g., tossing 5 coins (Head, Tail, Tail, Tail, Tail)
• Case 1: five fair coins.
• Case 2: five skewed coins.
• Because the tosses are independent, the information of the observed sequence is the sum of the information of the individual outcomes (see the sketch below).
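A small sketch to back up the information arithmetic on these two cases. The 0.9/0.1 probabilities for the skewed coin are an arbitrary choice for illustration, not values from the lecture.

```python
# Shannon information I(x) = -log2 P(x): ordinary vs. rare events, additivity.
import math

def info_bits(p):
    """Shannon information (in bits) of observing an event with probability p."""
    return -math.log2(p)

print(info_bits(0.5))   # fair coin, either outcome: 1 bit
print(info_bits(0.9))   # ordinary event of a skewed coin: ~0.15 bit
print(info_bits(0.1))   # rare event of the same skewed coin: ~3.32 bits
print(info_bits(1.0))   # certain event: 0 bits

# Information adds over independent events, so the sequence
# Head, Tail, Tail, Tail, Tail from five fair coins carries 5 * 1 = 5 bits,
# while the same sequence from skewed coins (P(Head)=0.1, P(Tail)=0.9) carries:
print(info_bits(0.1) + 4 * info_bits(0.9))   # ~3.9 bits
```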
Entropy
• On average, how much information do we get from an observation drawn from the distribution?
• Entropy is the expectation of the information over all possible observations.
• Entropy can be defined for the discrete case,
  H(X) = −Σ_x P(x) log2 P(x),
  and for the continuous case,
  H(X) = −∫ p(x) log2 p(x) dx.
Some properties of entropy
• Entropy is a scalar property of a probability distribution.
• Entropy is maximal if P(X) is constant (uniform): least certainty about the event.
• Entropy is minimal (zero) if P(X) is a delta function: the outcome is certain.
• The entropy of a discrete distribution is always non-negative.
• The higher the entropy, the more you learn (on average) by observing values of the random variable.
• The higher the entropy, the less you can predict the values of the random variable.
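A quick numerical check of these properties (my own illustration), using a small entropy function over a discrete distribution with a base-2 logarithm so the result is in bits:

```python
# Entropy H = -sum p log2 p of a discrete distribution (with 0 log 0 := 0).
import numpy as np

def entropy(p):
    """Entropy (bits) of a discrete probability distribution p."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform: maximal, log2(4) = 2 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))       # delta:   minimal, 0 bits
print(entropy([0.7, 0.1, 0.1, 0.1]))       # in between, and always >= 0
```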
E.g., tossing a coin
• Entropy of a coin with head probability p: H(p) = −p log2 p − (1 − p) log2(1 − p).
• Entropy reaches the maximum (1 bit) when each event occurs with equal probability, i.e., when the outcome is most random.
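A short sweep over the head probability p (an illustration of my own) confirms that the binary entropy peaks at 1 bit for p = 0.5 and falls toward 0 as the coin becomes deterministic:

```python
# Binary entropy of a coin as a function of the head probability p.
import numpy as np

p = np.linspace(0.001, 0.999, 999)
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)

print("maximum entropy:", H.max(), "bits at p =", p[np.argmax(H)])   # 1.0 bit at p = 0.5
print("entropy at p = 0.9:", H[np.searchsorted(p, 0.9)])             # well below 1 bit
```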