Chap 8 : Estimation of parameters & Fitting of Probability Distributions

Section 8.1: Introduction

The values of unknown parameters must be estimated before probability laws can be fitted to data.

Section 8.2: Fitting the Poisson Distribution to Emissions of Alpha Particles (classical example)

Recall: the probability mass function of a Poisson random variable $X$ is given by

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

From the observed data, we must estimate a value for the parameter $\lambda$.

What if the experiment is repeated? The estimate of $\lambda$ is then viewed as a random variable whose probability dist’n is referred to as its sampling distribution. The spread of the sampling distribution reflects the variability of the estimate. Chap 8 is about fitting the model to data; Chap 9 deals with testing such a fit.
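The idea of a sampling distribution can be sketched by simulation. This is a minimal illustration, not the textbook's data: the true rate, sample size, and number of repetitions are hypothetical, the estimate is taken to be the sample mean, and the Poisson sampler is Knuth's multiplication method written out by hand so that only the standard library is needed.

```python
import math
import random
import statistics


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def estimate_lambda(counts):
    """Estimate lambda by the sample mean of the observed counts."""
    return sum(counts) / len(counts)


# Hypothetical settings: repeat the "experiment" many times.
rng = random.Random(0)
true_lam, n, repeats = 8.0, 200, 500

estimates = [
    estimate_lambda([sample_poisson(true_lam, rng) for _ in range(n)])
    for _ in range(repeats)
]

mean_of_estimates = statistics.mean(estimates)
spread_of_estimates = statistics.stdev(estimates)   # simulated standard error
theoretical_se = math.sqrt(true_lam / n)             # sd of a Poisson sample mean

print(mean_of_estimates, spread_of_estimates, theoretical_se)
```

The spread of `estimates` is what the slide calls the variability of the estimate; it should be close to `theoretical_se`.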

Assessing Goodness of Fit (GOF)

Example: fit a Poisson dist’n to counts (p. 240). Informally, GOF is assessed by comparing the Observed (O) and the Expected (E) counts, which are grouped (at least 5 expected counts each) into 16 cells. Formally, use a measure of discrepancy such as Pearson’s chi-square statistic to quantify the comparison of the O and E counts. In this example,

$$X^2 = \sum_{i=1}^{16} \frac{(O_i - E_i)^2}{E_i}.$$

Null dist’n: $X^2$ is a random variable (a function of the random counts) whose probability dist’n, when the model is correct, is called its null distribution. It can be shown that the null dist’n of $X^2$ is approximately the chi-square dist’n with degrees of freedom df = no. of cells − no. of independent parameters fitted − 1. Notation: $\chi^2_{14}$, since df = 16 (cells) − 1 (parameter $\lambda$) − 1 = 14. The larger the value of $X^2$, the worse the fit.
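The statistic and the degrees-of-freedom rule can be sketched in a few lines. The observed and expected counts below are hypothetical (four cells only) and are not the alpha-particle data; the function names are chosen here for illustration.

```python
def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_cells, n_fitted_params):
    """df = number of cells - number of independent fitted parameters - 1."""
    return n_cells - n_fitted_params - 1


# Hypothetical grouped counts, for illustration only.
observed = [18, 25, 30, 27]
expected = [20.0, 26.0, 29.0, 25.0]
x2 = pearson_chi_square(observed, expected)

# The setting described in the text: 16 cells, one fitted parameter.
df_text_example = degrees_of_freedom(n_cells=16, n_fitted_params=1)

print(x2, df_text_example)
```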

p-value: Figure 8.1 on page 242 gives a good feel for what a p-value is. The p-value measures the degree of evidence against the statement “the model fits the data well,” i.e. “Poisson is the true model.” It is the probability, under the null dist’n, of a discrepancy at least as large as the one observed. The smaller the p-value, the more evidence there is against the model. A small p-value therefore leads to rejecting the null, i.e. to saying that “the model does NOT fit the data well.” How small is small? Reject when p-value ≤ α, where α is the significance level.
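A p-value for a chi-square statistic can be computed without extra libraries when the degrees of freedom are even, because the upper tail then has a closed form, $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j / j!$. The sketch below relies on that restriction; the observed statistic and the level α = 0.05 are hypothetical values chosen for illustration.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term, total = 1.0, 1.0          # j = 0 term of the finite sum
    for j in range(1, df // 2):
        term *= half / j            # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


# Hypothetical observed statistic with the 14 df from the text.
x2_observed = 9.0
alpha = 0.05
p_value = chi_square_upper_tail_even_df(x2_observed, df=14)
reject_model = p_value <= alpha

print(p_value, reject_model)
```

For odd degrees of freedom a statistics library would be the practical choice.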

8.3: Parameter Estimation: MOM & MLE

Let the observed data be a random sample $X_1, \ldots, X_n$, i.e. a sequence of I.I.D. random variables whose joint distribution depends on an unknown parameter $\theta$ (scalar or vector). An estimate $\hat\theta$ of $\theta$ is a function of the $X_i$ and hence a random variable, whose dist’n is known as its sampling dist’n. The standard deviation of the sampling dist’n is termed the standard error of the estimate.

8.4: The Method of Moments

Definition: the $k$th (pop’n) moment of a random variable $X$ is denoted by $\mu_k = E(X^k)$, and its $k$th (sample) moment, $\hat\mu_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$, is viewed as an estimate of $\mu_k$.

Algorithm: MOM estimates the parameter(s) by finding expressions for them in terms of the lowest possible (pop’n) moments and then substituting the (sample) moments into those expressions.
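As an illustration of the algorithm, consider a gamma distribution with shape α and rate λ, an example chosen here rather than taken from the slide. Its first two moments are $\mu_1 = \alpha/\lambda$ and $\mu_2 = \alpha(\alpha+1)/\lambda^2$, which invert to $\alpha = \mu_1^2/(\mu_2 - \mu_1^2)$ and $\lambda = \mu_1/(\mu_2 - \mu_1^2)$. The sketch substitutes sample moments into these expressions; the true parameter values and sample size are hypothetical.

```python
import random


def sample_moment(data, k):
    """k-th sample moment: the average of x**k over the data."""
    return sum(x ** k for x in data) / len(data)


def gamma_method_of_moments(data):
    """Return (alpha_hat, rate_hat) for a gamma model via the first two moments."""
    m1 = sample_moment(data, 1)
    m2 = sample_moment(data, 2)
    variance_estimate = m2 - m1 ** 2
    alpha_hat = m1 ** 2 / variance_estimate
    rate_hat = m1 / variance_estimate
    return alpha_hat, rate_hat


# Hypothetical simulated data; random.gammavariate takes (shape, scale = 1 / rate).
rng = random.Random(1)
true_alpha, true_rate = 2.0, 0.5
data = [rng.gammavariate(true_alpha, 1.0 / true_rate) for _ in range(20000)]

alpha_hat, rate_hat = gamma_method_of_moments(data)
print(alpha_hat, rate_hat)
```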

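8.5: The Method of Maximum Likelihood

Algorithm: let $X_1, \ldots, X_n$ be a sequence of I.I.D. random variables with density or mass function $f(x \mid \theta)$.
• The likelihood function is $\mathrm{lik}(\theta) = \prod_{i=1}^{n} f(X_i \mid \theta)$.
• The MLE of $\theta$ is the value of $\theta$ that maximizes the likelihood function or, equivalently, its natural logarithm (since the logarithm is a monotonic function).
• The log-likelihood function $l(\theta) = \sum_{i=1}^{n} \log f(X_i \mid \theta)$ is then maximized to get the MLE.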
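The steps above can be sketched numerically for the Poisson model, where the answer is known in closed form (the sample mean), so the numerical maximizer can be checked against it. The data are hypothetical, and the golden-section search is just one simple choice of maximizer for a one-dimensional, single-peaked log-likelihood.

```python
import math


def poisson_log_likelihood(lam, data):
    """l(lam) = sum over the data of log f(x | lam) for the Poisson model."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def golden_section_maximize(f, lo, hi, tol=1e-6):
    """Maximize a single-peaked function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            a = c                   # the maximum lies in [c, b]
        else:
            b = d                   # the maximum lies in [a, d]
    return (a + b) / 2.0


# Hypothetical counts, for illustration only.
data = [3, 5, 4, 6, 2, 4, 7, 5]

numeric_mle = golden_section_maximize(
    lambda lam: poisson_log_likelihood(lam, data), lo=0.01, hi=20.0
)
closed_form_mle = sum(data) / len(data)   # the Poisson MLE is the sample mean

print(numeric_mle, closed_form_mle)
```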

8.5.1: MLEs of Multinomial Cell Probabilities

Suppose that $X_1, \ldots, X_m$, the counts in cells $1, \ldots, m$, follow a multinomial distribution with total count $n$ and cell probabilities $p_1, \ldots, p_m$. Caution: the marginal dist’n of each $X_i$ is binomial, BUT the $X_1, \ldots, X_m$ are not INDEPENDENT, i.e. their joint PMF is not the product of the marginal PMFs. The good news is that the MLE still applies. Problem: estimate the p’s from the x’s.

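8.5.1a: MLEs of Multinomial Cell Probabilities (cont’d)

To answer the question, we assume $n$ is given and we wish to estimate $p_1, \ldots, p_m$. From the joint PMF

$$f(x_1, \ldots, x_m \mid p_1, \ldots, p_m) = \frac{n!}{\prod_{i=1}^{m} x_i!} \prod_{i=1}^{m} p_i^{x_i},$$

the log-likelihood becomes

$$l(p_1, \ldots, p_m) = \log n! - \sum_{i=1}^{m} \log x_i! + \sum_{i=1}^{m} x_i \log p_i.$$

To maximize this log-likelihood subject to the constraint $\sum_{i=1}^{m} p_i = 1$, we use a Lagrange multiplier and, after maximizing, get

$$\hat p_j = \frac{x_j}{n}, \quad j = 1, \ldots, m.$$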
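The Lagrange multiplier step can be written out as follows. The symbol $\lambda$ denotes the multiplier here (notation chosen for this sketch, unrelated to the Poisson rate), "const" collects the terms of the log-likelihood that do not involve the $p_j$, and the last implication uses $\sum_j x_j = n$.

```latex
\begin{align*}
L(p_1,\ldots,p_m,\lambda)
  &= \sum_{j=1}^{m} x_j \log p_j
     + \lambda\Bigl(\sum_{j=1}^{m} p_j - 1\Bigr) + \text{const},\\
\frac{\partial L}{\partial p_j}
  &= \frac{x_j}{p_j} + \lambda = 0
     \;\Longrightarrow\; p_j = -\frac{x_j}{\lambda},\\
\sum_{j=1}^{m} p_j = 1
  &\;\Longrightarrow\; -\frac{n}{\lambda} = 1
     \;\Longrightarrow\; \lambda = -n,\\
\hat p_j &= \frac{x_j}{n}, \qquad j = 1,\ldots,m.
\end{align*}
```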

8.5.1b: MLEs of Multinomial Cell Probabilities (cont’d)

Deja vu: note that the sampling dist’n of the $\hat p_j$ is determined by the binomial dist’ns of the $X_j$.

Hardy-Weinberg Equilibrium (GENETICS): here the multinomial cell probabilities are functions of other unknown parameters $\theta$; that is, $p_i = p_i(\theta)$. Read Example A on pages 260-261.
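A minimal sketch of the Hardy-Weinberg case, assuming the standard three-cell parametrization $p_1 = (1-\theta)^2$, $p_2 = 2\theta(1-\theta)$, $p_3 = \theta^2$ (the slide does not spell it out). Setting the derivative of the log-likelihood to zero gives $\hat\theta = (2x_3 + x_2)/(2n)$. The counts below are hypothetical and are not the textbook's data.

```python
import math


def hardy_weinberg_probs(theta):
    """Cell probabilities under the assumed Hardy-Weinberg parametrization."""
    return ((1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2)


def hardy_weinberg_log_likelihood(theta, counts):
    """Multinomial log-likelihood in theta, dropping terms that are constant in theta."""
    return sum(x * math.log(p) for x, p in zip(counts, hardy_weinberg_probs(theta)))


def hardy_weinberg_mle(counts):
    """Closed-form MLE: theta_hat = (2 * x3 + x2) / (2 * n)."""
    x1, x2, x3 = counts
    n = x1 + x2 + x3
    return (2 * x3 + x2) / (2 * n)


# Hypothetical genotype counts, for illustration only.
counts = (30, 50, 20)
theta_hat = hardy_weinberg_mle(counts)
fitted_probs = hardy_weinberg_probs(theta_hat)

print(theta_hat, fitted_probs)
```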

8.5.2: Large Sample Theory for MLEs

Let $\hat\theta_n$ be an estimate of a parameter $\theta$ based on $X_1, \ldots, X_n$. The variance of the sampling dist’n of many estimators decreases as the sample size $n$ increases. An estimate $\hat\theta_n$ is said to be a consistent estimate of $\theta$ if $\hat\theta_n$ approaches $\theta$ (in probability) as the sample size $n$ approaches infinity. Consistency is a limiting property: it says nothing about the behavior of the estimator for a finite sample size.

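8.5.2: Large Sample Theory for MLEs (cont’d)

Theorem: under appropriate smoothness conditions on $f$, the MLE $\hat\theta$ from an I.I.D. sample is consistent, and the probability dist’n of $\sqrt{n I(\theta_0)}\,(\hat\theta - \theta_0)$ tends to $N(0,1)$, where $\theta_0$ is the true parameter value. In other words, the large sample distribution of the MLE is approximately normal with mean $\theta_0$ (say, the MLE is asymptotically unbiased) and its asymptotic variance is $\frac{1}{n I(\theta_0)}$, where the information about the parameter is

$$I(\theta) = E\left[\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right]^2 = -E\left[\frac{\partial^2}{\partial\theta^2} \log f(X \mid \theta)\right].$$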
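As a worked instance, the information can be computed for the Poisson model of Section 8.2, using the second-derivative form of the information. For this model the MLE is the sample mean, and the asymptotic variance below happens to equal its exact variance.

```latex
\begin{align*}
\log f(x \mid \lambda) &= x \log\lambda - \lambda - \log x!,\\
\frac{\partial^2}{\partial\lambda^2} \log f(x \mid \lambda)
  &= -\frac{x}{\lambda^2},\\
I(\lambda) &= -E\Bigl[-\frac{X}{\lambda^2}\Bigr]
  = \frac{E(X)}{\lambda^2} = \frac{1}{\lambda},\\
\mathrm{Var}(\hat\lambda) &\approx \frac{1}{n I(\lambda)} = \frac{\lambda}{n}.
\end{align*}
```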

8.5.3: Confidence Intervals for MLEs

Recall that a confidence interval (as seen in Chap. 7) is a random interval containing the parameter of interest with some specified probability. Three methods to get CIs for MLEs are:
• Exact CIs
• Approximate CIs using the large sample theory of Section 8.5.2
• Bootstrap CIs
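The last two methods can be sketched for the Poisson rate. Everything here is an illustrative assumption: the data are simulated, the approximate interval is $\hat\lambda \pm z\sqrt{\hat\lambda/n}$ (using the asymptotic variance $\lambda/n$), and the bootstrap interval uses one common construction, namely quantiles of $\lambda^* - \hat\lambda$ from samples regenerated under the fitted model.

```python
import math
import random
from statistics import NormalDist


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def approximate_ci(data, level=0.95):
    """Large-sample CI: lam_hat +/- z * sqrt(lam_hat / n)."""
    n = len(data)
    lam_hat = sum(data) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(lam_hat / n)
    return lam_hat - half_width, lam_hat + half_width


def parametric_bootstrap_ci(data, rng, level=0.95, n_boot=1000):
    """CI from quantiles of (lam_star - lam_hat) under the fitted Poisson model."""
    n = len(data)
    lam_hat = sum(data) / n
    deltas = sorted(
        sum(sample_poisson(lam_hat, rng) for _ in range(n)) / n - lam_hat
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    delta_lo = deltas[round(tail * n_boot)]
    delta_hi = deltas[round((1 - tail) * n_boot) - 1]
    return lam_hat - delta_hi, lam_hat - delta_lo


# Hypothetical data: 100 simulated counts with rate 8.
rng = random.Random(2)
data = [sample_poisson(8.0, rng) for _ in range(100)]
lam_hat = sum(data) / len(data)

approx_lo, approx_hi = approximate_ci(data)
boot_lo, boot_hi = parametric_bootstrap_ci(data, rng)

print((approx_lo, approx_hi), (boot_lo, boot_hi))
```

For a sample of this size the two intervals should be of similar width; they can differ more when $n$ is small or the sampling dist’n is skewed.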

8.6: Efficiency & Cramer-Rao Lower Bound

Problem: given a variety of possible estimates, the best one to choose should have its sampling distribution highly concentrated about the true parameter value $\theta_0$. Because of its analytic simplicity, the mean square error, $\mathrm{MSE}(\hat\theta) = E(\hat\theta - \theta_0)^2 = \mathrm{Var}(\hat\theta) + (E(\hat\theta) - \theta_0)^2$, will be used as a measure of such concentration.

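8.6: Efficiency & Cramer-Rao Lower Bound (cont’d)

Unbiasedness means $E(\hat\theta) = \theta$; for an unbiased estimate the MSE equals the variance.

Definition: given two estimates, $\hat\theta$ and $\tilde\theta$, of a parameter $\theta$, the efficiency of $\hat\theta$ relative to $\tilde\theta$ is defined to be

$$\mathrm{eff}(\hat\theta, \tilde\theta) = \frac{\mathrm{Var}(\tilde\theta)}{\mathrm{Var}(\hat\theta)}.$$

Theorem (Cramer-Rao Inequality): under smoothness assumptions on the density $f(x \mid \theta)$ of the IID sequence $X_1, \ldots, X_n$, when $T = t(X_1, \ldots, X_n)$ is an unbiased estimate of $\theta$, we get the lower bound

$$\mathrm{Var}(T) \ge \frac{1}{n I(\theta)}.$$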
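Relative efficiency and the lower bound can be illustrated by simulation. The example is chosen here, not taken from the slide: for normal data with unit variance, the sample mean and the sample median both estimate the center, the information per observation is 1, and the bound for unbiased estimates is therefore $1/n$. Sample size and number of repetitions are hypothetical.

```python
import random
import statistics

rng = random.Random(3)
n, repeats = 101, 4000

means, medians = [], []
for _ in range(repeats):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(statistics.median(sample))

var_mean = statistics.variance(means)
var_median = statistics.variance(medians)

# Efficiency of the median relative to the mean: Var(mean) / Var(median).
eff_median_vs_mean = var_mean / var_median

# Cramer-Rao lower bound for unbiased estimates of a N(theta, 1) mean.
cramer_rao_bound = 1.0 / n

print(eff_median_vs_mean, var_mean, cramer_rao_bound)
```

In this setting the sample mean sits essentially at the bound, while the median needs noticeably more data to reach the same variance.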

8.7: Sufficiency

Is there a function $T(X_1, \ldots, X_n)$ containing all the information in the sample about the parameter $\theta$? If so, the original data may be reduced to this statistic $T$ without loss of information.

Definition: a statistic $T(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional dist’n of $X_1, \ldots, X_n$, given $T = t$, does not depend on $\theta$ for any value $t$. In other words, given the value of $T$, which is called a sufficient statistic, one can gain no more knowledge about the parameter $\theta$ from further investigation of the sample.

8.7.1: A Factorization Theorem

How to get a sufficient statistic? Theorem A: a necessary and sufficient condition for $T(X_1, \ldots, X_n)$ to be sufficient for a parameter $\theta$ is that the joint PDF or PMF factors in the form

$$f(x_1, \ldots, x_n \mid \theta) = g[T(x_1, \ldots, x_n), \theta]\, h(x_1, \ldots, x_n).$$

Corollary A: if $T$ is sufficient for $\theta$, then the MLE is a function of $T$.
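As a worked instance of the factorization, take an I.I.D. Poisson sample with rate $\lambda$ (an example chosen here for continuity with Section 8.2). The joint PMF splits into a factor that depends on the data only through $t = \sum_i x_i$ and a factor free of $\lambda$, so $T = \sum_i X_i$ is sufficient. The MLE $\hat\lambda = T/n$ is indeed a function of $T$.

```latex
\begin{align*}
f(x_1,\ldots,x_n \mid \lambda)
  &= \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
   = \underbrace{\lambda^{t}\, e^{-n\lambda}}_{g(t,\,\lambda)}
     \cdot
     \underbrace{\frac{1}{\prod_{i=1}^{n} x_i!}}_{h(x_1,\ldots,x_n)},
  \qquad t = \sum_{i=1}^{n} x_i.
\end{align*}
```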

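8.7.2: The Rao-Blackwell Theorem

The following theorem gives a quantitative rationale for basing an estimator of a parameter $\theta$ on an existing sufficient statistic.

Theorem (Rao-Blackwell): let $\hat\theta$ be an estimator of $\theta$ with $E(\hat\theta^2) < \infty$ for all $\theta$. Suppose that $T$ is sufficient for $\theta$, and let $\tilde\theta = E(\hat\theta \mid T)$. Then, for all $\theta$,

$$E(\tilde\theta - \theta)^2 \le E(\hat\theta - \theta)^2.$$

The inequality is strict unless $\hat\theta = \tilde\theta$.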
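The theorem can be illustrated by simulation with an example chosen here: estimating $\theta = e^{-\lambda} = P(X = 0)$ from an I.I.D. Poisson sample. A crude unbiased estimator is the indicator that $X_1 = 0$. Conditioning on the sufficient statistic $T = \sum_i X_i$ gives $E(\hat\theta \mid T) = ((n-1)/n)^T$, because given $T = t$ the count $X_1$ is binomial with $t$ trials and success probability $1/n$. The rate, sample size, and number of repetitions are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(4)
lam, n, repeats = 2.0, 10, 5000
target = math.exp(-lam)             # the quantity being estimated, P(X = 0)

sq_err_crude = 0.0
sq_err_rb = 0.0
for _ in range(repeats):
    sample = [sample_poisson(lam, rng) for _ in range(n)]
    crude = 1.0 if sample[0] == 0 else 0.0      # indicator that X_1 = 0
    t = sum(sample)                             # sufficient statistic
    rao_blackwell = ((n - 1) / n) ** t          # E(crude | T = t)
    sq_err_crude += (crude - target) ** 2
    sq_err_rb += (rao_blackwell - target) ** 2

mse_crude = sq_err_crude / repeats
mse_rao_blackwell = sq_err_rb / repeats

print(mse_crude, mse_rao_blackwell)
```

The conditioned estimator uses the whole sample through $T$, which is why its simulated MSE comes out far smaller than that of the single-observation indicator.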

8.8: Conclusion

Some key ideas from Chap. 7, such as sampling distributions and confidence intervals, were revisited. MOM and MLE were applied, along with some distributional (large sample) approximations. The theoretical concepts of efficiency, the Cramer-Rao lower bound, and sufficiency were discussed. Finally, some light was shed on parametric bootstrapping.
