DATA ANALYSIS Module Code: CA660 Lecture Block 5
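ESTIMATION / H.T. Rationale, Other Features & Alternatives
• Estimator validity: how good?
• Basis: statistical properties (variance, bias, distributional etc.)
• Bias: $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$, where $\hat{\theta}$ is the point estimate and $\theta$ the true parameter. Bias can be positive, negative or zero.
• Permits calculation of other properties, e.g. the mean squared error $\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$, where this quantity and the variance of the estimator are the same if the estimator is unbiased.
• Obtained by both analytical and "bootstrap" methods, e.g. for a discrete estimator $E(\hat{\theta}) = \sum_i \hat{\theta}_i \, P(\hat{\theta}_i)$
• Similarly, for continuous variables, $E(\hat{\theta}) = \int \hat{\theta} \, f(\hat{\theta}) \, d\hat{\theta}$
• or for $b$ bootstrap replications, $\widehat{\mathrm{Bias}} = \frac{1}{b}\sum_{i=1}^{b} \hat{\theta}^*_i - \hat{\theta}$ and $\widehat{\mathrm{MSE}} = \frac{1}{b}\sum_{i=1}^{b} (\hat{\theta}^*_i - \hat{\theta})^2$, where $\hat{\theta}^*_i$ is the estimate from the $i$-th bootstrap resample.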
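A minimal Python sketch of the bootstrap route to bias and MSE, assuming plain i.i.d. resampling with replacement. The estimator (a divisor-n variance, chosen because it is known to be biased downwards), the data values and the names `plug_in_variance` and `bootstrap_bias_mse` are illustrative assumptions, not part of the lecture.

```python
import random


def plug_in_variance(xs):
    """Divisor-n sample variance: a deliberately biased estimator of sigma^2."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def bootstrap_bias_mse(xs, estimator, b=2000, seed=1):
    """Bootstrap estimates of Bias and MSE of `estimator` from b resamples.

    Bias_B = mean(theta*_i) - theta_hat,  MSE_B = mean((theta*_i - theta_hat)^2).
    """
    rng = random.Random(seed)
    theta_hat = estimator(xs)
    # each replication re-applies the estimator to a resample drawn with replacement
    reps = [estimator(rng.choices(xs, k=len(xs))) for _ in range(b)]
    bias = sum(reps) / b - theta_hat
    mse = sum((t - theta_hat) ** 2 for t in reps) / b
    return theta_hat, bias, mse


# Hypothetical data, for illustration only
xs = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 3.3, 4.4,
      6.2, 2.9, 5.0, 4.8, 3.6, 5.7, 4.2, 4.9, 3.8, 5.4]
theta_hat, bias, mse = bootstrap_bias_mse(xs, plug_in_variance)
print(f"estimate={theta_hat:.4f} bias={bias:.4f} mse={mse:.4f}")
```

The bootstrap bias should come out negative here, in line with the known downward bias of the divisor-n variance, and MSE_B always equals the bootstrap variance of the replications plus the squared bootstrap bias.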
Estimation/H.T. Rationale etc. (contd.)
• For any estimator $\hat{\theta}$, even an unbiased one, there is a difference between estimator and true parameter: the sampling error.
• Hence the need for probability statements around $\hat{\theta}$.
• C.L. for the estimator = $(T_1, T_2)$, similarly to before, with $P\{T_1 \le \theta \le T_2\} = 1 - \alpha$, where $1 - \alpha$ is the confidence coefficient. If the estimator is unbiased, in other words, $1 - \alpha$ = P{true parameter falls into the interval}.
• In general, confidence intervals can be determined using parametric and non-parametric approaches; parametric construction needs a pivotal quantity, i.e. a variable which is a function of parameter and data but whose distribution does not depend on the parameter (e.g. $U = (\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$ for a Normal mean with known $\sigma$).
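The two routes to a C.I. can be contrasted in a short Python sketch. The parametric interval uses the pivot $U = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ with $\sigma$ assumed known; the non-parametric one is a percentile bootstrap interval, a swapped-in technique named here for illustration. The sample values, $\sigma = 3.6$ and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_interval(xs, sigma, conf=0.95):
    """Parametric C.I. from the pivot U = (xbar - mu) / (sigma / sqrt(n)) ~ N(0, 1)."""
    u = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.96 for conf = 0.95
    half_width = u * sigma / len(xs) ** 0.5
    xbar = fmean(xs)
    return xbar - half_width, xbar + half_width


def percentile_bootstrap_interval(xs, estimator, conf=0.95, b=4000, seed=2):
    """Non-parametric C.I.: empirical percentiles of the bootstrap replications."""
    rng = random.Random(seed)
    reps = sorted(estimator(rng.choices(xs, k=len(xs))) for _ in range(b))
    lo_idx = int((1 - conf) / 2 * b)
    hi_idx = int((1 + conf) / 2 * b) - 1
    return reps[lo_idx], reps[hi_idx]


# Hypothetical sample; sigma = 3.6 is assumed known for the parametric interval
xs = [18.2, 15.9, 17.7, 21.3, 16.4, 14.8, 19.6, 17.1, 20.2, 16.9,
      13.7, 18.8, 17.4, 22.0, 15.3, 16.1, 19.0, 18.5, 14.9, 17.8]
print(z_interval(xs, sigma=3.6))
print(percentile_bootstrap_interval(xs, fmean))
```

The bootstrap interval needs no distributional assumption for $\bar{X}$, at the cost of Monte Carlo noise that depends on the number of replications `b`.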
Related issues in Hypothesis Testing: POWER of the TEST
• Probability of False Positive and False Negative errors
• e.g. a false positive if linkage between two genes is declared when they are really independent
• Hypothesis Test Result:
  Fact     | Accept H0                                | Reject H0
  H0 True  | $1 - \alpha$                             | False positive = Type I error = $\alpha$
  H0 False | False negative = Type II error = $\beta$ | Power of the Test = $1 - \beta$
• Power of the Test, or Statistical Power, = probability of rejecting H0 when it is correct to do so (related strictly to the alternative hypothesis and $\beta$).
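Example on Type II Error and Power
• Suppose we have a variable with known population S.D. $\sigma = 3.6$. From the population, a r.s. of size $n = 100$ is used to test, at $\alpha = 0.05$, that $H_0: \mu = 17.5$ against $H_1: \mu \ne 17.5$.
• Critical values of the C.I. for a 2-sided test are $\bar{x}_i = \mu_0 \pm U_{\alpha/2}\, \sigma/\sqrt{n}$, with $U_{\alpha/2} = 1.96$ for $\alpha = 0.05$, where $i$ = upper or lower and $\mu_0$ is the mean under H0.
• So substituting our values gives: $\bar{x}_L = 17.5 - 1.96(3.6/\sqrt{100}) = 16.79$ and $\bar{x}_U = 17.5 + 1.96(0.36) = 18.21$.
• But if H0 is false, $\mu$ is not 17.5 but some other value, e.g. 16.5 say.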
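The critical-value arithmetic can be checked with Python's standard-library `statistics.NormalDist`; the function name `two_sided_critical_values` is an illustrative choice.

```python
from statistics import NormalDist


def two_sided_critical_values(mu0, sigma, n, alpha=0.05):
    """Non-rejection region mu0 +/- U_{alpha/2} * sigma / sqrt(n) for a 2-sided z-test."""
    u = NormalDist().inv_cdf(1 - alpha / 2)   # U_{alpha/2}, about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5                     # standard error of the sample mean
    return mu0 - u * se, mu0 + u * se


lo, hi = two_sided_critical_values(17.5, 3.6, 100)
print(f"{lo:.2f} {hi:.2f}")  # 16.79 18.21
```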
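Example contd.
• We want the new distribution, with mean $\mu = 16.5$; i.e. the new distribution is shifted w.r.t. the old.
• Thus the probability of a Type II error (failing to reject a false H0) is the area under the curve of the new distribution which overlaps the non-rejection region specified under H0.
• So this is $\beta = P\{16.79 \le \bar{X} \le 18.21 \mid \mu = 16.5\} = P\{(16.79 - 16.5)/0.36 \le U \le (18.21 - 16.5)/0.36\} = P\{0.81 \le U \le 4.75\} \approx 1 - 0.7910 = 0.2090$.
• Thus the probability of taking the appropriate action (rejecting H0 when it is false) is $1 - \beta = 0.791$ = Power.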
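Power can also be read as a long-run frequency. The sketch below uses simulation rather than the analytical route: it draws repeated samples of $n = 100$ from a Normal population with $\sigma = 3.6$ (Normality is an assumption here), applies the two-sided test of $H_0: \mu = 17.5$ at $\alpha = 0.05$ and counts rejections. Under $\mu = 17.5$ the rejection rate should sit near $\alpha$; under $\mu = 16.5$ it should sit near the power of about 0.79. The function name and seed are hypothetical.

```python
import random
from statistics import NormalDist


def simulated_rejection_rate(true_mu, mu0=17.5, sigma=3.6, n=100,
                             alpha=0.05, trials=5000, seed=3):
    """Fraction of simulated samples for which the 2-sided z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    u = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = mu0 - u * se, mu0 + u * se          # non-rejection region under H0
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if xbar < lo or xbar > hi:
            rejections += 1
    return rejections / trials


print("Type I error rate:", simulated_rejection_rate(17.5))
print("Power at mu = 16.5:", simulated_rejection_rate(16.5))
```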
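[Figure: Shifting the distributions. Sampling distribution of $\bar{X}$ under H0 (centred at 17.5), with the non-rejection region from 16.79 to 18.21 and a rejection region of area $\alpha/2$ in each tail, shown with the shifted distribution centred at 16.5.]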
Example contd. Power under alternative $\mu$ for given $\alpha$
• Possible values of $\mu$ under H1 (H0 false), with $\beta$ and $1 - \beta$:
  $\mu$ under H1 | $\beta$ | $1 - \beta$
  16.0 | 0.0143 | 0.9857
  16.5 | 0.2090 | 0.7910
  17.0 | 0.7190 | 0.2810
  18.0 | 0.7190 | 0.2810
  18.5 | 0.2090 | 0.7910
  19.0 | 0.0143 | 0.9857
• Balancing $\alpha$ and $\beta$: $\beta$ tends to be large c.f. $\alpha$ unless the original hypothesis is way off. So a decision based on a rejected H0 is more conclusive than one based on H0 not rejected, as the probability of being wrong is larger in the latter case.
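The $\beta$ and power values in the table can be recomputed from the Normal c.d.f. in a few lines. This sketch keeps full precision throughout, whereas the tabulated values use critical values and U values rounded to two decimals, so its results agree with the table to within about 0.005 rather than exactly. The function name `type_ii_error` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()


def type_ii_error(true_mu, mu0=17.5, sigma=3.6, n=100, alpha=0.05):
    """beta = P(lo <= Xbar <= hi | mu = true_mu) for the 2-sided z-test of H0: mu = mu0."""
    se = sigma / n ** 0.5
    u = Z.inv_cdf(1 - alpha / 2)
    lo, hi = mu0 - u * se, mu0 + u * se       # non-rejection region fixed under H0
    return Z.cdf((hi - true_mu) / se) - Z.cdf((lo - true_mu) / se)


for mu in (16.0, 16.5, 17.0, 18.0, 18.5, 19.0):
    beta = type_ii_error(mu)
    print(f"{mu:5.1f}  beta={beta:.4f}  power={1 - beta:.4f}")
```

At $\mu = \mu_0 = 17.5$ the same function returns $1 - \alpha = 0.95$, the probability of correctly accepting a true H0.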
SAMPLE SIZE DETERMINATION
• Example: suppose we wanted to design a genetic mapping experiment, or a comparative product survey. Conventional experimental design (ANOVA), genetic marker type (or product type) and sample size are considered.
• Questions might include:
  • What is the statistical power to detect linkage for a certain progeny size? (or common 'shared' consumer preferences, say)
  • What is the precision of the estimated recombination fraction, R.F. (or grouped preferences), when the sample size is N?
  • What sample size is needed for a specific Statistical Power?
  • What sample size is needed for a specific Confidence Interval?
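Sample size: calculation based on C.I.
For some parameter $\theta$, where the Normal approximation approach is valid, the C.I. are $\hat{\theta} \pm U \sigma_{\hat{\theta}}$, with $U$ = standardized normal deviate (S.N.D.). The range from lower to upper limits, i.e. for 95% limits $d = 2 \times 1.96\, \sigma_{\hat{\theta}}$, is just a precision measurement for the estimator. Given a true parameter $\theta$, write $\sigma_{\hat{\theta}} = \sigma/\sqrt{n}$, with $\sigma$ the per-observation S.D. evaluated at $\theta$. So manipulation gives: $n = \dfrac{4 U^2 \sigma^2}{d^2}$.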
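A minimal sketch of the C.I.-based calculation, rounding $n = 4U^2\sigma^2/d^2$ up to a whole number. The illustrative case assumes a binomial proportion near $p = 0.2$, so that the per-observation S.D. is $\sqrt{p(1-p)} = 0.4$, and a required total 95% interval width of $d = 0.1$; both values and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_ci_width(sigma, width, conf=0.95):
    """Smallest n with 2 * U * sigma / sqrt(n) <= width, i.e. n >= (2 U sigma / width)^2."""
    u = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # S.N.D. for the chosen confidence level
    return math.ceil((2 * u * sigma / width) ** 2)


# Hypothetical case: a proportion near p = 0.2, per-observation S.D. sqrt(p(1 - p)) = 0.4
print(n_for_ci_width(0.4, width=0.1))  # 246
```

Halving the required width roughly quadruples $n$, since $n$ scales with $1/d^2$.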
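Sample size: calculation based on Power (firstly, what affects power?)
• Suppose $\alpha = 0.05$, $\sigma = 3.5$, $n = 100$, testing $H_0: \mu_0 = 25$ when the true $\mu = 24$; assume $H_1: \mu_1 < 25$. Sample mean found = 24.45.
• One-tailed test ($U = 1.645$): the shift is small; the lower limit of the original distribution, $25 - 1.645(3.5/\sqrt{100}) = 24.424$, virtually coincides with the actual sample value.
• Under H1, Power $= P\{\bar{X} < 24.424 \mid \mu = 24\} = P\{U < 1.21\} = 0.50 + 0.39 = 0.89$; correct decision 89% of the time.
• Note: a two-sided test at $\alpha = 0.05$ gives critical values under H0 of $\bar{x}_L = 25 - 1.96(0.35) = 24.31$ and $\bar{x}_U = 25 + 1.96(0.35) = 25.69$; equivalently $U_L = +0.89$, $U_U = 4.82$ for H1.
• In general: substitute for the limits and then recalculate for the new $\mu = \mu_1$, i.e. $U_i = (\bar{x}_i - \mu_1)/(\sigma/\sqrt{n})$.
• So, P{do not reject H0: $\mu = 25$ when true mean = 24} $= P\{0.89 \le U \le 4.82\} = 0.1867 = \beta$ (Type II).
• Thus, Power = 1 − 0.1867 = 0.8133.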
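The power figures for this example can be reproduced with a small function. As a stated assumption, the one-sided branch only handles $H_1: \mu < \mu_0$ (rejection in the lower tail). Because it does not round intermediate values, it matches the quoted 0.89, 0.8133 and the $n = 25$ value of 0.4129 to within about 0.005. The name `z_test_power` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()


def z_test_power(mu0, mu1, sigma, n, alpha, two_sided):
    """Power of a z-test of H0: mu = mu0 when the true mean is mu1.

    One-sided case assumes H1: mu < mu0 (reject when xbar falls below the lower limit).
    """
    se = sigma / n ** 0.5
    if two_sided:
        u = Z.inv_cdf(1 - alpha / 2)
        lo, hi = mu0 - u * se, mu0 + u * se
        beta = Z.cdf((hi - mu1) / se) - Z.cdf((lo - mu1) / se)
    else:
        u = Z.inv_cdf(1 - alpha)
        lo = mu0 - u * se
        beta = 1 - Z.cdf((lo - mu1) / se)
    return 1 - beta


print(z_test_power(25, 24, 3.5, 100, 0.05, two_sided=False))  # one-tailed
print(z_test_power(25, 24, 3.5, 100, 0.05, two_sided=True))   # two-tailed
print(z_test_power(25, 24, 3.5, 25, 0.05, two_sided=False))   # n = 25
```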
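Sample Size and Power contd.
• Suppose $n = 25$, other values the same, 1-tailed test now: Power = 0.4129.
• Suppose $\alpha = 0.01$ ($n = 100$), critical values 2-tailed: $\bar{x}_L = 25 - 2.576(0.35) = 24.10$ and $\bar{x}_U = 25 + 2.576(0.35) = 25.90$, with, equivalently, $U_L = +0.29$, $U_U = +5.43$.
• So, P{do not reject H0: $\mu = 25$ when true mean = 24} $= P\{0.29 \le U \le 5.43\} = 0.3859$.
• Power = 0.6141 (lower than at $\alpha = 0.05$: a smaller $\alpha$ widens the non-rejection region).
• FACTORS: $\alpha$, $n$, type of test (1- or 2-sided), true parameter value:
  $n = \dfrac{(U_\alpha + U_\beta)^2 \sigma^2}{(\mu_0 - \mu_1)^2}$
  where subscripts 0 and 1 refer to null and alternative, and the $\alpha$ value is taken as 'generic' (either all in one tail, 1-sided test/limit, or split between two, 2-sided test/limit).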
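A minimal sketch of the standard sample-size formula for a z-test, $n = ((U_\alpha + U_\beta)\,\sigma/(\mu_0 - \mu_1))^2$, rounded up. For a 2-sided test, $\alpha$ is split between the tails ($U_{\alpha/2}$) and the small chance of rejecting in the "wrong" tail is ignored, an approximation that is usual but worth stating. The target power of 0.80 and the function name are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_for_power(mu0, mu1, sigma, alpha=0.05, power=0.80, two_sided=False):
    """n = ((U_alpha + U_beta) * sigma / (mu0 - mu1))^2, rounded up to a whole number."""
    z = NormalDist()
    # alpha all in one tail (1-sided) or split between two tails (2-sided)
    u_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    u_beta = z.inv_cdf(power)                   # U_beta, with beta = 1 - power
    return math.ceil(((u_alpha + u_beta) * sigma / (mu0 - mu1)) ** 2)


print(n_for_power(25, 24, 3.5))                  # 1-sided test, power 0.80
print(n_for_power(25, 24, 3.5, two_sided=True))  # 2-sided test, power 0.80
```

With the example's $\sigma = 3.5$ and a one-unit shift, this gives 76 for the 1-sided and 97 for the 2-sided test, so the 2-sided design needs the larger sample for the same power.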
'Other' Estimation/Test Methods: NON-PARAMETRICS/DISTN FREE
• Standard pdfs cannot be assumed for data, sampling distributions or test statistics: uncertain due to small or unreliable data sets, non-independence etc. Parameter estimation is not the key issue.
• Example/empirical basis. Weaker assumptions. Less 'information' used, e.g. the median. Simple hypothesis testing as opposed to estimation. Power and efficiency are issues.
• Counts: nominal, ordinal (natural non-parametric data type/measure).
• Nonparametric Hypothesis Tests (with parallels to the parametric case).
• e.g. H.T. of locus orders requires a complex 'test statistic' distribution, so an empirical pdf must be constructed. Usually, assume the null hypothesis and use re-sampling techniques, e.g. permutation tests, bootstrap, jack-knife.
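A permutation test builds the empirical pdf of the test statistic by repeatedly re-labelling the pooled data under H0. The sketch below is a Monte Carlo version (random shuffles rather than all splits) using the absolute difference in medians as the statistic; the two samples, the seed and the function name are hypothetical. The "+1" in numerator and denominator is a common design choice that stops the estimated p-value from being exactly zero.

```python
import random
from statistics import median


def permutation_test_median_diff(a, b, reps=5000, seed=4):
    """Two-sided Monte Carlo permutation test of H0: both samples share one distribution.

    Test statistic: |median(a) - median(b)|.
    """
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    k = len(a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                    # re-label under H0
        if abs(median(pooled[:k]) - median(pooled[k:])) >= observed:
            extreme += 1
    return (extreme + 1) / (reps + 1)


# Hypothetical measurements for two groups
a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.7]
b = [10.2, 11.5, 12.0, 10.8, 11.9, 12.4, 11.1, 10.6]
print(permutation_test_median_diff(a, b))
```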
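LIKELIHOOD METHOD: DEFINITIONS
• Suppose X can take a set of values $x_1, x_2, \ldots$ with $P\{X = x\} = f(x \mid \theta)$,
• where $\theta$ is a vector of parameters affecting the observed x's,
• e.g. $\theta = (\mu, \sigma^2)$ for a Normal variable. So we can say something about P{X} if we know, say, $\theta$.
• But that is not usually the case, i.e. we observe x's, knowing nothing of $\theta$.
• Assuming the x's are a random sample of size n from a known distribution, then the likelihood for $\theta$ is $L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$.
• Finding the most likely $\theta$ (or $\theta$s) for given data is equivalent to Maximising the Likelihood function, where the M.L.E. is $\hat{\theta} = \arg\max_{\theta} L(\theta \mid x_1, \ldots, x_n)$.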
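As an illustration, assuming an i.i.d. Normal sample with $\theta = (\mu, \sigma^2)$, the log of the product likelihood can be written directly, and its known closed-form maximisers (the sample mean and the divisor-n variance) can be checked by nudging $\theta$ away from them. The data values and function names are hypothetical.

```python
import math


def normal_log_likelihood(xs, mu, var):
    """ln L(mu, var | x_1..x_n) = sum_i ln f(x_i | mu, var) for an i.i.d. Normal sample."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))


def normal_mle(xs):
    """Closed-form maximisers: sample mean and divisor-n variance."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, var_hat


xs = [2.3, 1.9, 3.1, 2.8, 2.2, 2.6, 3.4, 1.7]   # hypothetical data
mu_hat, var_hat = normal_mle(xs)
best = normal_log_likelihood(xs, mu_hat, var_hat)
# moving theta = (mu, var) away from the M.L.E. lowers the log-likelihood
for d_mu, d_var in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    print(d_mu, d_var, normal_log_likelihood(xs, mu_hat + d_mu, var_hat + d_var) < best)
```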
LIKELIHOOD: SCORE and INFO. CONTENT
• The log-likelihood is a support function $S(\theta) = \ln L(\theta)$, evaluated at a point $\theta'$, say.
• The support function for any other point, say $\theta''$, can also be obtained; this is the basis for computational iterations for the MLE, e.g. using Newton-Raphson.
• SCORE = first derivative of the support function w.r.t. the parameter, $\dfrac{dS(\theta)}{d\theta}$,
• or, numerically/discretely, $\dfrac{S(\theta'') - S(\theta')}{\theta'' - \theta'}$.
• INFORMATION CONTENT $I(\theta) = -\dfrac{d^2 S(\theta)}{d\theta^2}$, evaluated at (i) an arbitrary point = Observed Info.; (ii) the support function maximum = Expected Info.
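A minimal Newton-Raphson sketch using the score and information content. It assumes a one-parameter multinomial model with cell probabilities $(2 + \theta)/4$, $(1 - \theta)/4$, $(1 - \theta)/4$, $\theta/4$ and counts 125, 18, 20, 34: the classic genetic-linkage data often used to illustrate iterative MLE, taken here purely as an illustrative assumption. Each step moves from $\theta'$ to $\theta'' = \theta' + \mathrm{Score}(\theta')/I(\theta')$. For this model the score equation also reduces to the quadratic $197\theta^2 - 15\theta - 68 = 0$, which gives an exact value to check against.

```python
# Support function (log-likelihood, constant terms dropped):
#   S(theta) = 125 ln(2 + theta) + 38 ln(1 - theta) + 34 ln(theta)
# where 38 = 18 + 20 pools the two (1 - theta)/4 cells.

def score(theta):
    """First derivative dS/dtheta."""
    return 125 / (2 + theta) - 38 / (1 - theta) + 34 / theta


def observed_info(theta):
    """Information content I(theta) = -d^2 S / d theta^2."""
    return 125 / (2 + theta) ** 2 + 38 / (1 - theta) ** 2 + 34 / theta ** 2


def newton_raphson(theta, tol=1e-10, max_iter=50):
    """Iterate theta'' = theta' + Score(theta') / I(theta') until the step is tiny."""
    for _ in range(max_iter):
        step = score(theta) / observed_info(theta)
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


theta_hat = newton_raphson(0.5)
se = observed_info(theta_hat) ** -0.5   # approximate S.E. from the curvature at the maximum
print(f"MLE = {theta_hat:.6f}, S.E. approx {se:.4f}")
```

Starting from $\theta = 0.5$, the iteration settles at about 0.6268 within a few steps; a poor starting value can send Newton-Raphson outside $(0, 1)$, which is why practical code often adds step-halving.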
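Example: Binomial variable (e.g. use of Score and Expected Info. Content to determine type of mapping population and sample size for genomics experiments)
• Likelihood function: $L(p \mid x) = \binom{n}{x} p^x (1 - p)^{n - x}$
• Log-likelihood: $S(p) = \ln \binom{n}{x} + x \ln p + (n - x) \ln(1 - p)$
• Assume n constant, so the first term can be ignored for given x (invariant in p).
• Maximising w.r.t. p, i.e. setting the derivative of S w.r.t. p to 0, gives SCORE $= \dfrac{x}{p} - \dfrac{n - x}{1 - p} = 0$, so the M.L.E. is $\hat{p} = x/n$.
• Expected Info.: $I(p) = E\left[-\dfrac{d^2 S}{dp^2}\right] = \dfrac{n}{p(1 - p)}$, so $\mathrm{Var}(\hat{p}) \approx p(1 - p)/n$.
• How does it work, why bother?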
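A short check of the binomial result, assuming hypothetical data of $x = 30$ successes in $n = 100$ trials: a brute-force grid search over the support function lands on the same value as the closed-form M.L.E. $x/n$, and the expected information gives an approximate standard error. Function names are illustrative.

```python
import math


def binomial_support(p, x, n):
    """S(p) = x ln p + (n - x) ln(1 - p); the ln C(n, x) term is dropped (invariant in p)."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


def binomial_mle_and_se(x, n):
    p_hat = x / n                                # root of the score x/p - (n - x)/(1 - p) = 0
    expected_info = n / (p_hat * (1 - p_hat))    # I(p) = n / (p(1 - p)) evaluated at p_hat
    return p_hat, 1 / math.sqrt(expected_info)


x, n = 30, 100                                   # hypothetical data
p_hat, se = binomial_mle_and_se(x, n)
grid = [i / 1000 for i in range(1, 1000)]        # p = 0.001 ... 0.999
p_grid = max(grid, key=lambda p: binomial_support(p, x, n))
print(p_hat, p_grid, round(se, 4))
```

The "why bother" answer in this simple case is that the same score/information machinery still works when no closed form exists, where a grid search or Newton-Raphson takes over.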
Numerical Examples
See some data sets and test examples:
• Basics: http://www.unc.edu/~monogan/computing/r/MLE_in_R.pdf and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.74.671&rep=rep1&type=pdf
• Context: http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html (all sections useful, but especially the examples, sections 1-3 and 6)
• Also, e.g. for R: http://www.montana.edu/rotella/502/binom_like.pdf
• For SPSS: see e.g. the tutorial for data sets, or http://www.spss.ch/upload/1126184451_Linear%20Mixed%20Effects%20Modeling%20in%20SPSS.pdf (general, for mixed Linear Models)
• For SAS, of possible interest also for Newton-Raphson: http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml/
Bayesian Estimation: in context
• Parametric Estimation: in the "classical approach", $f(x, \theta)$ for a r.v. X of density $f(x)$, with $\theta$ the unknown parameter; the distribution depends on the parameter to be estimated.
• Bayesian Estimation: $\theta$ is a random variable, so we can consider the density as conditional and write $f(x \mid \theta)$.
• Given a random sample $X_1, X_2, \ldots, X_n$, the sample random variables are jointly distributed with the parameter r.v. $\theta$. So, joint pdf $f(x_1, x_2, \ldots, x_n, \theta)$.
• Objective: to form an estimator that gives a value of $\theta$ dependent on observations of the sample random variables. Thus the conditional density of $\theta$ given $X_1, X_2, \ldots, X_n$ also plays a role. This is the posterior density.
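Bayes: contd.
• Posterior Density: $f(\theta \mid x_1, \ldots, x_n) = \dfrac{f(x_1, \ldots, x_n, \theta)}{f(x_1, \ldots, x_n)}$
• Relationship between prior and posterior: $f(\theta \mid x_1, \ldots, x_n) = \dfrac{\pi(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)}{\int \pi(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) \, d\theta}$
• where $\pi(\theta)$ is the prior density of $\theta$.
• Value: close to the MLE for large n, or for small n if the sample values are compatible with the prior distribution. Also, it has a strong sample basis (simpler to calculate than the M.L.E.).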
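The prior-posterior relationship can be evaluated numerically. This sketch assumes a binomial likelihood and a Beta(a, b) prior, and takes the posterior mean as the Bayes estimate (the usual choice under squared-error loss). It approximates the normalising integral by the midpoint rule and compares the result with the known conjugate answer $(a + x)/(a + b + n)$ and with the MLE $x/n$. The prior parameters, data and function name are hypothetical.

```python
import math


def grid_posterior_mean(x, n, a, b, grid=10_000):
    """Posterior mean of p from prior x likelihood, normalised numerically.

    Prior: Beta(a, b) density up to a constant, p^(a-1) (1-p)^(b-1).
    Likelihood: binomial, p^x (1-p)^(n-x) (the C(n, x) constant cancels).
    The integral in the denominator is approximated by the midpoint rule.
    """
    ps = [(i + 0.5) / grid for i in range(grid)]
    log_w = [(a - 1 + x) * math.log(p) + (b - 1 + n - x) * math.log1p(-p) for p in ps]
    top = max(log_w)                       # subtract the max to avoid underflow
    w = [math.exp(lw - top) for lw in log_w]
    return sum(p * wi for p, wi in zip(ps, w)) / sum(w)


a, b = 2, 2                                # hypothetical prior
for x, n in [(30, 100), (3000, 10000)]:    # hypothetical data, same proportion
    exact = (a + x) / (a + b + n)          # conjugate Beta posterior mean
    print(n, round(grid_posterior_mean(x, n, a, b), 5), round(exact, 5), x / n)
```

With n = 100 the posterior mean (about 0.308) is pulled slightly towards the prior mean of 0.5; with n = 10000 it is essentially the MLE, illustrating "close to MLE for large n".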
Estimator Comparison in brief
• Classical: uses objective probabilities, intuitive estimators, additional assumptions for sampling distributions; good properties for some estimators.
• Moments: {less calculation, less efficient. Despite analytical solutions and low bias, not well used for large-scale data because of less good asymptotic properties; even simple solutions may not be unique.}
• Bayesian: subjective prior knowledge plus sample info.; close to the MLE under certain conditions (see earlier).
• LSE: if assumptions are met, the $\hat{\beta}$'s are unbiased and variances are obtained from $\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$. Few assumptions for response variable distributions, just expectations and variance-covariance structure (unlike MLE, where the joint prob. distribution of the variables must be specified). Requires additional assumptions for sampling distns. Close to MLE if these are met. Computation easier.
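To make the LSE point concrete, here is a minimal sketch for the simplest case, assuming one predictor (design matrix columns [1, x]). It computes $\hat{\beta} = (X^T X)^{-1} X^T y$ and the estimated covariance $s^2 (X^T X)^{-1}$, with $\sigma^2$ estimated by RSS/(n − 2). Note that only means and a constant variance are used, not a full joint distribution. The data and function name are hypothetical.

```python
def ols_simple(xs, ys):
    """Least-squares fit of y = b0 + b1 x, with estimated Var(beta_hat) = s^2 (X^T X)^{-1}."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx                           # determinant of X^T X
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X^T X)^{-1}
    # beta_hat = (X^T X)^{-1} X^T y, where X^T y = [sy, sxy]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)                                # unbiased estimate of sigma^2 (n > 2)
    cov = [[s2 * v for v in row] for row in inv]
    return (b0, b1), cov, s2


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]                       # hypothetical responses
(b0, b1), cov, s2 = ols_simple(xs, ys)
print(b0, b1, cov[1][1])
```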