Tests Jean-Yves Le Boudec
Contents • The Neyman-Pearson framework • Likelihood Ratio Tests • ANOVA • Asymptotic Results • Other Tests
Tests • Tests are used to give a binary answer to hypotheses of a statistical nature • Ex: is A better than B? • Ex: does this data come from a normal distribution? • Ex: does factor n influence the result?
Example: Non Paired Data • Is red better than blue? • For data set (a) the answer is clear (by inspection of the confidence interval); no test is required
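5.1 The Neyman-Pearson Framework • Given: a data set and a model with parameter $\theta$ (that, we believe, explains the data) • Two hypotheses on $\theta$: H0: $\theta \in \Theta_0$ (null hypothesis), H1: $\theta \in \Theta \setminus \Theta_0$ (alternative hypothesis) • Nested model: $\Theta_0$ is a set of smaller dimension than $\Theta$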
Example: Non Paired Data; Is Red better than Blue? • Model: […] and […] are two independent iid samples, and […]
Example: Non Paired Data; Is Red better than Blue? ANOVA Model • Model: […] and […] are two independent iid samples, and […]
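Critical Region, Size and Power • Critical region: a set $C$ of possible data values such that, if the data is in $C$, then we reject H0 • Type 1 error: reject H0 when H0 is true. Size of a test = maximum probability of a type 1 error: size $= \sup_{\theta \in \Theta_0} P_\theta(C)$; it should be small • Type 2 error: accept H0 when H1 is true. Power function: $\beta(\theta) = P_\theta(C)$ for $\theta \in \Theta \setminus \Theta_0$; it should be large • Neyman-Pearson framework: design a test that maximizes power subject to size $\le \alpha$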
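A minimal sketch of size and power, assuming a one-sided test on the mean of a normal sample with known standard deviation and critical region {sample mean > c}. The known-sigma setting and all numerical values are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(alpha: float, sigma: float, n: int) -> float:
    """Threshold c such that P(sample mean > c) = alpha when the true mean is 0."""
    return NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)


def power(mu: float, c: float, sigma: float, n: int) -> float:
    """P(sample mean > c) when the true mean is mu, i.e. the power function at mu."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(c)


# Hypothetical setting: known sigma, 25 observations, target size 0.05.
sigma, n, alpha = 1.0, 25, 0.05
c = critical_value(alpha, sigma, n)
print(f"critical value c = {c:.3f}")
print(f"size (power at mu = 0) = {power(0.0, c, sigma, n):.3f}")
for mu in (0.1, 0.3, 0.5, 1.0):
    print(f"power at mu = {mu}: {power(mu, c, sigma, n):.3f}")
```

Here H0 is the single point mu = 0, so the sup in the definition of the size is reached at that point; the power grows with mu and is close to the size when mu is close to 0.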
Example: Paired Data; Is A better than B? • Data: reduction in execution time • Model: […] • First attempt: let us take […] as rejection region; size = ? • Problem: the sup is 0 because of the term […] • Second attempt: […], where […] = estimator of the variance • Now we can compute […] such that the size is 0.05 (confidence level 95%): […] • We find […]; we reject H0
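The second attempt can be sketched as follows, assuming the usual normal model for the paired differences and the Student statistic T = sqrt(n) × mean / s, with s the sample standard deviation. The data values are hypothetical, and the threshold is the tabulated 0.95 quantile of Student's t distribution with 9 degrees of freedom (the standard library has no t quantile function).

```python
from math import sqrt
from statistics import mean, stdev


def student_statistic(x):
    """T = sqrt(n) * sample mean / sample standard deviation."""
    return sqrt(len(x)) * mean(x) / stdev(x)


# Hypothetical reductions in execution time, one value per paired run.
reductions = [1.2, 0.4, 2.1, -0.3, 1.7, 0.9, 1.5, 0.2, 1.1, 0.8]

# Tabulated 0.95 quantile of Student's t with n - 1 = 9 degrees of freedom.
T_CRIT = 1.833

t = student_statistic(reductions)
print(f"T = {t:.2f}, reject H0: {t > T_CRIT}")
```

Dividing by the estimated standard deviation makes the distribution of T under H0 independent of the unknown variance, which is what allows a threshold with a prescribed size.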
Power
Power: Grey Zone • […] is approximated by […] on the plot • Power […] (bad but unavoidable) • Grey zone: […] for […] power […] • If the true […] is in the grey zone, the test will often declare […] • For the data at hand: power = 0.9997, probability of type 2 error = 0.0003
p-value of a test • For the previous example, with […] • The test consists in computing […] and seeing whether […] • Consider […], where […] is a hypothetical replay. It is independent of […] and we can plot it: saying […] is the same as saying […] • p-value of the test = […] • We reject H0 if the p-value is small
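The "hypothetical replay" idea can be sketched by simulation, assuming the Student statistic and a normal model under H0 (mean 0). Since the statistic does not depend on the scale, replays are drawn with standard deviation 1. The data values, the number of replays and the seed are hypothetical.

```python
import random
from math import sqrt


def student_statistic(x):
    """T = sqrt(n) * sample mean / sample standard deviation."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)
    return sqrt(n) * m / sqrt(s2)


def replay_p_value(x, replays=10_000, seed=0):
    """Fraction of replays under H0 whose statistic exceeds the observed one."""
    rng = random.Random(seed)
    t_obs = student_statistic(x)
    n = len(x)
    exceed = 0
    for _ in range(replays):
        replay = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if student_statistic(replay) > t_obs:
            exceed += 1
    return exceed / replays


# Hypothetical reductions in execution time, one value per paired run.
reductions = [1.2, 0.4, 2.1, -0.3, 1.7, 0.9, 1.5, 0.2, 1.1, 0.8]
p_value = replay_p_value(reductions)
print(f"estimated p-value = {p_value:.4f}")
```

A small p-value means that a replay under H0 rarely produces a statistic as large as the observed one, which is the reason to reject H0.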
Power: Grey Zone • Assume we want to match statistical significance and practical relevance • […] is the size of reduction we consider practically significant • With […] we have […]: the type 2 and type 1 errors are matched • Ideally, the size of the test should be matched to the desired resolution • In practice, it is not done
2. Likelihood Ratio Test • A special case of Neyman-Pearson • A systematic method to define tests, of general applicability
Example: Paired Data; Is A better than B? • Data: reduction in execution time • Model: […] • Let us compute the likelihood ratio test.
A Classical Test: Student Test • The model: […] • The hypotheses: […]
Example: Paired Data; Is A better than B? • Data: reduction in execution time • Model: […] • The likelihood ratio test is the Student test • Compare to the one-sided test: […] matters!
Test versus Confidence Intervals • If you can compute a confidence interval, use it instead of a test
The “Simple Goodness of Fit” Test • Model: […] • Hypotheses: […]
Mendel’s Peas • P = 0.92 ± 0.05 => accept H0
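A sketch of a simple goodness of fit computation for this example, assuming Pearson's chi-square statistic and its asymptotic chi-square distribution; this may differ from the method used on the slide to obtain 0.92 ± 0.05. The counts are the figures commonly quoted for Mendel's dihybrid cross (expected ratio 9:3:3:1); they are not taken from the slides. The function `chi2_sf` is a helper written for this sketch.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, k):
    """P(chi-square with k degrees of freedom > x), for integer k >= 1 and x >= 0."""
    # Closed forms for k = 1 and k = 2, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2.
    y = x / 2
    a = 0.5 if k % 2 else 1.0
    sf = erfc(sqrt(y)) if k % 2 else exp(-y)
    while 2 * a < k:
        sf += y ** a * exp(-y) / gamma(a + 1)
        a += 1
    return sf


def pearson_statistic(observed, probabilities):
    """Sum over categories of (observed - expected)^2 / expected."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, probabilities))


# Commonly quoted counts for Mendel's peas (assumption, not from the slides).
observed = [315, 101, 108, 32]
h0_probabilities = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

stat = pearson_statistic(observed, h0_probabilities)
p_value = chi2_sf(stat, len(observed) - 1)
print(f"chi-square statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

With these counts the p-value is large, which agrees with the slide's conclusion to accept H0.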
3 ANOVA • Often used as a “magic tool” • Important to understand the underlying assumptions • Model: the data comes from iid normal samples with unknown means and the same variance • Hypotheses: […]
The ANOVA Theorem • We build a likelihood ratio test • The assumption that the data is normal and the variance is the same allows an explicit computation • It becomes a least squares problem = a geometrical problem • We need to compute orthogonal projections on M and M0
Geometrical Interpretation • Accept H0 if SS2 is small • The theorem tells us what “small” means in a statistical sense
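A sketch of the sums of squares for the one-way case, assuming SS1 denotes the within-group (residual) sum of squares and SS2 the between-group sum of squares, that is, the squared distance between the projections on M and M0. The data values are hypothetical.

```python
def anova_sums_of_squares(groups):
    """Return (ss1, ss2, f) for a one-way layout given as a list of lists."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # SS2: squared distance between the projection on M (group means)
    # and the projection on M0 (grand mean).
    ss2 = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # SS1: squared distance between the data and its projection on M.
    ss1 = sum((v - m) ** 2
              for g, m in zip(groups, group_means) for v in g)
    f = (ss2 / (k - 1)) / (ss1 / (n_total - k))
    return ss1, ss2, f


# Hypothetical data: three groups of three measurements.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss1, ss2, f = anova_sums_of_squares(groups)
print(f"SS1 = {ss1:.2f}, SS2 = {ss2:.2f}, F = {f:.2f}")
```

To complete the test, f would be compared with a quantile of the Fisher distribution with (k − 1, N − k) degrees of freedom; that quantile is not computed here.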
Compare Test to Confidence Intervals • For non paired data, we cannot simply compute the difference • However CI is sufficient for parameter set 1 • Tests disambiguate parameter sets 2 and 3
Test the assumptions of the test… • Need to test the assumptions • Normality: in each group, use a qq-plot… • Same variance
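A sketch of the points of a normal qq-plot for one group, assuming the plotting positions (i + 0.5) / n, which is one of several common conventions. The sample values are hypothetical; if the group is roughly normal, the points lie close to a straight line.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pairs (standard normal quantile, ordered sample value)."""
    n = len(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(sample)))


# Hypothetical measurements for one group.
group = [4.1, 3.7, 5.2, 4.6, 4.0, 4.9, 3.9]
for q, v in normal_qq_points(group):
    print(f"{q:+.3f}  {v:.1f}")
```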
4 Asymptotic Results • 2 × likelihood ratio statistic
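Asymptotic Result • Applicable when the central limit theorem holds • If applicable, radically simple • Compute the likelihood ratio statistic • Inspect the hypotheses and find the order p (number of dimensions that H1 adds to H0) • Computing the statistic is equivalent to 2 optimization subproblems: lrs = max log-likelihood under H1 − max log-likelihood under H0 • The p-value is approximately $1 - \chi^2_p(2\,\mathrm{lrs})$, where $\chi^2_p$ is the CDF of the chi-square distribution with p degrees of freedom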
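A sketch of the recipe on a hypothetical example that is not in the slides: n coin flips, H0: probability of heads = 0.5, H1: probability free, so H1 adds p = 1 dimension. For one degree of freedom, 1 − χ²₁(2 lrs) equals erfc(sqrt(lrs)), which avoids the need for a chi-square library.

```python
from math import erfc, log, sqrt


def binomial_lrs(successes, n, p0):
    """Max log-likelihood under H1 (p free) minus log-likelihood under H0 (p = p0)."""
    def loglik(p):
        # The binomial coefficient is omitted: it cancels in the difference.
        out = 0.0
        if successes:
            out += successes * log(p)
        if n - successes:
            out += (n - successes) * log(1 - p)
        return out

    p_hat = successes / n  # maximum likelihood estimate under H1
    return max(loglik(p_hat) - loglik(p0), 0.0)


def asymptotic_p_value_order_1(lrs):
    """1 - chi2_1(2 * lrs), written with the complementary error function."""
    return erfc(sqrt(lrs))


# Hypothetical data: 60 heads out of 100 flips.
lrs = binomial_lrs(60, 100, 0.5)
p_value = asymptotic_p_value_order_1(lrs)
print(f"lrs = {lrs:.4f}, asymptotic p-value = {p_value:.4f}")
```

The two optimization subproblems are trivial here: under H0 there is nothing to optimize, and under H1 the maximum is reached at the observed frequency.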
Composite Goodness of Fit Test • We want to test the hypothesis that an iid sample has a distribution that comes from a given parametric family