Mann-Whitney Test
• If X1, X2, …, Xm is a sample of size m from one population and Y1, Y2, …, Yn is a sample of size n from another population, then the Mann-Whitney statistic, U, is the number of (Xi, Yj) pairs for which Xi < Yj.
• U is used to test the same null hypothesis as before, equal population distributions; p-values are given in Table A4.
• Go over Example 2.6.1 - use R.
• First note that W (the sum of ranks, the Wilcoxon statistic) and U (the Mann-Whitney statistic) are equivalent:
  • Let W2 = sum of the ranks of the Y's in the combined sample, and note that for any Yj, R(Yj) = (# of X's <= Yj) + (# of Y's <= Yj).
  • So W2 = sum over all j of R(Yj) = sum over all j of [(# of X's <= Yj) + (# of Y's <= Yj)] = U + (1 + 2 + 3 + … + n), since we may assume the Y's are ordered and there are no ties, = U + n(n+1)/2.
• NOTE: In R, the W reported by wilcox.test is not the raw rank sum; it equals U (or m*n - U, depending on which sample is listed first).
• If there are ties, each tied pair counts one half: U = (# of pairs with Xi < Yj) + 1/2*(# of pairs with Xi = Yj).
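The identity W2 = U + n(n+1)/2 and the note about wilcox.test can be checked numerically. The following is a minimal sketch in R; the data vectors x and y are made up for illustration (they are not from Example 2.6.1) and contain no ties.

```r
# Hypothetical data, chosen only for illustration (no ties)
x <- c(1.1, 2.3, 3.7, 5.2)        # sample of size m
y <- c(2.9, 4.4, 6.0, 6.8, 7.5)   # sample of size n
m <- length(x)
n <- length(y)

# Mann-Whitney U: number of (Xi, Yj) pairs with Xi < Yj
U <- sum(outer(x, y, "<"))

# W2: sum of the ranks of the Y's in the combined sample
W2 <- sum(rank(c(x, y))[(m + 1):(m + n)])

# Statistic reported by wilcox.test, with each sample listed first in turn
W_xy <- unname(wilcox.test(x, y)$statistic)
W_yx <- unname(wilcox.test(y, x)$statistic)

cat("U =", U, " W2 =", W2, " U + n(n+1)/2 =", U + n * (n + 1) / 2, "\n")
cat("wilcox.test(x, y): W =", W_xy, " (m*n - U =", m * n - U, ")\n")
cat("wilcox.test(y, x): W =", W_yx, "\n")
```

For these data U = 17 and W2 = 32 = 17 + 15. wilcox.test(x, y) reports W = 3 = m*n - U, and swapping the argument order gives W = 17 = U.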
Review the relationship between a confidence interval and hypothesis testing in the parametric case…
• If the null hypothesis F1 = F2 is rejected, then one possible alternative is the so-called shift alternative: F1(x) = F2(x - D), where D can be thought of as the difference in means (or medians)… do a sketch.
• A consequence of the shift alternative being true is that X and Y + D have the same distribution. So think of D as the difference of the medians. We'll use the Hodges-Lehmann estimate of D: H-L = median of all pairwise differences (pwds) Xi - Yj.
• The 95% confidence interval for D is found as follows:
  • get all m*n pwds Xi - Yj
  • arrange them from smallest to largest
  • find 2 numbers ka and kb s.t. P(ka <= U < kb) = P(pwds(ka) < D <= pwds(kb)) is as close to the level of confidence as possible, where pwds(k) is the k-th smallest pwd
  • for large samples, we'll use the normal approximation (more on this later)
• Now go over Example 2.6.2 on page 46ff - use the R code in R4.doc…
• Redo problem #4 on page 73 - get a 95% CI for D and its H-L estimate. Use R.
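The steps above can be sketched in R as follows. This is an illustration, not the code from R4.doc: the data are made up, and picking the cutoff k with qwilcox (so that the interval runs from the k-th smallest to the k-th largest pwd) is one common convention that is assumed here. The call to wilcox.test with conf.int = TRUE is included only as a cross-check.

```r
# Hypothetical data, chosen only for illustration (no ties)
x <- c(1.1, 2.3, 3.7, 5.2)
y <- c(2.9, 4.4, 6.0, 6.8, 7.5)
m <- length(x)
n <- length(y)

# Steps 1 and 2: all m*n pairwise differences Xi - Yj, sorted
pwds <- sort(as.vector(outer(x, y, "-")))

# Hodges-Lehmann estimate of the shift D
HL <- median(pwds)

# Step 3: cutoff k from the exact null distribution of U.
# qwilcox(alpha/2, m, n) is the smallest k with P(U <= k) >= alpha/2.
alpha <- 0.05
k <- max(qwilcox(alpha / 2, m, n), 1)
ci <- c(pwds[k], pwds[m * n - k + 1])

# Confidence level actually achieved (exact, so usually not exactly 95%)
achieved <- 1 - 2 * pwilcox(k - 1, m, n)

cat("H-L estimate:", HL, "\n")
cat("CI for D:", ci, " achieved level:", round(achieved, 4), "\n")

# Cross-check against R's built-in interval
print(wilcox.test(x, y, conf.int = TRUE, conf.level = 0.95))
```

With these data the H-L estimate is -2.3 and the interval is (-5.7, 0.8). Because U is discrete, the achieved level is about 96.8% rather than exactly 95%, which is what "as close to the level of confidence as possible" means in practice.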