E N D
Assume as previously that we have k samples on as many treatments. We’ll let Rij denote the rank of the sample value Xij and as before N = the total number of sample values (see the Table 3.2.1 on page 86 for the complete notations…). Then the Kruskal – Wallis statistic is the rank-based equivalent to the F-statistic given by the formula • Note that the sum in the KW formula is actually the treatment sum of squares (remember, (N+1)/2 is the mean of all the N ranks). The constant coefficient of the sum is a “scaling factor” which makes the KW statistic have approximately a chi-square distribution with k-1 degrees of freedom. Thus p-values may be obtained from the chi-square tables, for large enough sample size; or we may use Table A6 in the Appendix (for small sample sizes and for k=3 or 4; or we may use a permutation test based on this KW statistic…
The permutation test based on the KW statistic is done in a similar manner to the others we’ve done except we’ll have to compute KW after each “shuffle” (sample)… try it now on Example 3.2.1 on page 87… see R#7 handout. • For SAS, use PROC NPAR1WAY WILCOXON; but be careful about using the EXACT WILCOXON; statement in the k-sample case – it can take several minutes to actually compute the exact probabilities… Try this on the data from Table 3.2.2 on page 87 • In the case of ties in the data, use mid-ranks to compute the ranks and make one of two adjustments (see p. 88 and 89):
It is also possible to create a “KW-like” statistic for general scores (not just ranks or mid-ranks) such as van der Waerden scores. See the statistic GS on page 91 and go over it carefully… • HW: Finish reading this section 3.2; make sure you can calculate the KW statistic in R and SAS and understand the output. • Midterm HW: Apply the Kruskal-Wallis permutation test to the data in problem #2 on page 105.
/*use the following to calculate the tied KW statistics*/ /*note that proc npar1way does the tied KW on p.88*/ dm log 'clear'; dm output 'clear'; options ls=80; data table3_2_3; input food_group $ salt_score @@; datalines; pr1 4 pr1 5 pr1 3 pr1 4 pr1 5 pr1 5 pr1 2 pr2 3 pr2 4 pr2 5 pr2 2 pr2 3 pr2 1 pr2 1 pr2 2 pr3 2 pr3 1 pr3 1 pr3 2 pr3 1 pr3 3 ; proc print; run; proc sort; by food_group; run; proc rank; run; *to get the mid-ranks; proc means; by food_group; run; *to get the means of the ranks for each group; proc means; run; *to get the mean and s.d. for all the ranks combined; proc npar1way wilcoxon data=table3_2_3; class food_group; var salt_score; run; quit;