Nonparametric Test Procedures: Sign Test and Beyond

Lecture Slides: Elementary Statistics, Eleventh Edition, and the Triola Statistics Series by Mario F. Triola

Chapter 13 Nonparametric Statistics
13-1 Review and Preview
13-2 Sign Test
13-3 Wilcoxon Signed-Ranks Test for Matched Pairs
13-4 Wilcoxon Rank-Sum Test for Two Independent Samples
13-5 Kruskal-Wallis Test
13-6 Rank Correlation
13-7 Runs Test for Randomness

Section 13-1 Review and Preview

Review In the preceding chapters, we presented a variety of methods of inferential statistics. Many of those methods require normally distributed populations and are based on sampling from a population with specific parameters, such as the mean μ, standard deviation σ, or population proportion p.

Preview
Definitions
Parametric tests have requirements about the nature or shape of the populations involved.
Nonparametric tests do not require that samples come from populations with normal distributions or any other particular distributions. Consequently, nonparametric tests are called distribution-free tests.

Advantages of Nonparametric Methods
1. Nonparametric methods can be applied to a wide variety of situations because they do not have the more rigid requirements of the corresponding parametric methods. In particular, nonparametric methods do not require normally distributed populations.
2. Unlike parametric methods, nonparametric methods can often be applied to categorical data, such as the genders of survey respondents.

Disadvantages of Nonparametric Methods
1. Nonparametric methods tend to waste information because exact numerical data are often reduced to a qualitative form.
2. Nonparametric tests are not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) in order to reject a null hypothesis.

Efficiency of Nonparametric Methods

Definitions
Data are sorted when they are arranged according to some criterion, such as smallest to largest or best to worst.
A rank is a number assigned to an individual sample item according to its order in the sorted list. The first item is assigned a rank of 1, the second is assigned a rank of 2, and so on.

Handling Ties in Ranks
Find the mean of the ranks involved and assign this mean rank to each of the tied items.
Sorted Data   Preliminary Ranking   Rank
 4            1                     1
 5            2                     3    (mean of 2, 3, 4 is 3)
 5            3                     3
 5            4                     3
10            5                     5
11            6                     6
12            7                     7.5  (mean of 7, 8 is 7.5)
12            8                     7.5
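The tie-handling rule above can be sketched in a few lines of Python. This is only an illustration, not code from the slides; the function name `mean_ranks` is a hypothetical helper, and the input is the sorted data shown on the slide.

```python
def mean_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their preliminary ranks."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Preliminary ranks run from start+1 to end+1; the tie group shares their mean.
        shared = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


data = [4, 5, 5, 5, 10, 11, 12, 12]  # sorted data from the slide
print(mean_ranks(data))  # [1.0, 3.0, 3.0, 3.0, 5.0, 6.0, 7.5, 7.5]
```

The three 5s share rank 3 and the two 12s share rank 7.5, matching the Rank column above.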

Section 13-2 Sign Test

Key Concept The main objective of this section is to understand the sign test procedure, which involves converting data values to plus and minus signs, then testing for disproportionately more of either sign.

Definition The sign test is a nonparametric (distribution-free) test that uses plus and minus signs to test different claims, including:
1. Claims involving matched pairs of sample data
2. Claims involving nominal data
3. Claims about the median of a single population

Basic Concept of the Sign Test The basic idea underlying the sign test is to analyze the frequencies of the plus and minus signs to determine whether they are significantly different.

Figure 13-1 Sign Test Procedure

Requirements The sample data have been randomly selected. Note: There is no requirement that the sample data come from a population with a particular distribution, such as a normal distribution.

Notation for Sign Test
x = the number of times the less frequent sign occurs
n = the total number of positive and negative signs combined

Test Statistic
For n ≤ 25: x (the number of times the less frequent sign occurs)
For n > 25: $z = \dfrac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}}$
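As a rough sketch of how the two cases of the test statistic could be computed, the Python function below takes the counts of plus and minus signs (ties already excluded). The name `sign_test_statistic` and its return format are assumptions made for illustration.

```python
import math


def sign_test_statistic(n_plus, n_minus):
    """Return ("x", x) when n <= 25, otherwise ("z", z) from the normal approximation."""
    x = min(n_plus, n_minus)   # count of the less frequent sign
    n = n_plus + n_minus       # total number of signs
    if n <= 25:
        return ("x", x)
    # Continuity-corrected z: ((x + 0.5) - n/2) / (sqrt(n)/2)
    z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
    return ("z", z)


print(sign_test_statistic(7, 2))     # small sample: statistic is x
print(sign_test_statistic(668, 58))  # large sample: statistic is z
```

The two calls use the sign counts from the matched-pairs and gender-selection examples in this section; the second yields z of about -22.60.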

Critical Values
For n ≤ 25, critical x values are in Table A-7.
For n > 25, critical z values are in Table A-2.

Caution When applying the sign test in a one-tailed test, we need to be very careful to avoid making the wrong conclusion when one sign occurs significantly more often than the other, but the sample data contradict the alternative hypothesis. See the following example.

Claims Involving Matched Pairs When using the sign test with data that are matched pairs, we convert the raw data to plus and minus signs as follows:
1. Subtract each value of the second variable from the corresponding value of the first variable.
2. Record only the sign of the difference found in step 1. Exclude ties: that is, any matched pairs in which both values are equal.
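The conversion from matched pairs to signs might look like the following Python sketch. The function name `signs_from_pairs` and the sample values are hypothetical; they are not the Table 13-3 data.

```python
def signs_from_pairs(first, second):
    """Return the sign of (first - second) for each pair, excluding ties."""
    signs = []
    for a, b in zip(first, second):
        d = a - b
        if d > 0:
            signs.append("+")
        elif d < 0:
            signs.append("-")
        # d == 0 is a tie and is excluded from the analysis.
    return signs


# Hypothetical matched pairs, for illustration only.
first = [67, 53, 64, 74, 67]
second = [66, 52, 68, 77, 67]
signs = signs_from_pairs(first, second)
print(signs)                                # ['+', '+', '-', '-']
print(signs.count("+"), signs.count("-"))   # 2 2
```

The last pair is a tie, so only four signs remain and n = 4 for this illustrative sample.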

Key Concept Underlying This Use of the Sign Test If the two sets of data have equal medians, the number of positive signs should be approximately equal to the number of negative signs.

Example: Table 13-3 includes some of the weights listed in Data Set 3 in Appendix B. Those weights were measured from college students in September and April of their freshman year. Use the sample data in Table 13-3 with a 0.05 significance level to test the claim that there is no difference between the September weights and the April weights. Use the sign test.

Example:
H0: The median of the differences is equal to 0.
H1: The median of the differences is not equal to 0.
α = 0.05 (in two tails)
x = minimum(7, 2) = 2 (From Table 13-3, there are 7 negative signs and 2 positive signs.)
Critical value = 1 (From Table A-7, where n = 9 and α = 0.05)

Example:
H0: The median of the differences is equal to 0.
H1: The median of the differences is not equal to 0.
With a test statistic of x = 2 and a critical value of 1, the test statistic is greater than the critical value, so we fail to reject the null hypothesis of no difference. There is not sufficient evidence to warrant rejection of the claim that the median of the differences is equal to 0.
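As a cross-check on the table lookup, one can compute an exact binomial P-value for the sign counts. This is a swapped-in approach, not the Table A-7 method used in the slides: under H0 each sign is equally likely, so the number of less frequent signs is treated as Binomial(n, 0.5). The function name is hypothetical.

```python
from math import comb


def sign_test_two_tailed_p(x, n):
    """Two-tailed P-value: 2 * P(X <= x) for X ~ Binomial(n, 0.5), capped at 1."""
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# x = 2 and n = 9 come from the weight example above.
p_value = sign_test_two_tailed_p(2, 9)
print(round(p_value, 4))  # 0.1797
```

A P-value of about 0.18 exceeds 0.05, which agrees with the decision not to reject H0. With x = 1 the same function gives about 0.039, consistent with a critical value of 1 at the 0.05 level.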

Example: We conclude that the September and April weights appear to be about the same. (If we use the parametric t test for matched pairs (Section 9-4), we conclude that the mean difference is not zero, so the September weights and April weights appear to be different.) The conclusion should be qualified with the limitations noted in the article about the study. Only Rutgers students were used, and study subjects were volunteers instead of being a simple random sample.

Claims Involving Nominal Data The nature of nominal data limits the calculations that are possible, but we can identify the proportion of the sample data that belong to a particular category. Then we can test claims about the corresponding population proportion p.

Example: The Genetics and IVF Institute conducted a clinical trial of its methods for gender selection. As of this writing, 668 of 726 babies born to parents using the XSORT method of gender selection were girls. Use the sign test and a 0.05 significance level to test the claim that this method of gender selection is effective in increasing the likelihood of a baby girl. The procedures are for cases in which n > 25. Note that the only requirement is that the sample data are randomly selected.
H0: p = 0.5 (the proportion of girls is 0.5)
H1: p > 0.5 (girls are more likely)

Example: Denoting girls by the positive sign (+) and boys by the negative sign (-), we have 668 positive signs and 58 negative signs. Test statistic: x = minimum(668, 58) = 58. We test whether 58 boys is low enough to be significant, so this is a left-tailed test.

Example: Since n = 726 (> 25), the test statistic x = 58 is converted to the test statistic z as follows:
$z = \dfrac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} = \dfrac{(58 + 0.5) - \dfrac{726}{2}}{\dfrac{\sqrt{726}}{2}} = -22.60$

Example: With α = 0.05 in a left-tailed test, the critical value is z = -1.645. The test statistic z = -22.60 is in the critical region bounded by z = -1.645.
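The arithmetic for this example can be reproduced with a short Python sketch. Here the critical value is computed from the standard normal distribution with the standard library's `statistics.NormalDist` rather than read from Table A-2; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x = 726, 58  # total signs and the less frequent sign (boys), from the example

# Continuity-corrected z statistic for the sign test with n > 25.
z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)

# Left-tailed critical value at alpha = 0.05 (Table A-2 gives -1.645).
critical = NormalDist().inv_cdf(0.05)

reject = z <= critical
print(f"z = {z:.2f}, critical = {critical:.3f}, reject H0: {reject}")
```

The statistic lies far to the left of the critical value, so the sketch reaches the same rejection decision as the slide.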

Example: We reject the null hypothesis that the proportion of girls is equal to 0.5. There is sufficient evidence to support the claim that girls are more likely with the XSORT method. The XSORT method of gender selection does appear to be effective in increasing the likelihood of a girl.

Claims About the Median of a Single Population The negative and positive signs are based on the claimed value of the median.

Example: Body Temperature Data Set 2 in Appendix B includes measured body temperatures of adults. Use the 106 temperatures listed for 12 A.M. on Day 2 with the sign test to test the claim that the median is less than 98.6°F. Of the 106 subjects, 68 had temperatures below 98.6°F, 23 had temperatures above 98.6°F, and 15 had temperatures equal to 98.6°F.
H0: Median is equal to 98.6°F.
H1: Median is less than 98.6°F.
Since the claim is that the median is less than 98.6°F, the test involves only the left tail.

Example: Body Temperature Discard the 15 zeros (the temperatures equal to 98.6°F). Use ( - ) to denote the 68 temperatures below 98.6°F, and use ( + ) to denote the 23 temperatures above 98.6°F. So n = 91 and x = 23.

Example: Body Temperature
$z = \dfrac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} = \dfrac{(23 + 0.5) - \dfrac{91}{2}}{\dfrac{\sqrt{91}}{2}} = -4.61$

Example: Body Temperature We use Table A-2 to get the critical z value of –1.645. The test statistic of z = –4.61 falls into the critical region. We reject the null hypothesis. We support the claim that the median body temperature of healthy adults is less than 98.6°F.
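The body temperature test can also be sketched end to end in Python. Two things here are assumptions beyond the slides: the decision uses a left-tail P-value from the standard normal distribution instead of the Table A-2 critical value, and an explicit direction check is included to reflect the earlier caution about one-tailed tests. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n_minus, n_plus, n_ties = 68, 23, 15  # counts stated in the example
n = n_minus + n_plus                  # ties are discarded, so n = 91
x = min(n_minus, n_plus)              # less frequent sign count, x = 23

# H1 says the median is below 98.6, so plus signs must be the rarer sign.
# If they were not, the data would contradict H1 and we should not reject H0.
direction_ok = n_plus < n_minus

z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
p_value = NormalDist().cdf(z)         # left-tail area below z

reject = direction_ok and p_value < 0.05
print(f"z = {z:.2f}, reject H0: {reject}")
```

The P-value is far below 0.05, so this route agrees with the critical-value decision to reject H0.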


Recap
In this section we have discussed:
• Sign tests, where data are assigned plus or minus signs and then tested to see if the number of plus and minus signs is equal.
• Sign tests can be performed on claims involving:
  • Matched pairs
  • Nominal data
  • The median of a single population

Section 13-3 Wilcoxon Signed-Ranks Test for Matched Pairs

Key Concept The Wilcoxon signed-ranks test involves the conversion of the sample data to ranks. This test can be used for two different applications.

Definition The Wilcoxon signed-ranks test is a nonparametric test that uses ranks for these applications:
1. Test a null hypothesis that the population of matched pairs has differences with a median equal to zero.
2. Test a null hypothesis that a single population has a claimed value of the median.

Objective Use the Wilcoxon signed-ranks test with matched pairs for the following null and alternative hypotheses:
H0: The matched pairs have differences that come from a population with a median equal to zero.
H1: The matched pairs have differences that come from a population with a nonzero median.

Wilcoxon Signed-Ranks Test Requirements
1. The data consist of matched pairs that have been randomly selected.
2. The population of differences (found from the pairs of data) has a distribution that is approximately symmetric, meaning that the left half of its histogram is roughly a mirror image of its right half. (There is no requirement that the data have a normal distribution.)

Notation
T = the smaller of the following two sums:
1. The sum of the positive ranks of the nonzero differences d
2. The absolute value of the sum of the negative ranks of the nonzero differences d

Test Statistic for the Wilcoxon Signed-Ranks Test for Matched Pairs
For n ≤ 30, the test statistic is T.
For n > 30, the test statistic is $z = \dfrac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$
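A small Python sketch of the large-sample formula follows. The function name `wilcoxon_z` and the sample inputs (T = 200, n = 40) are hypothetical values chosen only to exercise the formula.

```python
import math


def wilcoxon_z(T, n):
    """Normal approximation for the Wilcoxon signed-ranks statistic (intended for n > 30)."""
    mean_T = n * (n + 1) / 4                           # expected T under H0
    sd_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # standard deviation of T under H0
    return (T - mean_T) / sd_T


# Hypothetical inputs for illustration only.
print(f"{wilcoxon_z(200, 40):.2f}")  # -2.82
```

Because T is the smaller of the two rank sums, it cannot exceed n(n + 1)/4, so z computed this way is never positive.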

Critical Values for the Wilcoxon Signed-Ranks Test for Matched Pairs
For n ≤ 30, the critical T value is found in Table A-8.
For n > 30, the critical z values are found in Table A-2.

Procedure for Finding the Value of the Test Statistic
Step 1: For each pair of data, find the difference d by subtracting the second value from the first. Keep the signs, but discard any pairs for which d = 0.
Step 2: Ignore the signs of the differences, then sort the differences from lowest to highest and replace the differences by the corresponding rank value. When differences have the same numerical value, assign to them the mean of the ranks involved in the tie.
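The procedure shown here stops after Step 2. The Python sketch below carries out Steps 1 and 2 and then, following the definition of T in the Notation slide, sums the ranks by sign and takes the smaller sum. The function name `wilcoxon_T` and the data are hypothetical.

```python
def wilcoxon_T(first, second):
    """Return (T, n) for matched pairs, using mean ranks for tied absolute differences."""
    # Step 1: differences (first - second), keeping signs and discarding zeros.
    diffs = [a - b for a, b in zip(first, second) if a != b]

    # Step 2: rank the absolute differences from lowest to highest, averaging ties.
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j + 1 < len(abs_sorted) and abs_sorted[j + 1] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = ((i + 1) + (j + 1)) / 2
        i = j + 1

    # Per the Notation slide: T is the smaller of the positive and negative rank sums.
    positive_sum = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative_sum = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return min(positive_sum, negative_sum), len(diffs)


# Hypothetical matched pairs, for illustration only.
first = [67, 53, 64, 74, 67, 70]
second = [66, 52, 68, 77, 67, 75]
print(wilcoxon_T(first, second))  # (3.0, 5)
```

In this illustrative sample the differences are 1, 1, -4, -3, and -5 after the tie is dropped, so the two positive differences share rank 1.5 and T = 3.0 with n = 5.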
