Chapter 15 Nonparametric Statistics
Section 15.1 An Overview of Nonparametric Statistics
Objective • Understand the difference between parametric statistical procedures and nonparametric statistical procedures
Parametric statistical procedures are inferential procedures conducted under the assumption that the underlying distribution of the data belongs to some parametric family of distributions (such as the normal distribution).
Nonparametric statistical procedures are inferential procedures that do not make any assumptions about the underlying distribution of the data. They do not require that the population belong to any particular parametric family of distributions (such as the normal distribution) and, therefore, are often referred to as distribution-free procedures.
Advantages of Nonparametric Statistical Procedures • Most of the tests have very few requirements, so it is unlikely that these tests will be used improperly. • For some nonparametric procedures, the computations are fairly easy. • The procedures can be used for count data or rank data, so nonparametric methods can be used on data, such as the rankings of a movie as excellent, good, fair, or poor.
Disadvantages of Nonparametric Statistical Procedures • Nonparametric procedures are less efficient than parametric procedures. This means that a larger sample size is required when conducting a nonparametric procedure to achieve the same power, at the same level of significance, as the equivalent parametric procedure.
Disadvantages of Nonparametric Statistical Procedures • Nonparametric procedures often discard useful information. For example, the sign test uses only the sign of the data, and rank tests merely preserve order; the magnitude of the actual data values is lost. As a result, nonparametric procedures are typically less powerful. Recall that the power of a test is the probability of rejecting the null hypothesis when the alternative hypothesis is true, that is, 1 minus the probability of making a Type II error. A Type II error occurs when a researcher does not reject the null hypothesis when the alternative hypothesis is true.
Disadvantages of Nonparametric Statistical Procedures • Because fewer requirements must be satisfied to conduct these tests, researchers sometimes use these procedures when parametric procedures can be used.
“In Other Words” The lower the efficiency is, the larger the sample size must be for a nonparametric test to match the power of its equivalent parametric test at the same level of significance.
Section 15.2 Runs Test for Randomness
Objective • Perform a runs test for randomness
A runs test for randomness is used to test whether data have been obtained or occur randomly. A run is a sequence of similar events, items, or symbols that is followed by an event, item, or symbol that is mutually exclusive from the first event, item, or symbol. The number of events, items, or symbols in a run is called its length.
CAUTION! Runs tests are used to test whether it is reasonable to conclude that data occur randomly, not whether the data are collected randomly. For example, we might wonder whether defective parts come off an assembly line randomly or systematically. If broken parts occur systematically (such as every fourth part), we might be led to believe that we have a broken machine. We don’t collect the data randomly; instead, we select 100 consecutive parts. We want to know whether the defective parts in the 100 selected occur randomly.
Notation Used in Conducting a Runs Test for Randomness • Let n represent the sample size, where each observation is one of two mutually exclusive types. • Let n1 represent the number of observations of the first type. • Let n2 represent the number of observations of the second type. • Let r represent the number of runs.
Parallel Example 1: Notation in a Runs Test for Randomness The following data represent the league that won the World Series for the years 1996-2007. Let “AL” represent the American League and “NL” represent the National League. AL NL AL AL AL NL AL NL AL AL NL AL Identify the values of n, n1, n2 and r.
Solution Let n represent the number of World Series in the sample. Let n1 represent the number of World Series won by the American League and n2 the number of World Series won by the National League. Lastly, let r represent the number of runs. Then there are n = 12 World Series in the sample, n1 = 8 World Series won by the American League, n2 = 4 World Series won by the National League, and r = 9 runs.
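The counting in this solution can be checked with a short Python sketch. The helper name `runs_summary` is hypothetical, and treating the category that appears first in the sequence as the "first type" is an assumption made only for illustration.

```python
from itertools import groupby

def runs_summary(sequence):
    """Return (n, n1, n2, r) for a list holding exactly two categories.

    Assumption: n1 counts the category that appears first in the
    sequence and n2 counts the other one.
    """
    categories = sorted(set(sequence), key=sequence.index)
    if len(categories) != 2:
        raise ValueError("expected exactly two mutually exclusive categories")
    first, second = categories
    n1 = sequence.count(first)
    n2 = sequence.count(second)
    # groupby collapses each stretch of identical neighbours into one group,
    # so the number of groups equals the number of runs.
    r = sum(1 for _ in groupby(sequence))
    return len(sequence), n1, n2, r

winners = "AL NL AL AL AL NL AL NL AL AL NL AL".split()
print(runs_summary(winners))  # (12, 8, 4, 9)
```

The result agrees with the values n = 12, n1 = 8, n2 = 4, and r = 9 stated in the solution.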
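Test Statistic for a Runs Test for Randomness Small-Sample Case: If n1 ≤ 20 and n2 ≤ 20, the test statistic in the runs test for randomness is r, the number of runs. Large-Sample Case: If n1 > 20 or n2 > 20, the test statistic in the runs test for randomness is $z = \frac{r - \mu_r}{\sigma_r}$, where $\mu_r = \frac{2 n_1 n_2}{n} + 1$ and $\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}}$.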
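A minimal sketch of the large-sample statistic, assuming the usual normal approximation for the number of runs (mean 2·n1·n2/n + 1 and standard deviation sqrt(2·n1·n2·(2·n1·n2 − n) / (n²·(n − 1)))). The function name `runs_z` and the counts in the usage line are hypothetical, not data from these slides.

```python
import math

def runs_z(n1, n2, r):
    """Large-sample z statistic for the runs test (normal approximation)."""
    n = n1 + n2
    mu_r = 2 * n1 * n2 / n + 1  # expected number of runs under randomness
    sigma_r = math.sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    )  # standard deviation of the number of runs under randomness
    return (r - mu_r) / sigma_r

# Hypothetical counts: 25 observations of each type and 26 runs.
# The expected number of runs is also 26, so z is zero.
print(runs_z(25, 25, 26))  # 0.0
```

A z value far from zero in either direction suggests too few or too many runs for a random sequence.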
Critical Values for a Runs Test for Randomness Small-Sample Case: To find the critical value at the α = 0.05 level of significance for a runs test, we use Table X if n1 ≤ 20 and n2 ≤ 20. Large-Sample Case: If n1 > 20 or n2 > 20, the critical value is found from Table V, the standard normal table.
Parallel Example 2: Obtaining Critical Values from Table X Find the upper and lower critical values if n1=8 and n2=4.
Solution From Table X, the lower critical value is 3 and the upper critical value is 10.
Runs Test for Randomness To test the randomness of data, we can use the following steps, provided that the sample is a sequence of observations recorded in the order of their occurrence, and the observations can be categorized into two mutually exclusive categories.
Step 1: Assume the data are random. This forms the basis of the null and alternative hypotheses, which are structured as follows: H0: The sequence of data is random H1: The sequence of data is not random
Step 2: Determine a level of significance, α, based on the seriousness of making a Type I error. The level of significance is used to determine the critical value. Note: For the small-sample case, we must use the level of significance α = 0.05.
Step 3: Use the number of runs, r, to compute the test statistic.
Parallel Example 3: Testing for Randomness (Small-Sample Case) The following data represent the league that won the World Series for the years 1996-2007. Let “AL” represent the American League and “NL” represent the National League. AL NL AL AL AL NL AL NL AL AL NL AL Test the claim that leagues win the World Series in a nonrandom way at the α = 0.05 level of significance.
Solution The sample is a sequence of observations (which league won the World Series in a particular year) recorded in the order of occurrence. The observations are in two mutually exclusive categories, American League or National League. The requirements for the test are satisfied.
Solution Step 1: We are testing the hypothesis that the sequence of observations is random. Thus, H0: The sequence of data is random H1: The sequence of data is not random Step 2: The level of significance is α = 0.05. The lower critical value is 3 and the upper critical value is 10 (Parallel Example 2).
Solution Step 3: The test statistic is r = 9 (Parallel Example 1). Step 4: Since the test statistic is between the lower and upper critical values, we do not reject the null hypothesis. Step 5: There is insufficient evidence to conclude that the World Series were won by the two leagues in a nonrandom way during the years 1996-2007.
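The comparison in Step 4 can be sketched as a small function. The name `runs_test_decision` is hypothetical, and rejecting when r equals a critical value exactly is an assumed boundary convention, since the slides only state that a statistic between the two critical values leads to not rejecting.

```python
def runs_test_decision(r, lower, upper):
    """Small-sample runs test decision.

    Assumption: H0 is rejected when r is at or beyond either
    critical value, and not rejected when r lies strictly between them.
    """
    if r <= lower or r >= upper:
        return "reject H0"
    return "do not reject H0"

# World Series example: r = 9 with critical values 3 and 10.
print(runs_test_decision(9, 3, 10))  # do not reject H0
```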
Section 15.3 Inferences About Measures of Central Tendency
Objective • Conduct a one-sample sign test
A one-sample sign test is a nonparametric test that uses data, converted to plus and minus signs, to test a hypothesis regarding the median of a population. Data values equal to the assumed value of the median are ignored during the test.
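Test Statistic for a One-Sample Sign Test The test statistic will depend on the structure of the hypothesis test and on the sample size. Small-Sample Case: (n ≤ 25) The test statistic is k. For a two-tailed test, k is the smaller of the number of minus signs and plus signs. For a left-tailed test, k is the number of plus signs. For a right-tailed test, k is the number of minus signs.
Large-Sample Case: (n > 25) The test statistic, z, is $z = \frac{(k + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$, where n is the number of minus and plus signs and k is obtained as described in the small-sample case.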
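A minimal sketch of the large-sample computation, assuming the usual continuity-corrected normal approximation z = ((k + 0.5) − n/2) / (√n / 2). The function name `sign_test_z` and the counts in the usage line are hypothetical.

```python
import math

def sign_test_z(k, n):
    """Large-sample z statistic for the one-sample sign test.

    k is the sign count chosen for the type of test, and n is the
    total number of plus and minus signs (ties with M0 excluded).
    """
    return ((k + 0.5) - n / 2) / (math.sqrt(n) / 2)

# Hypothetical counts: k = 10 out of n = 36 signs.
print(sign_test_z(10, 36))  # -2.5
```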
Critical Values for a One-Sample Sign Test Small-Sample Case: To find the critical value for a one-sample sign test, we use Table XI if n ≤ 25. Large-Sample Case: If n > 25, the critical value is found from Table V, the standard normal table. The critical value is always located in the left tail of the standard normal distribution. For a two-tailed test, the critical value is $-z_{\alpha/2}$. For a left-tailed or right-tailed test, the critical value is $-z_{\alpha}$.
One-Sample Sign Test To test hypotheses regarding the median of a population, we use the following steps, provided that the sample is a random sample. Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways: Two-tailed: H0: M = M0 versus H1: M ≠ M0. Left-tailed: H0: M = M0 versus H1: M < M0. Right-tailed: H0: M = M0 versus H1: M > M0. Note: M0 is the assumed value of the median.
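In place of a printed standard normal table, the left-tail critical value for the large-sample case can be looked up with Python's standard library. This is a sketch only; the helper name `sign_test_critical_z` is hypothetical.

```python
from statistics import NormalDist

def sign_test_critical_z(alpha, two_tailed):
    """Left-tail critical value for the large-sample sign test.

    The area to the left of the critical value is alpha/2 for a
    two-tailed test and alpha for a one-tailed test.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(tail_area)

print(round(sign_test_critical_z(0.05, two_tailed=True), 2))   # -1.96
print(round(sign_test_critical_z(0.05, two_tailed=False), 3))  # -1.645
```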
Step 2: Count the number of observations below M0, and assign them minus (-) signs. Count the number of observations above M0, and assign them plus (+) signs.
Step 3: Select a level of significance, α, based on the seriousness of making a Type I error. The level of significance is used to determine the critical value. The critical value for small samples (n ≤ 25) is found from Table XI. The critical value for large samples (n > 25) is found from Table V.
Step 4: Obtain the test statistic, k. Note that k is the smaller of the number of minus signs and plus signs in the two-tailed test, that k is the number of plus signs in the left-tailed test, and that k is the number of minus signs in the right-tailed test. In addition, n is the total number of plus and minus signs.
Parallel Example 1: Conducting a One-Sample Sign Test (Small-Sample Case) According to the United States Bureau of Labor Statistics, in 2000, the median tenure of employees with their current employer was 3.5 years. An economist believes that the median has increased since then. To test this claim, he randomly selects 16 employed individuals, determines their length of employment, and obtains the following data. 0.3 0.8 0.7 3.2 10.3 1.4 0.2 0.9 3.6 6.3 11.2 12.8 7.3 13.0 3.8 23.6 Test the claim at the α = 0.05 level of significance.
Solution The data were obtained from a random sample so the conditions of the test are met. Step 1: We want to know if the median tenure of employees with their current employer is greater than 3.5 years. This is a right-tailed test. H0: M=3.5 versus H1: M > 3.5
Solution Step 2: There are 7 observations less than 3.5 and 9 observations greater than 3.5. Thus, we have 7 minus signs and 9 plus signs with n = 16. Step 3: Because this is a right-tailed test and n ≤ 25, we find the critical value at the α = 0.05 level of significance with n = 16 to be 4 (see Table XI). Step 4: The test statistic is the number of minus signs. Thus, k = 7.
Solution Step 5: The null hypothesis is rejected only if the test statistic is less than or equal to the critical value. Since the test statistic, k = 7, is greater than the critical value, 4, we do not reject the null hypothesis. Step 6: There is insufficient evidence to support the hypothesis that the median tenure of employees with their employer is greater than 3.5 years.
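The sign counts in this example can be reproduced with a short Python sketch. The helper name `sign_counts` is hypothetical. The exact binomial tail probability at the end is an added cross-check that assumes each sign is equally likely under H0; it is not part of the table-based procedure used in the slides.

```python
from math import comb

def sign_counts(data, m0):
    """Count observations below (minus) and above (plus) the assumed median.

    Values equal to m0 are ignored, as the sign test requires.
    """
    minus = sum(1 for x in data if x < m0)
    plus = sum(1 for x in data if x > m0)
    return minus, plus

tenure = [0.3, 0.8, 0.7, 3.2, 10.3, 1.4, 0.2, 0.9,
          3.6, 6.3, 11.2, 12.8, 7.3, 13.0, 3.8, 23.6]

minus, plus = sign_counts(tenure, 3.5)
n = minus + plus
k = minus  # right-tailed test: k is the number of minus signs
print(minus, plus, n, k)  # 7 9 16 7

# Cross-check (assumption: signs follow Binomial(n, 0.5) under H0):
# probability of k or fewer minus signs.
p_value = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
print(round(p_value, 3))  # 0.402
```

A tail probability of about 0.40 is well above 0.05, which is consistent with the decision not to reject the null hypothesis.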