Course: A0392 - Economic Statistics. Year: 2010. Session 11: Goodness-of-Fit Test and Test of Independence. Outline: Goodness-of-Fit Test, Test of Equality of Several Proportions, Test of Independence of Two Qualitative Factors. Multinomial Experiments and Contingency Tables. 10-1 Overview
Course: A0392 - Economic Statistics Year: 2010 Session 11: Goodness-of-Fit Test and Test of Independence
Outline: • Goodness-of-Fit Test • Test of Equality of Several Proportions • Test of Independence of Two Qualitative Factors
Multinomial Experiments and Contingency Tables
10-1 Overview
10-2 Multinomial Experiments: Goodness-of-Fit
10-3 Contingency Tables: Independence and Homogeneity
Overview
• We focus on analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells).
• Use the $\chi^2$ (chi-square) test statistic (Table A-4).
• The goodness-of-fit test uses a one-way frequency table (single row or column).
• The contingency table uses a two-way frequency table (two or more rows and columns).
Definition Multinomial Experiment This is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
Example: Last Digit Analysis In 2001, Barry Bonds hit 73 home runs. Table 10-2 summarizes the last digit of those home run distances. Verify that the four conditions of a multinomial experiment are satisfied.
1. The number of trials (last digits) is the fixed number 73.
2. The trials are independent, because the last digit of the length of a home run does not affect the last digit of the length of any other home run.
3. Each outcome (last digit) is classified into exactly 1 of 10 different categories. The categories are 0, 1, … , 9.
4. Finally, if we assume that the home run distances are measured, the last digits should be equally likely, so that each possible digit has a probability of 1/10.
Definition Goodness-of-fit test A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.
Goodness-of-Fit Test Notation
O represents the observed frequency of an outcome
E represents the expected frequency of an outcome
k represents the number of different categories or outcomes
n represents the total number of trials
Expected Frequencies If all expected frequencies are equal: $E = \frac{n}{k}$, the sum of all observed frequencies divided by the number of categories.
Expected Frequencies If the expected frequencies are not all equal: $E = np$, so each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category.
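As a minimal sketch of the two cases above, the Python snippet below computes expected frequencies both ways. The function names are illustrative and not part of the course material. The 73 trials and 10 digit categories come from the home run example, while the three-category probabilities are made-up numbers used only for illustration.

```python
def expected_equal(n, k):
    """Expected frequency when all k categories are equally likely: E = n / k."""
    return [n / k] * k


def expected_from_probs(n, probs):
    """Expected frequencies when categories have different probabilities: E = n * p."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probs]


# Home run example: 73 last digits spread over 10 equally likely digits.
digit_expected = expected_equal(73, 10)

# Hypothetical survey (assumed numbers): 200 responses, unequal probabilities.
survey_expected = expected_from_probs(200, [0.5, 0.3, 0.2])

print(digit_expected[0])
print(survey_expected)
```

Note that expected frequencies need not be whole numbers, even though observed frequencies always are.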
Goodness-of-Fit Test in Multinomial Experiments
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical Values
1. Found in Table A-4 using k – 1 degrees of freedom, where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
• A close agreement between observed and expected values will lead to a small value of $\chi^2$ and a large P-value.
• A large disagreement between observed and expected values will lead to a large value of $\chi^2$ and a small P-value.
• A significantly large value of $\chi^2$ will cause a rejection of the null hypothesis of no difference between the observed and the expected.
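The test statistic sums (O – E)²/E over all categories. A short Python sketch of that sum follows. The function name and the die-roll counts are assumptions chosen for illustration, not data from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, so E = 60 / 6 = 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1 degrees of freedom

print(stat, df)
```

Because every term is a square divided by a positive expected count, the statistic can never be negative, which is why only the right tail is used.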
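The link between the size of the statistic and the P-value can be illustrated numerically. The sketch below is one possible standard-library approach, not the method used in the slides (which rely on Table A-4): it computes the right-tail chi-square probability for integer degrees of freedom using a known recurrence. The function name `chi2_sf` and the two sample statistics (1.0 and 25.0 with 5 degrees of freedom) are assumptions for illustration.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with integer df.

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exp) and steps
    up two degrees of freedom at a time:
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Hypothetical statistics, both with 5 degrees of freedom.
p_close_agreement = chi2_sf(1.0, 5)   # small statistic
p_large_disagreement = chi2_sf(25.0, 5)  # large statistic

print(p_close_agreement, p_large_disagreement)
```

The small statistic yields a P-value near 1 and the large statistic a P-value near 0, matching the bullets above.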
Figure 10-3 Relationships Among Components in Goodness-of-Fit Hypothesis Test
Example: Last Digit Analysis In 2001, Barry Bonds hit 73 home runs. Table 10-2 summarizes the last digit of those home run distances. Test the claim that the digits do not occur with the same frequency.
H0: $p_0 = p_1 = \cdots = p_9$
H1: At least one of the probabilities is different from the others.
$\alpha = 0.05$, $k - 1 = 9$, $\chi^2_{0.05,\,9} = 16.919$
Example: Last Digit Analysis In 2001, Barry Bonds hit 73 home runs. Table 10-2 summarizes the last digit of those home run distances. Test the claim that the digits do not occur with the same frequency. The test statistic is $\chi^2 = 251.521$. Since this exceeds the critical value of 16.919, we reject the null hypothesis. There is sufficient evidence to support the claim that the last digits do not occur with the same relative frequency.
Example: Detecting Fraud In the Chapter Problem, it was noted that statistics can be used to detect fraud. Table 10-1 lists the percentages for leading digits. Test the claim that there is a significant discrepancy between the leading digits expected from Benford’s Law and the leading digits from the 784 checks.
H0: $p_1 = 0.301$, $p_2 = 0.176$, $p_3 = 0.125$, $p_4 = 0.097$, $p_5 = 0.079$, $p_6 = 0.067$, $p_7 = 0.058$, $p_8 = 0.051$, and $p_9 = 0.046$
H1: At least one of the proportions is different from the claimed values.
$\alpha = 0.01$, $k - 1 = 8$, $\chi^2_{0.01,\,8} = 20.090$
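The claimed proportions in H0 can be cross-checked against the usual statement of Benford's Law, in which the leading digit d has probability log10(1 + 1/d). That formula is not given in the slides and is brought in here as an outside fact. The sketch below also computes the expected counts E = np for the 784 checks. Variable names are illustrative.

```python
import math

# Proportions claimed under H0 for leading digits 1 through 9 (from the example).
claimed = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

# Benford's Law as usually stated (assumption: not quoted in the slides).
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
benford_rounded = [round(p, 3) for p in benford]

# Expected frequencies for n = 784 checks: E = n * p for each leading digit.
n_checks = 784
expected = [n_checks * p for p in claimed]

for digit, (p, e) in enumerate(zip(claimed, expected), start=1):
    print(digit, p, round(e, 3))
```

The observed counts from Table 10-1 are not reproduced in this transcript, so the sketch stops at the expected frequencies rather than recomputing the test statistic.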
Example: Detecting Fraud In the Chapter Problem, it was noted that statistics can be used to detect fraud. Table 10-1 lists the percentages for leading digits. Test the claim that there is a significant discrepancy between the leading digits expected from Benford’s Law and the leading digits from the 784 checks. The test statistic is $\chi^2 = 3650.251$. Since this exceeds the critical value of 20.090, we reject the null hypothesis. There is sufficient evidence to support the claim of a significant discrepancy between the leading digits expected from Benford’s Law and the leading digits from the 784 checks.
Definition • Contingency Table (or two-way frequency table) A contingency table is a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.) Contingency tables have at least two rows and at least two columns.
Definition • Test of Independence This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.)
Assumptions
1. The sample data are randomly selected.
2. The null hypothesis H0 is the statement that the row and column variables are independent; the alternative hypothesis H1 is the statement that the row and column variables are dependent.
3. For every cell in the contingency table, the expected frequency E is at least 5. (There is no requirement that every observed frequency must be at least 5.)
Test of Independence
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical Values
1. Found in Table A-4 using degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns.
2. Tests of independence are always right-tailed.
Expected Frequency: $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$, where the grand total is the total number of all observed frequencies in the table.
Tests of Independence
H0: The row variable is independent of the column variable.
H1: The row variable is dependent on (related to) the column variable.
This procedure cannot be used to establish a direct cause-and-effect link between the variables in question. Dependence means only that there is a relationship between the two variables.
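Expected Frequency for Contingency Tables
$E = n \cdot p = \text{grand total} \cdot \frac{\text{row total}}{\text{grand total}} \cdot \frac{\text{column total}}{\text{grand total}}$, where $n$ is the grand total and $p$ is the probability of a cell. This simplifies to $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$.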
Observed and Expected Frequencies
            Men    Women   Boys   Girls   Total
Survived    332      318     29      27     706
Died       1360      104     35      18    1517
Total      1692      422     64      45    2223
We will use the mortality table from the Titanic to find expected frequencies. For the upper left hand cell, we find: $E = \frac{(706)(1692)}{2223} = 537.360$
Observed and Expected Frequencies (expected frequency in parentheses)
            Men              Women   Boys   Girls   Total
Survived    332 (537.360)      318     29      27     706
Died       1360                104     35      18    1517
Total      1692                422     64      45    2223
Find the expected frequency for the lower left hand cell, assuming independence between the row variable and the column variable. $E = \frac{(1517)(1692)}{2223} = 1154.640$
Observed and Expected Frequencies (expected frequency in parentheses)
            Men               Women           Boys          Girls         Total
Survived    332 (537.360)     318 (134.022)   29 (20.326)   27 (14.291)     706
Died       1360 (1154.640)    104 (287.978)   35 (43.674)   18 (30.709)    1517
Total      1692               422             64            45             2223
To interpret this result for the lower left hand cell, we can say that although 1360 men actually died, we would have expected 1154.64 men to die if survival is independent of whether the person is a man, woman, boy, or girl.
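The expected frequencies in the table above can be reproduced with a few lines of Python. This is a sketch using only the observed Titanic counts shown on the slide. The function name is illustrative, and the final check of the "every expected frequency is at least 5" assumption is added here for convenience.

```python
def expected_frequencies(observed):
    """E = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Columns: Men, Women, Boys, Girls. Rows: Survived, Died.
titanic = [
    [332, 318, 29, 27],
    [1360, 104, 35, 18],
]

expected = expected_frequencies(titanic)
for label, row in zip(["Survived", "Died"], expected):
    print(label, [round(e, 3) for e in row])

# Assumption 3 of the test: every expected frequency is at least 5.
all_at_least_5 = all(e >= 5 for row in expected for e in row)
print(all_at_least_5)
```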
Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl. H0: Whether a person survived is independent of whether the person is a man, woman, boy, or girl. H1: Surviving the Titanic and being a man, woman, boy, or girl are dependent.
Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl.
$$\chi^2 = \frac{(332-537.36)^2}{537.36} + \frac{(318-134.022)^2}{134.022} + \frac{(29-20.326)^2}{20.326} + \frac{(27-14.291)^2}{14.291} + \frac{(1360-1154.64)^2}{1154.64} + \frac{(104-287.978)^2}{287.978} + \frac{(35-43.674)^2}{43.674} + \frac{(18-30.709)^2}{30.709}$$
$$\chi^2 = 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260 = 507.084$$
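The sum above can be checked with a short script. This sketch recomputes the expected frequencies without rounding, so its result may differ from the slide's 507.084 in the second or third decimal place. Function and variable names are illustrative.

```python
def chi_square_independence(observed):
    """Return (statistic, per-cell contributions) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            contributions.append((o - e) ** 2 / e)
    return sum(contributions), contributions


# Columns: Men, Women, Boys, Girls. Rows: Survived, Died.
titanic = [
    [332, 318, 29, 27],
    [1360, 104, 35, 18],
]

stat, contributions = chi_square_independence(titanic)
print(round(stat, 3))
print([round(c, 3) for c in contributions])
```

The per-cell contributions show that the Women/Survived cell accounts for roughly half of the total, which indicates where the observed data depart most from independence.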
Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl. The number of degrees of freedom is (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3. The critical value is $\chi^2_{0.05,\,3} = 7.815$. Because the test statistic of 507.084 exceeds 7.815, we reject the null hypothesis. Survival and the category of person (man, woman, boy, or girl) are dependent.
Test Statistic: $\chi^2 = 507.084$. Critical Value: $\chi^2 = 7.815$ (from Table A-4), with $\alpha = 0.05$ and (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3 degrees of freedom.
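Table A-4 is not reproduced in this transcript, but the critical values quoted in the examples can be recovered numerically. The sketch below is one possible standard-library approach and not the lookup procedure the slides use: it computes the chi-square right-tail probability for integer degrees of freedom and then finds the critical value by bisection. The function names are assumptions for illustration.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X > x) for chi-square with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        # Step from k to k + 2 degrees of freedom.
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi2_critical(alpha, df, upper=1000.0):
    """Value c with P(X > c) = alpha, found by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2


rows, cols = 2, 4
df = (rows - 1) * (cols - 1)
critical = chi2_critical(0.05, df)
reject_h0 = 507.084 > critical  # right-tailed decision rule

print(df, round(critical, 3), reject_h0)
```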
Figure 10-8 Relationships Among Components in $\chi^2$ Test of Independence
Definition • Test of Homogeneity In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristics.
How to distinguish between a test of homogeneity and a test for independence: Were predetermined sample sizes used for different populations (test of homogeneity), or was one big sample drawn so both row and column totals were determined randomly (test of independence)?
Example: Using Table 10-7 as seen below, with a 0.05 significance level, test the effect of pollster gender on survey responses by men. H0: The proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women. H1: The proportions are different.
Example: Using Table 10-7 as seen below, with a 0.05 significance level, test the effect of pollster gender on survey responses by men. The Minitab display includes the test statistic of $\chi^2 = 6.529$ and a P-value of 0.011. Using the P-value approach, we reject the null hypothesis of equal (homogeneous) proportions because the P-value of 0.011 is less than 0.05. There is sufficient evidence to reject the claim of equal proportions.
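The P-value in the Minitab display can be approximated by hand. The sketch below assumes the table is 2 × 2 (agree/disagree by male/female interviewer), which gives (2 – 1)(2 – 1) = 1 degree of freedom. That layout is inferred from the hypotheses, since Table 10-7 itself is not reproduced here. With 1 degree of freedom, the right-tail chi-square probability reduces to a complementary error function. The function name is illustrative.

```python
import math


def chi2_pvalue_df1(stat):
    """Right-tail P-value for chi-square with 1 df: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


alpha = 0.05
stat = 6.529  # test statistic quoted from the Minitab display

p_value = chi2_pvalue_df1(stat)
reject_h0 = p_value < alpha  # P-value approach: reject when P-value < alpha

print(round(p_value, 3), reject_h0)
```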