Statistical Analysis. Professor Lynne Stokes Department of Statistical Science Lecture #1 Chi-square Contingency Table Test. Independence. Employment Status is independent of Age. Note: One population, responses formed by two categorizations. Homogeneity.
Statistical Analysis Professor Lynne Stokes Department of Statistical Science Lecture #1 Chi-square Contingency Table Test
Independence: Employment status is independent of age. Note: One population, responses formed by two categorizations.
Homogeneity: If promotions are nondiscriminatory, they are binomially distributed with a common p for both gender categories. Note: Two populations, common distribution of responses.
Cognitive Learning in Rats -- Tolman, Ritchie, Kalish (1946)
Prior Theory: Discrete Learning Steps -- Hull
Candidate Theory: Cognitive Learning -- Tolman
[Figure: maze diagram showing a barrier, the goal, and paths labeled A, B, C, D]
Goodness of Fit
Path Chosen       A    B    C    D    Total
Number of Rats    4    5    8    15   32
Evidence of cognitive learning? If selection is random, the counts are multinomial with pj = 1/4.
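The goodness-of-fit comparison for the rat data can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the variable names are made up here, the 5% critical value 7.815 (df = 3) is a standard table value rather than one quoted in the lecture, and the tail-area formula used is a closed form that holds only for df = 3.

```python
import math

# Observed path choices from the slide (paths A, B, C, D; 32 rats in total).
observed = {"A": 4, "B": 5, "C": 8, "D": 15}
n = sum(observed.values())

# Under random selection every path has probability 1/4.
expected = n / 4

# Pearson statistic: sum over categories of (O - E)^2 / E.
x2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1  # k - 1, because no parameters are estimated

# Upper-tail chi-square probability; this closed form is valid for df = 3 only.
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

# Assumed 5% critical value for df = 3 (standard table value, not from the slides).
critical_05 = 7.815
print(f"X^2 = {x2:.2f}, df = {df}, p = {p_value:.3f}, reject at 5%: {x2 > critical_05}")
```

With expected counts of 8 per path the statistic is 9.25, which exceeds 7.815, so under these assumptions the random-selection hypothesis would be rejected at the 5% level.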
Compare Incidence of Death Penalty
Are victim’s race and sentence independent? Is aggravation level an explanatory factor?
Aggravation levels: Drunk, Lover’s Quarrel, Argument, etc.; More Serious: Vicious, Cold-blooded, Unprovoked, Murder, etc.
Chi-Square Tests for Count Data
• Independence
  • Distribution of responses across one categorization is identical for each category of a second categorization
• Homogeneity
  • Distribution of responses is identical across several categories of one categorical variable or across several independent samples
• Goodness of Fit
  • Responses are consistent with a stated probability distribution
    • Parameters specified
    • Unknown parameter values
Chi-square Tests • Tests for independence in contingency tables
Contingency Tables (Crosstabs)
• Two categorizations (rows and columns)
• Each with mutually exclusive categories
• Sample of n independent observations
Are the two categorizations statistically independent? e.g., Is employment status statistically independent of age?
Note: Equivalent to the homogeneity test with unspecified p when there are only 2 rows
Notation for Observed Frequencies

                    Column Categories
Row Categories      1    ...    j     ...    c     Row Total
1
...
i                               Oij                Ri
...
r
Column Total                    Cj                 n
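Chi-square Test for Independence
Ho: Row and column categories are independent
Ha: Row and column categories are not independent
If row and column categories are independent, the expected frequencies are estimated by $E_{ij} = R_i C_j / n$, and the test statistic is $X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} (O_{ij} - E_{ij})^2 / E_{ij}$.
Reject Ho if $X^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ is the upper-α critical value of the chi-square distribution with df = (r - 1)(c - 1).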
Degrees of Freedom for Contingency Tables
Row 1: df = c - 1
Row 2: df = c - 1
. . .
Row r-1: df = c - 1
Row r: no additional df, because the estimated expected frequencies in column j sum to Cj
Given row and column totals, df = (r - 1)(c - 1)
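Chi-square Contingency Table Test Summary
Reject Ho if $X^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ is the upper-α chi-square critical value with df = (r - 1)(c - 1)
Notational convention: the expected frequencies are written Eij even though they are estimated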
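The contingency table test summarized above can be sketched as a small Python function. This is an illustrative sketch under stated assumptions: the function name `chi_square_independence` is invented here, the expected frequencies use the usual estimate Eij = Ri Cj / n, and the 2 x 2 counts are hypothetical, not data from the lecture.

```python
def chi_square_independence(table):
    """Return (X^2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]          # R_i
    col_totals = [sum(col) for col in zip(*table)]    # C_j
    n = sum(row_totals)

    # Estimated expected frequencies under independence: E_ij = R_i * C_j / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Pearson statistic: sum over all cells of (O_ij - E_ij)^2 / E_ij.
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


# Hypothetical 2 x 2 counts, for illustration only (not data from the lecture).
hypothetical_table = [[10, 20], [20, 10]]
x2, df, expected = chi_square_independence(hypothetical_table)
print(f"X^2 = {x2:.3f}, df = {df}")
```

The returned statistic is then compared with the chi-square critical value for the chosen α and df = (r - 1)(c - 1). The estimated expected frequencies in each column add up to that column's total, which is why the last row contributes no extra degrees of freedom.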
Employment Discrimination: Are age and employment status related?
[Tables not recovered: Age (yrs) by Employment Status; observed frequencies, expected frequencies, and chi-square calculation]
Employment Discrimination
Ho: Employment Status and Age are independent
Ha: Employment Status and Age are not independent
Reject Ho if X² > 6.635 (α = 0.01, df = 1)
X² = 138.67
Conclusion: There is sufficient evidence (p < 0.001), using a significance level of 0.01, to conclude that employment status and age are not statistically independent.
Reason: A greater number of older employees were terminated than expected under the hypothesis of independence.
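The decision rule and the reported p-value bound can be checked numerically. The sketch below assumes nothing beyond the numbers on the slide; the helper name `chi2_upper_tail_df1` is invented here, and it relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable.

```python
import math


def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), using the standard normal relation."""
    return math.erfc(math.sqrt(x / 2))


# Tail area beyond the critical value quoted on the slide.
tail_at_critical = chi2_upper_tail_df1(6.635)
# Tail area beyond the reported test statistic.
p_value = chi2_upper_tail_df1(138.67)

print(f"tail area beyond 6.635: {tail_at_critical:.4f}")
print(f"p-value for X^2 = 138.67: {p_value:.3g}")
```

The first value should come out very close to 0.01, matching the stated significance level, and the second is far below 0.001.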
Drug Usage
[Tables not recovered: Group by Frequency of Drug Use; observed frequencies, expected frequencies, and chi-square calculation]
Drug Usage
Ho: Drug Usage and Campus Group are independent
Ha: Drug Usage and Campus Group are not independent
Reject Ho if X² > 5.991 (α = 0.05, df = 2)
X² = 6.87
Conclusion: Using a significance level of 0.05, there is sufficient evidence (0.025 < p < 0.05) to conclude that drug usage and campus group are not statistically independent.
Reason: A greater number of athletes and fewer members of campus organizations reported monthly usage of drugs than expected under the hypothesis of independence.
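For df = 2 the chi-square upper-tail probability has the simple closed form exp(-x/2), so the bracket on the p-value can be verified directly. This is a minimal sketch; the helper name `chi2_upper_tail_df2` is invented here and only the numbers from the slide are used.

```python
import math


def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df > x); the closed form exp(-x/2) holds for df = 2 only."""
    return math.exp(-x / 2)


tail_at_critical = chi2_upper_tail_df2(5.991)  # critical value quoted on the slide
p_value = chi2_upper_tail_df2(6.87)            # reported test statistic

print(f"tail area beyond 5.991: {tail_at_critical:.4f}")
print(f"p-value for X^2 = 6.87: {p_value:.4f}")
```

The tail area at 5.991 is approximately 0.05, and the p-value falls between 0.025 and 0.05, in line with the stated conclusion.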
Chi-square Tests • Tests for independence in contingency tables • Tests for homogeneity
Binomial Samples (Product Binomial Sampling)
Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5
Assumptions: 8 samples, mutually independent counts
• Hypothesis #1: Is pW = 0.5?
  • Binomial inference on p
  • Equivalently, overall goodness of fit (known p)
• Hypothesis #2: Are all the pW equal?
  • Test for homogeneity (equal but unknown p)
• Hypothesis #3: Is each pW = 0.5?
  • Goodness of fit (8 samples, known p)
Test of Homogeneity of k Binomial Samples, Specified p
Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j
X² = 22.96, df = 8, p = 0.003
Note: Does not assume homogeneity (see below)
Test of Homogeneity of k Binomial Samples: Unspecified p
Ho: p1 = p2 = … = p8 vs. Ha: pj ≠ pk for some (j,k)
X² = 20.43, df = 7, p = 0.005
Note: Only one of each pair of expected values is independently estimated (k = 8, not 16)
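The two reported p-values (df = 8 with specified p, df = 7 with unspecified p) can be reproduced from the test statistics alone. The sketch below is an illustration rather than the lecture's own computation: the function name `chi2_upper_tail` is invented here, and it uses the finite-sum closed forms of the chi-square tail probability for integer degrees of freedom instead of a statistics library.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with integer df > x), using finite-sum closed forms."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total


p_specified = chi2_upper_tail(22.96, 8)    # homogeneity with specified p = 0.5
p_unspecified = chi2_upper_tail(20.43, 7)  # homogeneity with unspecified common p

print(f"specified p:   X^2 = 22.96, df = 8, p = {p_specified:.4f}")
print(f"unspecified p: X^2 = 20.43, df = 7, p = {p_unspecified:.4f}")
```

Rounded to three decimals, the results can be compared with the 0.003 and 0.005 reported on the slides. One degree of freedom is lost in the second test because the common p is estimated from the data.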