Statistical Analysis

Professor Lynne Stokes, Department of Statistical Science
Lecture #2: Chi-square Tests for Homogeneity, Chi-square Goodness of Fit Test

Chi-square Tests
• Tests for independence in contingency tables
• Tests for homogeneity

Binomial Samples (Product Binomial Sampling)
Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5
• Hypothesis #1: Is pW = 0.5?
  • Binomial inference on p
  • Equivalently, overall goodness of fit (known p)
• Hypothesis #2: Are all the pW equal?
  • Test for homogeneity (equal but unknown p)
• Hypothesis #3: Is each pW = 0.5?
  • Goodness of fit (8 samples, known p)
Assumptions: 8 samples, mutually independent counts

Test of Homogeneity of k Binomial Samples, Specified p
Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j
X² = 22.96, df = 8, p = 0.003
Note: Does not assume homogeneity (see below)

Test of Homogeneity of k Binomial Samples: Unspecified p
Ho: p1 = p2 = … = p8 vs. Ha: pj ≠ pk for some (j, k)
X² = 20.43, df = 7, p = 0.005
Note: Only one of each pair of expected values is independently estimated (k = 8, not 16)
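The two homogeneity tests differ only in where p comes from and in the degrees of freedom. A minimal Python sketch follows, assuming made-up counts (the `successes` and `sizes` lists are hypothetical, not the data behind the slides). The helper `chi2_sf` is also an assumption of this sketch: a closed-form chi-square upper-tail probability for integer df, written with the standard library only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df > x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i)
                                     for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1..(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def pearson_x2(successes, sizes, p):
    """Sum of (O - E)^2 / E over the success and failure cells of every sample."""
    total = 0.0
    for y, n in zip(successes, sizes):
        e_succ, e_fail = n * p, n * (1 - p)
        total += (y - e_succ) ** 2 / e_succ + ((n - y) - e_fail) ** 2 / e_fail
    return total

# Hypothetical counts for k = 8 independent binomial samples (NOT the slide data).
successes = [12, 15, 9, 20, 14, 11, 17, 13]
sizes = [25] * 8
k = len(sizes)

# Specified p = 0.5: nothing is estimated, so df = k.
x2_spec = pearson_x2(successes, sizes, 0.5)
p_spec = chi2_sf(x2_spec, k)

# Unspecified common p: estimate it by the pooled proportion, which costs one df.
p_hat = sum(successes) / sum(sizes)
x2_unspec = pearson_x2(successes, sizes, p_hat)
p_unspec = chi2_sf(x2_unspec, k - 1)

print(f"specified p:   X2 = {x2_spec:.2f}, df = {k}, p = {p_spec:.3f}")
print(f"unspecified p: X2 = {x2_unspec:.2f}, df = {k - 1}, p = {p_unspec:.3f}")
```

The same helper reproduces the p-values quoted on the slides: `chi2_sf(22.96, 8)` and `chi2_sf(20.43, 7)` round to 0.003 and 0.005.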

Chi-square Tests
• Tests for independence in contingency tables
• Tests for homogeneity
• Goodness of fit tests

Chi-square Goodness of Fit Test: Specified Probabilities
Assumptions
• n independent observations
• k mutually exclusive possible outcomes
• pj = Pr(outcome j) is the same on every trial
Sample size condition
• All npj ≥ 1
• At least 80% of the npj ≥ 5

Goodness of Fit Test: Specified Probabilities
Sample size: n
Observed count for outcome j: Oj
Expected count for outcome j: Ej = npj
Ho: Pr(outcome j) = pj for j = 1, ..., k
Ha: Pr(outcome j) ≠ pj for at least one j
Test statistic: X² = Σ (Oj − Ej)² / Ej, summed over j = 1, ..., k
Reject Ho if X² > χ²α, where χ²α is the upper-α critical value of the chi-square distribution with df = k − 1

Sufficient Evidence of Cognitive Learning?
Cognitive Learning: Path Chosen
                   A    B    C    D    Total
Number of rats     4    5    8   15     32
Expected number    8    8    8    8     32
p = 0.026
Using a significance level of α = 0.05, there is sufficient evidence (p = 0.026) to reject the hypothesis that rats choose the 4 doors with equal probability.
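The p-value on this slide can be reproduced from the table using the Pearson statistic, the sum over doors of (O − E)² / E, with df = k − 1 = 3. The sketch below uses only the counts shown above. The helper `chi2_sf` is an assumption of this sketch (a closed-form chi-square upper tail for integer df, standard library only), not something the slides provide.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df > x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i)
                                     for i in range(df // 2))
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

observed = [4, 5, 8, 15]            # rats choosing doors A, B, C, D (from the slide)
probs = [0.25, 0.25, 0.25, 0.25]    # Ho: all four doors equally likely
n = sum(observed)
expected = [n * p for p in probs]   # 8 rats per door

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(x2, df)

print(f"X2 = {x2:.2f}, df = {df}, p = {p_value:.3f}")
```

Door D contributes (15 − 8)² / 8 = 6.125 of the total X² = 9.25, so most of the evidence against equal probabilities comes from that one door.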

Mendelian Inheritance
Do the genotypes of a cross-breeding occur in the ratio 9:3:3:1?
Reject Ho if X² > 7.815 (α = 0.05, df = 3)

Mendelian Inheritance
Contributions to X² by genotype class: 0.25, 0.08, 1.33, 1.00
X² = 0.25 + 0.08 + 1.33 + 1.00 = 2.66
There is insufficient evidence (p > 0.10) at a significance level of 0.05 to conclude that the genotypes from this type of cross-breeding occur in proportions that differ from those predicted by Mendelian inheritance theory.
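The observed genotype counts behind these contributions are not preserved here, so the sketch below uses illustrative counts (the `observed` list is an assumption, not the slide data). It shows how a hypothesized ratio such as 9:3:3:1 is turned into expected counts and per-class contributions. The critical value 7.815 is the one quoted on the slide.

```python
ratio = [9, 3, 3, 1]                # hypothesized Mendelian ratio
observed = [315, 108, 101, 32]      # illustrative counts only, NOT the slide data

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

# Per-class contributions (O - E)^2 / E, then their sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

critical = 7.815                    # chi-square, df = 3, alpha = 0.05 (from the slide)
reject = x2 > critical

for o, e, c in zip(observed, expected, contributions):
    print(f"O = {o:4d}  E = {e:7.2f}  contribution = {c:.3f}")
print(f"X2 = {x2:.3f}, reject Ho: {reject}")
```

Printing the contributions separately mirrors the layout of the slide, where the four terms are listed before being summed.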

Chi-Square Goodness of Fit Test: Unknown Parameters
• Estimate the parameters of the distribution
• Divide range of data values into mutually exclusive and exhaustive classes
  • Discrete data: often use the values themselves
  • Continuous data: use k = n^(1/2) or k = log(n) classes
• Estimate the probability of being in each class
• Compare the observed (Oi) counts in each class with the estimated expected (Ei) counts

Chi-Square Goodness of Fit Test for the Poisson Distribution
Number of senders (automated telephone equipment) in use at a given time
23 – 1 = 22 categories
Ho: number ~ Poisson vs. Ha: number not Poisson
Reject if X² > χ²0.05(20) = 31.4
df: 22 – 1 (mutually exclusive & exhaustive) – 1 (estimated parameter) = 20
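The bookkeeping for a Poisson fit (estimate the mean, form classes, lose one df for the estimated parameter) can be sketched as follows. The tallies in `value_counts` are hypothetical and far smaller than the telephone data on the slide. Pooling everything at 5 or above into a single open-ended class is also an assumption made for this illustration.

```python
import math

# Hypothetical tallies (NOT the slide data): value -> number of observations.
value_counts = {0: 8, 1: 20, 2: 28, 3: 22, 4: 13, 5: 6, 6: 3}
n = sum(value_counts.values())

# Step 1: estimate the single Poisson parameter by the sample mean.
lam = sum(v * c for v, c in value_counts.items()) / n

# Step 2: classes 0, 1, 2, 3, 4 and an open-ended "5 or more" class.
cut = 5
observed = [value_counts.get(j, 0) for j in range(cut)]
observed.append(sum(c for v, c in value_counts.items() if v >= cut))

# Step 3: estimated class probabilities; the last class takes the remaining mass.
probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(cut)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# Step 4: Pearson statistic; df = classes - 1 - (one estimated parameter).
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1

print(f"lambda-hat = {lam:.2f}, X2 = {x2:.3f}, df = {df}")
```

The resulting X² would then be compared with the chi-square critical value for `df`, just as the slide compares its statistic with 31.4 at df = 20.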

Chi-Square Goodness of Fit Test for the Normal Distribution
• Divide the data into mutually exclusive and exhaustive (contiguous) classes
  • First and last classes are open-ended
  • (−∞, U1), (L2, U2), (L3, U3), …, (Lk, ∞) with Lj = U(j−1)
• Estimate the mean and standard deviation
• Calculate z-scores for the limits of each class
• Estimate the probability content for each class
  • pj = Pr(zLj < z < zUj)
• Estimate the expected frequency for each class
  • Ej = npj
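The steps listed above can be sketched in Python with the standard library. Everything specific here is an assumption for illustration: the simulated `data`, the seed, and the class limits in `limits`. Two parameters (mean and standard deviation) are estimated, so df = k − 2 − 1.

```python
import random
from statistics import NormalDist, mean, stdev

# Hypothetical sample (NOT from the slides): 200 simulated measurements.
rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(200)]
n = len(data)

# Estimate the mean and standard deviation.
m, s = mean(data), stdev(data)

# Interior class limits; classes are (-inf, 35), [35, 40), ..., [65, +inf).
limits = [35, 40, 45, 50, 55, 60, 65]

# z-scores of the limits, then the probability content of each class.
std_normal = NormalDist()
cdf = [0.0] + [std_normal.cdf((u - m) / s) for u in limits] + [1.0]
probs = [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]
expected = [n * p for p in probs]

# Observed count per class: class index = number of limits at or below x.
observed = [0] * len(probs)
for x in data:
    observed[sum(x >= u for u in limits)] += 1

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(probs) - 2 - 1

print(f"mean = {m:.2f}, sd = {s:.2f}, X2 = {x2:.3f}, df = {df}")
```

Setting the first and last cumulative probabilities to 0 and 1 is what makes the end classes open-ended and guarantees the expected counts sum to n.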

Chi-Square Goodness of Fit Test
• Can be applied to any discrete or continuous probability distribution; only probabilities need be specified: Ei = npi
• Asymptotic chi-square distribution
  • All Ei > 1 and at least 80% of the Ei > 5
• Does not have the highest power for specific distributions against specific alternatives
• Degrees of freedom (k classes)
  • If each class represents an independent sample (i.e., k replicate samples) and all parameters are known (i.e., known probabilities), df = k
  • If the classes represent mutually exclusive and exhaustive categories (i.e., expected frequencies must sum to n), and the data are independent and from a single sample:
    • All parameters are known: df = k – 1
    • r parameters are estimated: df = k – r – 1
    • e.g., (n – 1)s²/σ² ~ χ²(n – 1)

Goodness of Fit to the Binomial, Known p
• Normal theory approximation
• Chi-square tests

Binomial Sample, Specified p: Normal Theory Approximation
Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5
Greater Power by Combining Samples (Assuming Homogeneity)
p = 0.110

Alternative to the Binomial Test: Chi-square Goodness of Fit, Specified p
Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5
p = 0.110
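Both approaches report p = 0.110 because, with two categories, the chi-square statistic on 1 df is the square of the normal-theory z statistic. The sketch below checks this numerically. The counts `n = 100` and `y = 58` are hypothetical, chosen only so that z = 1.6; the pooled counts behind the slide are not reproduced here.

```python
import math

n, y = 100, 58      # hypothetical pooled sample size and successes (assumed)
p0 = 0.5            # value of pW specified by Ho

# Normal theory approximation to the binomial test (two-sided).
z = (y - n * p0) / math.sqrt(n * p0 * (1 - p0))
p_normal = math.erfc(abs(z) / math.sqrt(2))

# Chi-square goodness of fit on the two cells (success, failure), df = 1.
x2 = ((y - n * p0) ** 2 / (n * p0)
      + ((n - y) - n * (1 - p0)) ** 2 / (n * (1 - p0)))
p_chi2 = math.erfc(math.sqrt(x2 / 2))   # chi-square upper tail for df = 1

print(f"z = {z:.2f}, z^2 = {z * z:.2f}, X2 = {x2:.2f}")
print(f"p (normal) = {p_normal:.3f}, p (chi-square) = {p_chi2:.3f}")
```

With these assumed counts the output matches the statistic reported later in the lecture for the overall test (X² = 2.56, p = 0.110).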

Overall Binomial Test vs. Test of Homogeneity, Specified p
Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j
• Overall binomial test (greater power if homogeneous): X² = 2.56, df = 1, p = 0.110
• Test of homogeneity (greater power if not homogeneous): X² = 22.96, df = 8, p = 0.003

Binomial Samples, pW unspecified
Homogeneity with unspecified p is equivalent to independence

Some Goodness of Fit Tests
• Chi-square goodness-of-fit test
  • Very general, can have little power
• Kolmogorov-Smirnov goodness-of-fit test
  • Good general test, especially for continuous random variables
• Shapiro-Wilk test for normality
  • Regarded as the best test for normality

Comparing Odds Ratios Across Categories

Race and Death Penalty Punishment
Are the results consistent across aggravation levels?

Mantel-Haenszel Test
• Several 2 x 2 tables
• Assuming a common odds ratio, test that the odds ratio = 1

Race and Death Penalty Punishment
Expected frequencies for chi-square test of independence
Note: None of the tables has a sufficient sample size for a test of independence

Mantel-Haenszel Test
• Select one cell; e.g., upper-left
• Calculate the excess for each table
  • Excess = Observed – Expected
  • e.g., Excess = O11 – E11
• Calculate the variances of the excesses
  • Variance = R1 R2 C1 C2 / [n²(n – 1)]
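The excess and variance calculations above can be sketched as follows. The 2 x 2 tables in `tables` are hypothetical, not the death-penalty data. The final step, dividing the total excess by the square root of the total variance and referring the result to a standard normal, is the usual form of the Mantel-Haenszel statistic without continuity correction. It is assumed here because the slide text does not spell it out.

```python
import math

# Hypothetical strata (NOT the slide data); each table is ((a, b), (c, d)).
tables = [
    ((9, 11), (4, 16)),
    ((7, 13), (3, 17)),
    ((12, 8), (8, 12)),
]

total_excess = 0.0
total_var = 0.0
for (a, b), (c, d) in tables:
    r1, r2 = a + b, c + d           # row totals
    c1, c2 = a + c, b + d           # column totals
    n = r1 + r2
    expected_11 = r1 * c1 / n       # expected upper-left count under independence
    total_excess += a - expected_11 # excess = observed - expected
    total_var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))

# Assumed final step: z = total excess / sqrt(total variance), two-sided p-value.
z = total_excess / math.sqrt(total_var)
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"total excess = {total_excess:.2f}, total variance = {total_var:.3f}")
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

Summing the excesses across tables before standardizing is what lets several sparse tables contribute jointly, even when no single table is large enough for its own test of independence.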

Race and Death Penalty Punishment
Conclusion: Nearly 7 more white-victim murderers received the death penalty than would be expected if the odds were the same for white- and black-victim murderers.

Estimating the Common Odds Ratio: Death Penalty and Race
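The formula from this slide is not preserved in the text. One common choice, assumed here, is the Mantel-Haenszel estimator, which pools the tables as the sum of a·d/n divided by the sum of b·c/n. The tables below are hypothetical, not the death-penalty data.

```python
# Hypothetical strata (NOT the slide data); each table is ((a, b), (c, d)).
tables = [
    ((9, 11), (4, 16)),
    ((7, 13), (3, 17)),
    ((12, 8), (8, 12)),
]

numerator = 0.0
denominator = 0.0
for (a, b), (c, d) in tables:
    n = a + b + c + d
    numerator += a * d / n
    denominator += b * c / n
    # Per-table odds ratio, printed for comparison with the pooled estimate.
    print(f"table odds ratio = {(a * d) / (b * c):.2f}")

or_mh = numerator / denominator
print(f"Mantel-Haenszel common odds ratio estimate = {or_mh:.3f}")
```

The pooled estimate is only meaningful if the odds ratio really is roughly the same in every table, which is the assumption the Mantel-Haenszel test itself relies on.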