If we can reduce our desire, then all worries that bother us will disappear.
Statistical Package Usage Topic: Basic Categorical Data Analysis By Prof Kelly Fan, Cal. State Univ., East Bay
Outline
Only categorical variables are discussed here.
• Verify the hypothesized distribution
• One-sample Chi-square test
• Test the independence between two categorical variables
• Chi-square test for two-way contingency table
• McNemar’s test for paired data
• Measure the dependence (Phi and Kappa coefficients)
• Odds ratios and relative risk
• Test the trend of a binary response
• Chi-square test for trend
• Meta-analysis
Example: Hair Color Distribution From a random sample of 246 children
One-sample Chi-Square Test
• Must be a random sample
• The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
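One-sample Chi-Square Test
• Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$, with $k - 1$ degrees of freedom for $k$ categories, where
O_i = the observed frequency of the i-th category
e_i = the expected frequency of the i-th category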
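The statistic above can be sketched in a few lines of Python. This is only an illustration: the counts and hypothesized probabilities are made up (they are not the hair-color data), the function names are hypothetical, and the p-value helper uses a closed form that is valid for even degrees of freedom only.

```python
import math

def one_sample_chi_square(observed, hypothesized_probs):
    """Return (statistic, df, expected) for a one-sample chi-square test."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Illustrative counts only (NOT the hair-color data from the slides).
observed = [20, 30, 25, 15, 10]
probs = [0.2] * 5  # hypothesized distribution: all five categories equally likely

stat, df, expected = one_sample_chi_square(observed, probs)
# Rule of thumb from the slides: at least 80% of categories should have e_i >= 5.
share_ok = sum(e >= 5 for e in expected) / len(expected)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.4f}, df = {df}, p = {p_value:.4f}, share with e >= 5: {share_ok:.0%}")
```

For odd degrees of freedom a statistics library would normally supply the tail probability.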
Two-way Contingency Tables
• Report frequencies on two variables
• Such tables are also called crosstabs.
Contingency Tables (Crosstabs) 1991 General Social Survey
Crosstabs Analysis (SAS: p.88-90; SPSS: p.369-371)
• Chi-square test for testing the independence between two variables. Under independence:
• For a fixed column, the distribution of frequencies over rows stays the same regardless of the column
• For a fixed row, the distribution of frequencies over columns stays the same regardless of the row
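A rough Python sketch of the independence test follows. The expected count in each cell is (row total × column total) / n, which is exactly the "same distribution in every row and column" idea. The counts are an assumption: the slide's table did not survive extraction, and these values were chosen because they sum to 980 and reproduce the Chi-Square value of 79.4310 reported in the SAS output for this example. The function name is hypothetical.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square for an r x c table of counts: returns (statistic, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# ASSUMED counts (not shown in the slides); they sum to 980.
table = [[341, 105, 405],
         [103, 15, 11]]

chi2, df = chi_square_independence(table)
p_value = math.exp(-chi2 / 2)  # exact chi-square upper tail, valid for df = 2 only
print(f"chi-square = {chi2:.4f}, df = {df}, p = {p_value:.3g}")
```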
Crosstabs Analysis
• The phi coefficient measures the association between two categorical variables
• For a 2x2 table, -1 ≤ phi ≤ 1
• | phi | indicates the strength of the association
• If the two variables are both ordinal, then the sign of phi indicates the direction of the association
SAS Output

Statistic                     DF      Value     Prob
Chi-Square                     2    79.4310   <.0001
Likelihood Ratio Chi-Square    2    90.3311   <.0001
Mantel-Haenszel Chi-Square     1    79.3336   <.0001
Phi Coefficient                      0.2847
Contingency Coefficient              0.2738
Cramer's V                           0.2847

Sample Size = 980
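The three association measures in this output can be recomputed from the Chi-Square value and the sample size alone. The sketch below uses the standard formulas for tables larger than 2x2, where phi is taken as the unsigned square root of chi-square / n. The 2 x 3 table shape is an assumption that is consistent with DF = 2, and the function name is hypothetical.

```python
import math

def association_measures(chi2, n, n_rows, n_cols):
    """Phi, contingency coefficient, and Cramer's V from a Pearson chi-square."""
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v

# chi2 and n come from the SAS output; the 2 x 3 shape is assumed from DF = 2.
phi, cc, v = association_measures(79.4310, 980, 2, 3)
print(f"phi = {phi:.4f}, contingency = {cc:.4f}, Cramer's V = {v:.4f}")
```

Because min(rows - 1, columns - 1) is 1 for this table, Cramer's V equals phi, as the output shows.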
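Fisher’s Exact Test for Independence
• The Chi-square tests are for large samples
• The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
• When that condition fails, Fisher’s exact test can be used instead, because it does not rely on the large-sample approximation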
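As a minimal sketch of how the exact test works, the Python below handles the 2x2 case only: with the margins fixed, each possible table has a hypergeometric probability, and the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. This mirrors the "Table Probability (P)" and "Pr <= P" quantities that SAS reports. The counts are illustrative, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (probability of the observed table, two-sided p) for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = table_prob(a)
    # Sum over tables no more likely than the observed one (small tolerance for float ties).
    p_two_sided = sum(p for p in map(table_prob, range(lo, hi + 1))
                      if p <= p_observed * (1 + 1e-9))
    return p_observed, p_two_sided

# Illustrative small-sample table (not from the slides).
p_table, p_value = fisher_exact_2x2(3, 1, 1, 3)
print(f"table probability = {p_table:.4f}, two-sided p = {p_value:.4f}")
```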
SAS Output

Fisher's Exact Test
Table Probability (P)    3.823E-22
Pr <= P                  2.787E-20

Sample Size = 980
Matched-pair Data
• Comparing categorical responses for two “paired” samples, when either
• Each sample has the same subjects (that is, subjects are measured twice), or
• A natural pairing exists between each subject in one sample and a subject from the other sample (e.g., twins)
Marginal Homogeneity
• The probabilities of “success” for both samples are identical
• E.g., the probability of approval is the same at the first and second surveys
McNemar Test (for 2x2 Tables only)
• See SAS textbook Section 3.L
• Ho: marginal homogeneity; Ha: no marginal homogeneity
• Exact p-value
• Approximate p-value (when n12 + n21 > 10)
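Both p-values depend only on the two discordant counts n12 and n21. A minimal Python sketch, with a hypothetical function name: the approximate test uses S = (n12 - n21)^2 / (n12 + n21) with 1 degree of freedom, and the exact test treats n12 as Binomial(n12 + n21, 0.5) under Ho. The counts 150 and 86 are an assumption (the slide's table is not available); they were chosen because they reproduce the statistic S = 17.3559 in the SAS output for this example.

```python
from math import comb, erfc, sqrt

def mcnemar(n12, n21):
    """McNemar test from the two discordant counts of a paired 2x2 table."""
    m = n12 + n21
    statistic = (n12 - n21) ** 2 / m
    asymptotic_p = erfc(sqrt(statistic / 2))  # chi-square upper tail, df = 1
    # Exact two-sided p-value: under Ho, n12 ~ Binomial(m, 0.5).
    k = min(n12, n21)
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    exact_p = min(1.0, 2 * tail)
    return statistic, asymptotic_p, exact_p

# ASSUMED discordant counts (not shown in the slides).
s, p_asymptotic, p_exact = mcnemar(150, 86)
print(f"S = {s:.4f}, asymptotic p = {p_asymptotic:.3g}, exact p = {p_exact:.3g}")
```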
SAS Output

McNemar's Test
Statistic (S)          17.3559
DF                           1
Asymptotic Pr > S       <.0001
Exact Pr >= S        3.716E-05

Simple Kappa Coefficient (level of agreement)
Kappa                   0.6996
ASE                     0.0180
95% Lower Conf Limit    0.6644
95% Upper Conf Limit    0.7348

Sample Size = 1600
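Kappa compares the observed proportion of agreement (the diagonal of the table) with the agreement expected by chance from the margins. The sketch below is illustrative only. The full 2x2 table is an assumption, since it is not shown in the slides; these counts sum to 1600 and happen to reproduce the reported Kappa of 0.6996. The function name is hypothetical.

```python
def simple_kappa(table):
    """Simple kappa for a square table of counts (rows: first rating, columns: second)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(len(table))) / n            # agreement seen
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # agreement by chance
    return (observed - expected) / (1 - expected)

# ASSUMED counts (not shown in the slides); they sum to 1600.
table = [[794, 150],
         [86, 570]]
kappa = simple_kappa(table)
print(f"kappa = {kappa:.4f}")
```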
Comparing Proportions in 2x2 Tables
• Difference of proportions: pi1 - pi2
• Relative risk: pi1 / pi2
• Odds ratio: odds1 / odds2, where odds1 = pi1 / (1 - pi1) and odds2 = pi2 / (1 - pi2)
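The three measures can be computed side by side from the sample proportions. The Python below is a small sketch with hypothetical counts (30 successes out of 100 in group 1, 15 out of 100 in group 2) and a hypothetical function name.

```python
def compare_proportions(success1, total1, success2, total2):
    """Difference of proportions, relative risk, and odds ratio for two groups."""
    p1, p2 = success1 / total1, success2 / total2
    odds1, odds2 = p1 / (1 - p1), p2 / (1 - p2)
    return {
        "difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": odds1 / odds2,
    }

# Hypothetical counts, for illustration only.
result = compare_proportions(30, 100, 15, 100)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

When both proportions are small, the odds ratio and the relative risk are close; here they differ noticeably because the proportions are not small.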
Example: Aspirin vs. Heart Attack • Prospective sampling; Row totals were fixed
Chi-square Test for Trend
Situation: A binary response (success, failure) + an ordinal explanatory variable
Question: Is there a trend? Are the proportions (of success) in each of the levels of the explanatory variable increasing or decreasing in a linear fashion?
Example: Shoulder Harness Usage Question: Is the proportion of shoulder harness usage increasing or decreasing linearly as the car size gets larger?
SAS Output

Statistics for Table of response by car_size

Statistic                     DF      Value     Prob
Chi-Square                     2     0.6080   0.7379
Likelihood Ratio Chi-Square    2     0.6092   0.7374
Mantel-Haenszel Chi-Square     1     0.3073   0.5793
Phi Coefficient                      0.0277
Contingency Coefficient              0.0277
Cramer's V                           0.0277
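In this output, the row that addresses linear trend is the Mantel-Haenszel Chi-Square with 1 degree of freedom, which equals (n - 1) r^2, where r is the correlation between the response and the scores assigned to the ordinal levels. A rough Python sketch follows. The counts are hypothetical (they are not the shoulder-harness data), the scores 1, 2, 3 are an assumed choice, and the function name is made up.

```python
from math import erfc, sqrt

def trend_chi_square(successes, failures, scores):
    """Mantel-Haenszel chi-square (n - 1) * r^2 for a binary response by ordinal level."""
    totals = [s + f for s, f in zip(successes, failures)]
    n = sum(totals)
    sum_x = sum(x * t for x, t in zip(scores, totals))
    sum_y = sum(successes)  # response coded 1 = success, 0 = failure
    sxy = sum(x * s for x, s in zip(scores, successes)) - sum_x * sum_y / n
    sxx = sum(x * x * t for x, t in zip(scores, totals)) - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    statistic = (n - 1) * sxy ** 2 / (sxx * syy)
    p_value = erfc(sqrt(statistic / 2))  # chi-square upper tail, df = 1
    return statistic, p_value

# Hypothetical data: success proportions 25%, 50%, 75% across three ordered levels.
stat, p = trend_chi_square([10, 20, 30], [30, 20, 10], scores=[1, 2, 3])
# Hypothetical data with the same 50% success proportion at every level (no trend).
flat_stat, flat_p = trend_chi_square([10, 20, 30], [10, 20, 30], scores=[1, 2, 3])
print(f"trend: chi-square = {stat:.4f}, p = {p:.3g}")
print(f"no trend: chi-square = {flat_stat:.4f}, p = {flat_p:.3g}")
```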
Meta Analysis
• Also known as the Mantel-Haenszel test or stratified analysis
Situation: Another variable (the strata) may “pollute” the effect of a categorical explanatory variable on a categorical response
Goal: Study the effect of the explanatory variable while controlling for the stratification variable
SAS Output

Statistics for Table 1 of trtmnt by response
Controlling for center=1

Statistic                     DF      Value     Prob
Chi-Square                     1    10.0198   0.0015
Likelihood Ratio Chi-Square    1    10.2162   0.0014
Continuity Adj. Chi-Square     1     8.7284   0.0031
Mantel-Haenszel Chi-Square     1     9.9085   0.0016
Phi Coefficient                      0.3337
Contingency Coefficient              0.3165
Cramer's V                           0.3337

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value    95% Confidence Limits
Case-Control (Odds Ratio)     4.0134    1.6680    9.6564
Cohort (Col1 Risk)            2.0714    1.2742    3.3675
Cohort (Col2 Risk)            0.5161    0.3325    0.8011

Sample Size = 90
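The odds ratio and Col1 relative risk in this output, together with their 95% limits, can be approximated with the usual large-sample (Wald) intervals on the log scale. The sketch below is an illustration under assumptions: the 2x2 counts for center 1 are not shown in the slides, and the values used here were chosen because they sum to 90 and reproduce the reported estimates. The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio for [[a, b], [c, d]] with a Wald interval on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)

def relative_risk_ci(a, b, c, d, level=0.95):
    """Column-1 relative risk (row 1 / row 2) with a Wald interval on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    row1, row2 = a + b, c + d
    estimate = (a / row1) / (c / row2)
    se = sqrt(1 / a - 1 / row1 + 1 / c - 1 / row2)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)

# ASSUMED counts for center 1 (not shown in the slides); they sum to 90.
a, b, c, d = 29, 16, 14, 31
or_est, or_lo, or_hi = odds_ratio_ci(a, b, c, d)
rr_est, rr_lo, rr_hi = relative_risk_ci(a, b, c, d)
print(f"odds ratio    = {or_est:.4f} ({or_lo:.4f}, {or_hi:.4f})")
print(f"relative risk = {rr_est:.4f} ({rr_lo:.4f}, {rr_hi:.4f})")
```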
SAS Output

Summary Statistics for trtmnt by response
Controlling for center

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF      Value     Prob
-------------------------------------------------------------
1            Nonzero Correlation        1    18.4106   <.0001
2            Row Mean Scores Differ     1    18.4106   <.0001
3            General Association        1    18.4106   <.0001
SAS Output

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method              Value    95% Confidence Limits
--------------------------------------------------------------------
Case-Control      Mantel-Haenszel    4.0288    2.1057    7.7084
(Odds Ratio)      Logit              4.0286    2.1057    7.7072
Cohort            Mantel-Haenszel    1.7368    1.3301    2.2680
(Col1 Risk)       Logit              1.6760    1.2943    2.1703
Cohort            Mantel-Haenszel    0.4615    0.3162    0.6737
(Col2 Risk)       Logit              0.4738    0.3264    0.6877

Breslow-Day Test for Homogeneity of the Odds Ratios
---------------------------------------------------
Chi-Square      0.0002
DF                   1
Pr > ChiSq      0.9900

Total Sample Size = 180
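For 2x2 tables, the stratified analysis pools information across strata: the Mantel-Haenszel common odds ratio is a weighted ratio of cross-products, and the Cochran-Mantel-Haenszel statistic compares the top-left cell of each stratum with its expected value under no association. A minimal Python sketch is given below. The per-center counts are an assumption (they do not appear in the slides); they were chosen because they total 180 and reproduce the reported common odds ratio of 4.0288 and statistic of 18.4106. The function name is hypothetical.

```python
from math import erfc, sqrt

def mantel_haenszel(strata):
    """strata: list of 2x2 tables [[a, b], [c, d]], one per stratum."""
    or_num = or_den = 0.0
    diff_sum = var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff_sum += a - row1 * col1 / n  # observed minus expected top-left cell
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))  # hypergeometric variance
    common_or = or_num / or_den
    cmh = diff_sum ** 2 / var_sum  # no continuity correction
    p_value = erfc(sqrt(cmh / 2))  # chi-square upper tail, df = 1
    return common_or, cmh, p_value

# ASSUMED counts for the two centers (not shown in the slides).
strata = [
    [[29, 16], [14, 31]],  # center 1
    [[37, 8], [24, 21]],   # center 2
]
common_or, cmh, p = mantel_haenszel(strata)
print(f"MH common odds ratio = {common_or:.4f}, CMH = {cmh:.4f}, p = {p:.3g}")
```

The Breslow-Day test in the output checks whether a single common odds ratio is reasonable; it is not reproduced in this sketch.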