
If we can reduce our desire, then all worries that bother us will disappear.



Presentation Transcript


  1. If we can reduce our desire, then all worries that bother us will disappear.

  2. Statistical Package Usage Topic: Basic Categorical Data Analysis By Prof Kelly Fan, Cal. State Univ., East Bay

  3. Outline Only categorical variables are discussed here. • Verify the hypothesized distribution • One-sample Chi-square test • Test the independence between two categorical variables • Chi-square test for two-way contingency table • McNemar’s test for paired data • Measure the dependence (Phi and Kappa Coefficients) • Odds ratios and relative risk • Test the trend of a binary response • Chi-square test for trend • Meta-analysis

  4. Example: Hair Color Distribution From a random sample of 246 children

  5. One-sample Chi-Square Test • Must be a random sample • The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories

  6. One-sample Chi-Square Test • Test statistic: $\chi^2 = \sum_i (O_i - e_i)^2 / e_i$, where O_i = the observed frequency of the i-th category and e_i = the expected frequency of the i-th category
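
The statistic on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's SAS procedure: the hair color counts from the example slide are not in the transcript, so the counts and the hypothesized proportions below are hypothetical, and the function names are made up for this sketch.

```python
def chi_square_gof(observed, expected_props):
    """Return (statistic, df, expected counts) for a one-sample chi-square test."""
    n = sum(observed)
    # Expected frequency e_i = n * hypothesized proportion of category i
    expected = [n * p for p in expected_props]
    # Pearson statistic: sum over categories of (O_i - e_i)^2 / e_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected


def expected_counts_ok(expected, threshold=5, share=0.8):
    """Check the large-sample rule: e_i >= 5 for at least 80% of the categories."""
    big = sum(1 for e in expected if e >= threshold)
    return big / len(expected) >= share


# Hypothetical counts for four categories (NOT the data from the slides)
observed = [30, 20, 25, 25]
hypothesized = [0.25, 0.25, 0.25, 0.25]

stat, df, expected = chi_square_gof(observed, hypothesized)
print(stat, df, expected_counts_ok(expected))
```

The statistic is compared with a chi-square distribution on (number of categories - 1) degrees of freedom; the p-value lookup is left out to keep the sketch free of third-party libraries.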

  7. SAS Output

  8. Two-way Contingency Tables • Report frequencies on two variables • Such tables are also called crosstabs.

  9. Contingency Tables (Crosstabs) 1991 General Social Survey

  10. Crosstabs Analysis (SAS: p.88-90; SPSS: p.369-371) • Chi-square test for testing the independence between two variables. Under independence: • The distribution of frequencies over rows is the same in every column • The distribution of frequencies over columns is the same in every row
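
As a rough sketch of the independence test described here: under independence the expected count in each cell is (row total × column total) / n, and the Pearson statistic sums the squared deviations from those counts. The General Social Survey table is not in the transcript, so both tables below are hypothetical and the function name is invented for illustration.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count when rows and columns are independent
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2x3 table (NOT the survey data from the slides)
table = [[20, 30, 50],
         [30, 30, 40]]
stat, df = chi_square_independence(table)

# A table whose rows share the same distribution gives a statistic of zero
stat_indep, _ = chi_square_independence([[10, 20], [20, 40]])
print(stat, df, stat_indep)
```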

  11. Crosstabs Analysis • The phi coefficient measures the association between two categorical variables • -1 < phi < 1 • | phi | indicates the strength of the association • If the two variables are both ordinal, then the sign of phi indicates the direction of association

  12. SAS Output

          Statistic                      DF      Value     Prob
          Chi-Square                      2    79.4310   <.0001
          Likelihood Ratio Chi-Square     2    90.3311   <.0001
          Mantel-Haenszel Chi-Square      1    79.3336   <.0001
          Phi Coefficient                       0.2847
          Contingency Coefficient               0.2738
          Cramer's V                            0.2847

          Sample Size = 980
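
The three association measures in this output can be recovered from the reported chi-square value and sample size using the standard formulas. The sketch below assumes a 2x3 table, which is an inference from DF = 2 (a 3x2 table would give the same numbers); the function name is hypothetical.

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Phi, contingency coefficient, and Cramer's V from a Pearson chi-square."""
    phi = math.sqrt(chi2 / n)                    # unsigned for tables larger than 2x2
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v


# Chi-square and sample size as reported in the SAS output;
# the 2x3 shape is an assumption based on DF = 2
phi, contingency, cramers_v = association_measures(79.4310, 980, 2, 3)
print(round(phi, 4), round(contingency, 4), round(cramers_v, 4))
# -> 0.2847 0.2738 0.2847
```

Because the smaller table dimension is 2, Cramer's V equals phi here, which matches the output.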

  13. Fisher’s Exact Test for Independence • The Chi-squared tests are for large samples • The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories

  14. SAS Output

          Fisher's Exact Test
          Table Probability (P)    3.823E-22
          Pr <= P                  2.787E-20

          Sample Size = 980
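
To show what "Table Probability (P)" and "Pr <= P" mean, here is a minimal sketch of Fisher's exact test restricted to a 2x2 table. The output above comes from a larger table with n = 980, so this does not reproduce it; the counts are hypothetical and the function names are invented. With all margins fixed, each possible table has a hypergeometric probability, and the two-sided p-value adds up the probabilities of every table that is no more likely than the observed one.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_2x2(a, b, c, d):
    """Return (table probability P, two-sided p-value Pr <= P)."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = table_probability(a, b, c, d)
    p_value = 0.0
    # Enumerate every value of the top-left cell compatible with the margins
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += p
    return p_obs, p_value


# Hypothetical small-sample table (NOT from the slides)
p_obs, p_value = fisher_exact_2x2(3, 1, 1, 3)
print(p_obs, p_value)
```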

  15. Matched-pair Data • Comparing categorical responses for two “paired” samples, when either • Each sample has the same subjects (that is, subjects are measured twice) or • A natural pairing exists between each subject in one sample and a subject from the other sample (e.g., twins)

  16. Example: Rating for Prime Minister

  17. Marginal Homogeneity • The probabilities of “success” for both samples are identical • E.g., the probability of approval is the same at the first and second surveys

  18. McNemar Test (for 2x2 Tables only) • See SAS textbook Section 3.L • Ho: marginal homogeneity Ha: no marginal homogeneity • Exact p-value • Approximate p-value (When n12+n21>10)

  19. SAS Output

          McNemar's Test
          Statistic (S)          17.3559
          DF                           1
          Asymptotic Pr >  S      <.0001
          Exact      Pr >= S   3.716E-05

          Simple Kappa Coefficient
          Kappa                     0.6996
          ASE                       0.0180
          95% Lower Conf Limit      0.6644
          95% Upper Conf Limit      0.7348

          Sample Size = 1600

      Kappa measures the level of agreement.
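
The McNemar statistic and the kappa coefficient can be sketched as follows. The 2x2 table of paired ratings is not in the transcript; the counts below are an assumption, chosen because they sum to 1600 and reproduce the reported S = 17.3559 and kappa = 0.6996. The exact p-value is computed here as a two-sided binomial test on the discordant pairs, which is one common definition and may not match SAS digit for digit.

```python
from math import comb


def mcnemar(n12, n21):
    """McNemar statistic and an exact two-sided binomial p-value.

    n12 and n21 are the two discordant (off-diagonal) counts.
    """
    stat = (n12 - n21) ** 2 / (n12 + n21)
    n = n12 + n21
    k = min(n12, n21)
    # Under H0 each discordant pair is equally likely to fall in either cell
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return stat, min(1.0, 2 * tail)


def cohen_kappa(table):
    """Simple kappa for a square table of paired classifications."""
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(len(table))) / n
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the margins alone
    p_exp = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# ASSUMED counts (first survey in rows, second survey in columns);
# not shown in the transcript
table = [[794, 150],
         [86, 570]]

stat, exact_p = mcnemar(table[0][1], table[1][0])
kappa = cohen_kappa(table)
print(round(stat, 4), round(kappa, 4))
# -> 17.3559 0.6996
```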

  20. Comparing Proportions in 2x2 Tables • Difference of proportions: pi1-pi2 • Relative risk: pi1/pi2 • Odds Ratio: odds1/odds2 odds1=pi1/(1-pi1) odds2=pi2/(1-pi2)
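
The three comparisons on this slide translate directly into code. The counts below are hypothetical (the aspirin table from the next slide is not in the transcript), and the function name is invented for this sketch.

```python
def compare_proportions(successes1, n1, successes2, n2):
    """Difference of proportions, relative risk, and odds ratio for two groups."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return p1 - p2, p1 / p2, odds1 / odds2


# Hypothetical counts: 20 of 100 "successes" in group 1, 10 of 100 in group 2
diff, relative_risk, odds_ratio = compare_proportions(20, 100, 10, 100)
print(diff, relative_risk, odds_ratio)
```

The odds ratio (2.25 here) is further from 1 than the relative risk (2.0); the two are close only when both proportions are small.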

  21. Example: Aspirin vs. Heart Attack • Prospective sampling; Row totals were fixed

  22. Chi-square Test for Trend Situation: A binary response (success, failure) + an ordinal explanatory variable Question: Is there a trend? Are the proportions (of success) in each of the levels of the explanatory variable increasing or decreasing in a linear fashion?

  23. Example: Shoulder Harness Usage Question: Is the proportion of shoulder harness usage increasing or decreasing linearly as the car size gets larger?

  24. SAS Output

          Statistics for Table of response by car_size

          Statistic                      DF      Value     Prob
          Chi-Square                      2     0.6080   0.7379
          Likelihood Ratio Chi-Square     2     0.6092   0.7374
          Mantel-Haenszel Chi-Square      1     0.3073   0.5793
          Phi Coefficient                       0.0277
          Contingency Coefficient               0.0277
          Cramer's V                            0.0277
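
One common form of the trend test is M² = (n - 1) r², where r is the Pearson correlation between the ordinal scores and the 0/1 response; this appears to be what the DF = 1 "Mantel-Haenszel Chi-Square" row reports. The sketch below assumes equally spaced scores 1, 2, 3 and uses hypothetical counts, since the shoulder harness table is not in the transcript.

```python
def trend_chi_square(successes, failures, scores):
    """Return (M^2, r) with M^2 = (n - 1) * r^2 for a binary response.

    successes[i] and failures[i] are the counts at the level with scores[i].
    """
    totals = [s + f for s, f in zip(successes, failures)]
    n = sum(totals)
    sum_x = sum(x * t for x, t in zip(scores, totals))
    sum_xx = sum(x * x * t for x, t in zip(scores, totals))
    sum_y = sum(successes)              # y is 1 for success, 0 for failure
    sum_xy = sum(x * s for x, s in zip(scores, successes))
    sxx = sum_xx - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n        # y squared equals y for a 0/1 variable
    sxy = sum_xy - sum_x * sum_y / n
    r = sxy / (sxx * syy) ** 0.5
    return (n - 1) * r * r, r


# Hypothetical counts for three ordered levels (NOT the data from the slides)
stat, r = trend_chi_square([10, 20, 30], [30, 20, 10], [1, 2, 3])

# Equal success proportions at every level give no trend at all
flat_stat, _ = trend_chi_square([10, 10, 10], [30, 30, 30], [1, 2, 3])
print(stat, r, flat_stat)
```

The statistic is referred to a chi-square distribution with 1 degree of freedom, so it can detect a linear trend that the DF = 2 general chi-square test would miss.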

  25. Meta-analysis • Also known as the Mantel-Haenszel test or stratified analysis Situation: When another variable (strata) may “pollute” the effect of a categorical explanatory variable on a categorical response Goal: Study the effect of the explanatory variable while controlling for the stratification variable

  26. Example: Respiratory Improvement

  27. SAS Output

          Statistics for Table 1 of trtmnt by response
          Controlling for center=1

          Statistic                      DF      Value     Prob
          Chi-Square                      1    10.0198   0.0015
          Likelihood Ratio Chi-Square     1    10.2162   0.0014
          Continuity Adj. Chi-Square      1     8.7284   0.0031
          Mantel-Haenszel Chi-Square      1     9.9085   0.0016
          Phi Coefficient                       0.3337
          Contingency Coefficient               0.3165
          Cramer's V                            0.3337

          Estimates of the Relative Risk (Row1/Row2)

          Type of Study                 Value    95% Confidence Limits
          Case-Control (Odds Ratio)    4.0134     1.6680     9.6564
          Cohort (Col1 Risk)           2.0714     1.2742     3.3675
          Cohort (Col2 Risk)           0.5161     0.3325     0.8011

          Sample Size = 90

  28. SAS Output

          Summary Statistics for trtmnt by response
          Controlling for center

          Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

          Statistic   Alternative Hypothesis    DF      Value     Prob
          -------------------------------------------------------------
              1       Nonzero Correlation        1    18.4106   <.0001
              2       Row Mean Scores Differ     1    18.4106   <.0001
              3       General Association        1    18.4106   <.0001

  29. SAS Output

          Estimates of the Common Relative Risk (Row1/Row2)

          Type of Study     Method              Value    95% Confidence Limits
          ---------------------------------------------------------------------
          Case-Control      Mantel-Haenszel    4.0288     2.1057     7.7084
            (Odds Ratio)    Logit              4.0286     2.1057     7.7072
          Cohort            Mantel-Haenszel    1.7368     1.3301     2.2680
            (Col1 Risk)     Logit              1.6760     1.2943     2.1703
          Cohort            Mantel-Haenszel    0.4615     0.3162     0.6737
            (Col2 Risk)     Logit              0.4738     0.3264     0.6877

          Breslow-Day Test for Homogeneity of the Odds Ratios
          ---------------------------------------------------
          Chi-Square        0.0002
          DF                     1
          Pr > ChiSq        0.9900

          Total Sample Size = 180
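
The stratified analysis can be sketched for 2x2 tables as follows. The cell counts per center are not in the transcript; the two tables below are an assumption, chosen because they have 90 subjects each and reproduce the reported center-1 odds ratio (4.0134), the Cochran-Mantel-Haenszel statistic (18.4106), and the Mantel-Haenszel common odds ratio (4.0288). The function name is hypothetical.

```python
def cmh_2x2(strata):
    """CMH statistic and Mantel-Haenszel common odds ratio for 2x2 strata.

    Each stratum is a tuple (a, b, c, d) for the table [[a, b], [c, d]].
    """
    diff_sum = var_sum = num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        # Observed minus expected top-left count, given the stratum's margins
        diff_sum += a - row1 * col1 / n
        # Hypergeometric variance of the top-left count
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        # Pieces of the Mantel-Haenszel odds ratio estimate
        num += a * d / n
        den += b * c / n
    return diff_sum ** 2 / var_sum, num / den


# ASSUMED counts (rows: treatment groups, columns: response yes / no);
# not shown in the transcript
strata = [(29, 16, 14, 31),   # center 1
          (37, 8, 24, 21)]    # center 2

cmh_stat, mh_odds_ratio = cmh_2x2(strata)
print(round(cmh_stat, 4), round(mh_odds_ratio, 4))
# -> 18.4106 4.0288
```

The statistic pools evidence across centers without merging the tables, so a difference between centers cannot masquerade as a treatment effect. The Breslow-Day test in the output checks whether a single common odds ratio is reasonable in the first place.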
