
Categorical Data Analysis & Logistic Regression


judith-holt

Presentation Transcript


  1. Categorical Data Analysis & Logistic Regression

  2. Outline • Two-way contingency tables: RR, Odds ratio, Chi-square tests • Three-way contingency tables: Conditional independence, Homogeneous association, Common odds ratio • Logistic regression: Dichotomous response • Logistic regression: Polytomous response

  3. First example: Aspirin & heart attacks • Clinical trials table of aspirin use and MI • Test whether regular intake of aspirin reduces mortality from cardiovascular disease • Data set • Prospective sampling design: Cohort studies, Clinical trials

  4. Second example: Smoking & heart attacks • Case-control study: table of smoking status and MI • Compare ever-smokers with nonsmokers in terms of the proportion who suffered MI • Data set • Retrospective sampling design: Case-control study, Cross-sectional design • Remark: Observational studies vs. experimental study

  5. Comparing proportions in a 2×2 table • Difference: $\pi_1 - \pi_2$ • Relative risk: $RR = \pi_1 / \pi_2$ • When both proportions are close to 0 or 1, RR is more informative than the difference • $RR = 1$: Response is independent of group

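  6. Example (revisited) • 1st example • $p_1 - p_2 = 0.0171 - 0.0094 = 0.0077$, 95% CI = (0.005, 0.011) • Taking aspirin appears to diminish the risk of heart attack • $RR = 0.0171 / 0.0094 = 1.82$, 95% CI = (1.43, 2.30) • Risk of MI is at least 43% higher for the placebo group • 2nd example • Difference and RR of MI proportions: not estimable; computing them is possible but meaningless, because the case-control design fixes the numbers of MI and non-MI subjects • Estimate proportions in the reverse direction instead • Proportion of smokers given MI status: one for those who suffered MI, one for those who did not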
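The interval arithmetic on this slide can be sketched as follows. The transcript did not preserve the data table, so the counts below are an assumption: they are the counts usually quoted for this aspirin trial in Agresti's textbook, and they reproduce the proportions 0.0171 and 0.0094 shown on the slide. The Wald interval for the difference and the log-scale interval for RR are standard large-sample choices, not necessarily the exact method behind the slide.

```python
import math

# Assumed counts (not preserved in the transcript): MI cases / group size.
placebo_mi, placebo_n = 189, 11034
aspirin_mi, aspirin_n = 104, 11037

p1 = placebo_mi / placebo_n   # about 0.0171
p2 = aspirin_mi / aspirin_n   # about 0.0094
z = 1.96                      # normal quantile for a 95% interval

# Difference of proportions with a Wald interval.
diff = p1 - p2
se_diff = math.sqrt(p1 * (1 - p1) / placebo_n + p2 * (1 - p2) / aspirin_n)
diff_ci = (diff - z * se_diff, diff + z * se_diff)

# Relative risk; the interval is built on the log scale, then exponentiated.
rr = p1 / p2
se_log_rr = math.sqrt((1 - p1) / placebo_mi + (1 - p2) / aspirin_mi)
rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
         math.exp(math.log(rr) + z * se_log_rr))

print(f"difference = {diff:.4f}, 95% CI = ({diff_ci[0]:.3f}, {diff_ci[1]:.3f})")
print(f"RR = {rr:.2f}, 95% CI = ({rr_ci[0]:.2f}, {rr_ci[1]:.2f})")
```

Under these assumed counts the printed values agree with the slide's figures after rounding.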

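  7. Association measure: odds ratio • Def’n: $\theta = \dfrac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$ • Meaning • $\theta = 1$: the two variables are independent, i.e., $\pi_1 = \pi_2$ • $\theta > 1$: odds of success in row 1 exceed those in row 2 • $0 < \theta < 1$: odds of success in row 1 are lower than in row 2 • Remark: When both variables are responses, $\theta = \dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$ (called the cross-product ratio), using joint probabilities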

  8. Properties of odds ratio • Values of $\theta$ farther from 1 in a given direction represent stronger association • When one value is the inverse of the other (e.g., $\theta = 4$ and $\theta = 1/4$), the two values of $\theta$ represent the same strength of association, but in opposite directions • Not changed when the table orientation reverses (rows become columns) • Unnecessary to identify one classification as a response variable

  9. Example (revisited) • 1st example • $\hat\theta = 1.83$, 95% CI = (1.44, 2.33) • Estimated odds of MI are 83% higher for the placebo group • 2nd example • $\hat\theta = 3.8$, which serves as a rough estimate of RR • Women who had ever smoked were about four times as likely to suffer MI as women who had never smoked
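A minimal sketch of the odds-ratio calculation for the first example. The cell counts are an assumption (the transcript lost the table; these are the counts usually quoted for the aspirin trial in Agresti's textbook). The confidence interval uses the usual large-sample standard error of the log odds ratio.

```python
import math

# Assumed 2x2 counts: rows = group, columns = (MI, no MI).
table = [[189, 10845],   # placebo
         [104, 10933]]   # aspirin
(n11, n12), (n21, n22) = table

# Sample odds ratio as a cross-product ratio.
odds_ratio = (n11 * n22) / (n12 * n21)

# Large-sample interval on the log scale.
se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
z = 1.96
or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
         math.exp(math.log(odds_ratio) + z * se_log_or))

# Transposing the table (rows become columns) leaves the odds ratio unchanged.
(t11, t12), (t21, t22) = [list(col) for col in zip(*table)]
odds_ratio_transposed = (t11 * t22) / (t12 * t21)

print(f"odds ratio = {odds_ratio:.2f}, 95% CI = ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```

With these assumed counts the result rounds to the slide's 1.83 and (1.44, 2.33). The transposed value illustrates why no variable needs to be designated as the response.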

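  10. Independence tests • Hypothesis: $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i, j$ • Two chi-square tests • Under $H_0$, estimated expected frequency $\hat\mu_{ij} = n_{i+} n_{+j} / n$ • Pearson’s $X^2 = \sum_{i,j} (n_{ij} - \hat\mu_{ij})^2 / \hat\mu_{ij}$ • Likelihood ratio (LR) statistic $G^2 = 2 \sum_{i,j} n_{ij} \log(n_{ij} / \hat\mu_{ij})$ • For a large sample, $X^2$ and $G^2$ follow a chi-squared null distribution with $df = (I-1)(J-1)$ • Remark: The chi-squared approximation is good when the expected frequencies are large enough (a common guideline is $\hat\mu_{ij} \ge 5$). If not, apply Fisher’s exact test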
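The two statistics can be computed directly from their definitions. This is a sketch on a small hypothetical table (the counts are made up for illustration and do not come from the slides). The helper names `independence_tests` and `chi2_sf_df1` are likewise illustrative. The p-value helper only covers $df = 1$, where the chi-squared tail probability reduces to a complementary error function.

```python
import math

def independence_tests(table):
    """Return (Pearson X^2, likelihood-ratio G^2, df) for an I x J count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # estimated under H0
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:                               # 0 * log(0) is taken as 0
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, g2, df

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical 2x2 table, for illustration only.
x2, g2, df = independence_tests([[10, 20], [30, 40]])
p_value = chi2_sf_df1(x2)
print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}, df = {df}, p = {p_value:.3f}")
```

For tables with more than one degree of freedom, a statistics library's chi-squared distribution would be needed for the p-value.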

  11. Example: AZT use & AIDS • Development of AIDS symptoms in AZT use and race • Study on the effects of AZT in slowing the development of AIDS symptoms • Data set

  12. Three questions of interest in a 2×2×K table • Conditional independence? When controlling for race, are AZT treatment and development of AIDS symptoms independent? • Use Cochran-Mantel-Haenszel (CMH) test • Summarizes the information from partial tables • Homogeneous association? Are the odds ratios between AZT treatment and development of AIDS symptoms the same for each race? • Use Breslow-Day test • Common odds ratio? Use Mantel-Haenszel estimate

  13. Example (AZT use & AIDS revisited) • CMH = 6.8 ($df = 1$) with $p$-value = 0.0091 • Not conditionally independent! • Breslow-Day = 1.39 ($df = 1$) with $p$-value = 0.2384 • No evidence against homogeneous association • Common odds ratio = 0.49 • For each race, estimated odds of developing symptoms are about half as high for those who took AZT
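The CMH statistic and the Mantel-Haenszel common odds ratio can be sketched from their textbook formulas. The partial tables below are an assumption, since the transcript did not keep the data set: they are the counts usually quoted for this AZT study in Agresti's textbook. No continuity correction is applied.

```python
import math

# Assumed partial tables, one per race:
# rows = AZT use (yes, no), columns = symptoms (yes, no).
strata = {
    "white": [[14, 93], [32, 81]],
    "black": [[11, 52], [12, 43]],
}

observed = expected = variance = mh_num = mh_den = 0.0
for (a, b), (c, d) in strata.values():
    n = a + b + c + d
    observed += a                                   # count in cell (1, 1)
    expected += (a + b) * (a + c) / n               # its mean under H0
    variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    mh_num += a * d / n                             # Mantel-Haenszel numerator
    mh_den += b * c / n                             # Mantel-Haenszel denominator

cmh = (observed - expected) ** 2 / variance
p_value = math.erfc(math.sqrt(cmh / 2))             # chi-squared tail, df = 1
common_or = mh_num / mh_den

print(f"CMH = {cmh:.2f}, p = {p_value:.4f}, MH odds ratio = {common_or:.2f}")
```

Under these assumed counts the output is close to the slide's CMH of 6.8 and common odds ratio of 0.49. Small differences in the p-value come from rounding the statistic.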

  14. Overview of types of generalized linear models(GLMs) • Three components: Random component (response variable), Linear predictor (linear combination of covariates), Link function • Types of GLMs

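  15. Logistic regression with a quantitative covariate • Model: $\text{logit}[\pi(x)] = \log\dfrac{\pi(x)}{1-\pi(x)} = \alpha + \beta x$ • Another representation • Odds $= \exp(\alpha + \beta x) = e^{\alpha}(e^{\beta})^x$ • Odds at level $x + 1$ equal the odds at $x$ multiplied by $e^{\beta}$ • Curve ascends ($\beta > 0$) or descends ($\beta < 0$) • The rate of change increases as $|\beta|$ increases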
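A small numeric check of the odds interpretation, using hypothetical values of $\alpha$ and $\beta$ chosen only for illustration. It also evaluates the slope of the curve, $\beta\,\pi(x)[1-\pi(x)]$, which is steepest where $\pi(x) = 0.5$.

```python
import math

alpha, beta = -3.0, 0.5   # hypothetical parameter values

def prob(x):
    """Success probability pi(x) under logit[pi(x)] = alpha + beta * x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

def odds(x):
    p = prob(x)
    return p / (1 - p)

def slope(x):
    """Rate of change of pi(x) with respect to x."""
    p = prob(x)
    return beta * p * (1 - p)

x = 4.0
odds_ratio_per_unit = odds(x + 1) / odds(x)   # equals exp(beta) for any x
x_mid = -alpha / beta                         # where pi(x) = 0.5

print(odds_ratio_per_unit, math.exp(beta))
print(prob(x_mid), slope(x_mid), beta / 4)
```

The slope at `x_mid` equals `beta / 4`, which is one way to read the statement that the rate of change grows with $|\beta|$.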

  16. Example: Horseshoe crabs • Binary response • $Y = 1$ if a female crab has at least one satellite; $Y = 0$ otherwise • Covariate: female crab’s width • Data set

  17. Example: Horseshoe crabs

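  18. Goodness-of-fit tests • Working model: $M$; number of settings of the predictors; number of parameters in $M$ • Hypothesis: $H_0$: $M$ fits the data • Pearson’s statistic: $X^2 = \sum (\text{observed} - \text{fitted})^2 / \text{fitted}$ • Deviance statistic: $G^2 = 2 \sum \text{observed} \times \log(\text{observed} / \text{fitted})$ • $X^2$ and $G^2$ approximately follow a chi-squared null distribution with $df = (\text{number of settings}) - (\text{number of parameters in } M)$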

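  19. Inference for parameters • Interval estimation: $\hat\beta \pm z_{\alpha/2}\, SE$ • Two significance tests of $H_0: \beta = 0$ • Wald test: Use $z^2 = (\hat\beta / SE)^2$ • Likelihood ratio test: Use $-2(L_0 - L_1)$, where $L_0$ and $L_1$ are the maximized log-likelihoods under $H_0$ and under the fitted model • Both tests have a large-sample chi-squared null distribution with $df = 1$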
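To make the Wald and likelihood-ratio tests concrete, here is a sketch that fits a one-covariate logistic model by Newton-Raphson and computes both statistics. The data are a tiny hypothetical sample invented for illustration, and `fit_logistic` is an illustrative helper, not a library function.

```python
import math

def fit_logistic(x, y, iters=25):
    """ML fit of logit[pi(x)] = a + b*x by Newton-Raphson.

    Returns (a, b, se_b, loglik) evaluated at the final estimates.
    """
    a = b = 0.0
    for step in range(iters + 1):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1 - pi) for pi in p]
        # Information matrix entries and determinant.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 ** 2
        if step == iters:
            break                      # keep p, w aligned with final (a, b)
        # Score vector.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Newton update: (a, b) += inverse(information) @ score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                 for yi, pi in zip(y, p))
    se_b = math.sqrt(h00 / det)        # from the inverse information matrix
    return a, b, se_b, loglik

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]

a_hat, b_hat, se_b, loglik_1 = fit_logistic(x, y)

# Null model (beta = 0): every observation gets the sample proportion.
y_bar = sum(y) / len(y)
loglik_0 = len(y) * (y_bar * math.log(y_bar) + (1 - y_bar) * math.log(1 - y_bar))

wald = (b_hat / se_b) ** 2
lrt = -2 * (loglik_0 - loglik_1)
ci = (b_hat - 1.96 * se_b, b_hat + 1.96 * se_b)
print(f"beta_hat = {b_hat:.3f}, Wald = {wald:.2f}, LRT = {lrt:.2f}")
```

Both statistics would be referred to a chi-squared distribution with $df = 1$. On a sample this small the two can differ noticeably, and the large-sample approximation is rough.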

  20. Example (Horseshoe crabs revisited) • Fitted model: $\text{logit}[\hat\pi(x)] = \hat\alpha + \hat\beta x$ • $\hat\pi(x)$: larger at larger width ($\hat\beta > 0$) • There is a 64% increase in estimated odds of a satellite for each centimeter increase in width ($e^{\hat\beta} = 1.64$) • Goodness of fit: $p$-values of 0.506 and 0.4012 for the two statistics • 95% CI for $\beta$ = (0.298, 0.697) • Significance test: Wald = 23.9 ($df = 1$) with $p$-value < 0.0001; LRT = 31.3 ($df = 1$) with $p$-value < 0.0001
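The figures on this slide are tied together by simple arithmetic, which the sketch below retraces from the reported 95% interval alone. It assumes the interval is a Wald interval, symmetric about the estimate, so the midpoint stands in for $\hat\beta$ and the half-width divided by 1.96 stands in for its standard error. Both are back-calculated approximations, not values taken from the transcript.

```python
import math

ci_low, ci_high = 0.298, 0.697     # 95% CI for beta, as reported on the slide
z = 1.96

# Assumption: Wald interval, so the estimate is the midpoint of the interval.
beta_hat = (ci_low + ci_high) / 2
se = (ci_high - ci_low) / (2 * z)

odds_multiplier = math.exp(beta_hat)             # per-centimeter odds effect
odds_ci = (math.exp(ci_low), math.exp(ci_high))  # same interval on the odds scale
wald = (beta_hat / se) ** 2

print(f"exp(beta_hat) = {odds_multiplier:.2f}")
print(f"odds-scale CI = ({odds_ci[0]:.2f}, {odds_ci[1]:.2f})")
print(f"Wald = {wald:.1f}")
```

The back-calculated multiplier and Wald statistic round to the slide's 1.64 and 23.9, which is a useful consistency check on the reported numbers.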

  21. Logistic regression with qualitative predictors: AIDS symptoms data • Use indicator variables for representing categories of predictors • Logits implied by indicator variables

  22. Logistic regression with qualitative predictors: AIDS symptoms data • Coefficient of an indicator = difference between two logits (i.e., log of odds ratio) at a fixed category of the other predictor • Homogeneous association model

  23. Equivalence of contingency table & logistic regression • Conditional independence: CMH test vs. test that the AZT coefficient is zero • Homogeneous association: Breslow-Day test vs. Goodness-of-fit test • Common odds ratio estimate: Mantel-Haenszel estimate vs. exponentiated estimate of the AZT coefficient

  24. Computer Output for Model with AIDS Symptoms Data

  25. Logistic regression with mixed predictors: Horseshoe crabs data • For color=medium light, For color=medium, For color=medium dark, • For controlling

  26. Computer Output for Model for Horseshoe Crabs Data

  27. Estimated probabilities for primary food choice

  28. Logistic regression: polytomous response • Model categorical responses with more than two categories • Two ways • Use generalized logits for a nominal response • Use cumulative logits for an ordinal response • Notation • $J$ = number of categories • $\{\pi_1, \ldots, \pi_J\}$ = response probabilities with $\sum_j \pi_j = 1$

  29. Generalized logit model: nominal response • Baseline-category logit: Pair each category with a baseline category • $\log(\pi_j / \pi_J)$, $j = 1, \ldots, J-1$, when $J$ is the baseline • Model with a predictor $x$: $\log(\pi_j / \pi_J) = \alpha_j + \beta_j x$ • The effects vary according to the category paired with the baseline • These $J - 1$ equations determine equations for all other pairs of categories • E.g., for a pair of categories $a$ and $b$: $\log(\pi_a / \pi_b) = \log(\pi_a / \pi_J) - \log(\pi_b / \pi_J)$ • Remark: Parameter estimates for a given pair of categories are the same no matter which category is the baseline
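A sketch of how baseline-category logits translate into response probabilities, and how the logit for a non-baseline pair follows from the baseline equations. The coefficients are hypothetical, chosen only to illustrate a three-category response with category 3 as the baseline. The function names are illustrative.

```python
import math

# Hypothetical (alpha_j, beta_j) for categories 1 and 2; category 3 is the baseline.
params = {1: (1.5, -0.1), 2: (5.5, -2.4)}
BASELINE = 3

def response_probs(x):
    """Probabilities implied by log(pi_j / pi_3) = alpha_j + beta_j * x."""
    scores = {j: math.exp(a + b * x) for j, (a, b) in params.items()}
    denom = 1 + sum(scores.values())        # the baseline contributes exp(0) = 1
    probs = {j: s / denom for j, s in scores.items()}
    probs[BASELINE] = 1 / denom
    return probs

def pair_logit(x, a, b):
    """log(pi_a / pi_b) computed from the fitted probabilities."""
    p = response_probs(x)
    return math.log(p[a] / p[b])

x = 2.0
probs = response_probs(x)
# The logit for the pair (1, 2) equals the difference of the two baseline logits.
implied = (params[1][0] - params[2][0]) + (params[1][1] - params[2][1]) * x
print(probs, pair_logit(x, 1, 2), implied)
```

Because any pair's logit is a difference of baseline logits, choosing a different baseline only relabels the equations and leaves the fitted probabilities unchanged.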

  30. Example: Alligator food choice • 59 alligators sampled in Lake George, Florida • Response: Primary food type found in alligator’s stomach • Fish (1), Invertebrate (2), Other (3, baseline category) • Predictor: alligator length, which varies from 1.24 to 3.89 m • ML prediction equations • Larger alligators seem to select fish rather than invertebrates • Independence test: Food choice & length • LRT = 16.8006 ($df = 2$) with $p$-value = 0.0002

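  31. Cumulative logit model: ordinal response • Logit of a cumulative probability: $\text{logit}[P(Y \le j)] = \log\dfrac{P(Y \le j)}{1 - P(Y \le j)}$ • Categories 1 to $j$: combined, Categories $j + 1$ to $J$: combined • Cumulative proportional odds model with a predictor $x$: $\text{logit}[P(Y \le j)] = \alpha_j + \beta x$, $j = 1, \ldots, J - 1$ • The effect of $x$ is identical for all cumulative logits • Any one curve for $P(Y \le j)$ is identical to any of the others shifted to the right or to the left • For two values $x_1$ and $x_2$, the log of the odds ratio is $\beta(x_2 - x_1)$ • Proportional to the difference between the $x$ values • Same for each cumulative probability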
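The proportional odds property can be checked numerically. This sketch uses hypothetical cutpoints $\alpha_j$ and a hypothetical $\beta$ for a five-category response. It computes the cumulative and category probabilities, then confirms that the log odds ratio between two predictor values is the same for every cumulative probability.

```python
import math

alphas = [-2.0, -0.5, 0.5, 2.0]   # hypothetical increasing cutpoints, J = 5
beta = 1.0                        # hypothetical common effect of x

def expit(t):
    return 1 / (1 + math.exp(-t))

def logit(p):
    return math.log(p / (1 - p))

def cumulative_probs(x):
    """P(Y <= j) for j = 1, ..., J - 1 under logit[P(Y <= j)] = alpha_j + beta * x."""
    return [expit(a + beta * x) for a in alphas]

def category_probs(x):
    """P(Y = j) for j = 1, ..., J, as differences of cumulative probabilities."""
    cum = cumulative_probs(x) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

x1, x2 = 0.0, 1.0
log_odds_ratios = [logit(c2) - logit(c1)
                   for c1, c2 in zip(cumulative_probs(x1), cumulative_probs(x2))]
print(category_probs(x1))
print(log_odds_ratios)            # each entry equals beta * (x2 - x1)
```

The cutpoints must be increasing so that the cumulative probabilities are ordered and every category probability is positive.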

  32. Example: Political ideology & party affiliation • Response: Political ideology with five-point ordinal scale • Predictors: Political party(Democratic, Republican)

  33. Example: Political ideology & party affiliation • Parameter inference • Democrats tend to be more liberal than Republicans • Wald = 57.1 ($df = 1$) with $p$-value < 0.0001 • Strong evidence of an association • 95% CI for $\beta$ = (0.72, 1.23), or for $e^{\beta}$ = (2.1, 3.4) • Odds of a more liberal response are at least twice as high for Democrats as for Republicans • Goodness-of-fit • $p$-value = 0.2957: adequate fit

  34. Other logit forms for ordinal response categories • Adjacent-categories logit: $\log(\pi_{j+1} / \pi_j)$, $j = 1, \ldots, J - 1$ • Adjacent-categories logits determine the logits for all pairs of response categories • Continuation-ratio logit • Form 1: Contrast each category with a grouping of categories from lower levels of the response scale • Form 2: Contrast each category with a grouping of categories from higher levels of the response scale

  35. Summary • Two-way contingency tables: RR, Odds ratio, Chi-square tests • Three-way contingency tables: Conditional independence, Homogeneous association, Common odds ratio • Logistic regression: Dichotomous response • Logistic regression: Polytomous response

  36. References • Agresti, A. (1996). An Introduction to Categorical Data Analysis, Wiley: New York (Also the 2nd edition is available) • Stokes, M.E., Davis, C.S., and Koch, G.G. (2000). Categorical Data Analysis Using The SAS System, Second Ed., SAS Inc.: Cary
