HSRP 734: Advanced Statistical Methods, June 19, 2008
Extensions of Logistic Regression • Outcomes with more than 2 categories • Categories have order • Unordered • Conditional logistic regression • Analysis of matched data
Extensions of Logistic Regression • Exact methods for small samples • Fisher’s exact • Exact logistic regression • Correlated/Clustered data • GEE method • Mixed models
Extensions of Logistic Regression • Outcomes with more than 2 categories (polytomous or polychotomous) • Cumulative logit model – Proportional odds model for ordinal outcomes (ordered categories) • Generalized logit model for nominal outcomes or non-proportional odds models (unordered categories)
Extensions of Logistic Regression • Cumulative logit model • Fits a logistic regression model with g-1 intercepts for a g category outcome and one model coefficient for each predictor • Models cumulative probability of being in a “lower” category
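A minimal sketch of how a cumulative logit model turns g-1 intercepts and one shared coefficient into category probabilities. The parameterization logit P(Y <= j) = alpha_j + beta * x and every numeric value below are illustrative assumptions, not taken from the slides; some software uses the opposite sign for beta.

```python
import math

def cumulative_logit_probs(alphas, beta, x):
    """Category probabilities under logit P(Y <= j) = alpha_j + beta * x.

    alphas: g-1 increasing intercepts; beta: one shared slope (assumed form).
    """
    # Cumulative probabilities P(Y <= j) for j = 1..g-1, then P(Y <= g) = 1.
    cum = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alphas]
    cum.append(1.0)
    # Individual category probabilities are differences of cumulative ones.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def cumulative_odds(probs, j):
    """Odds of falling in category index j or lower."""
    c = sum(probs[: j + 1])
    return c / (1.0 - c)

alphas = [-1.0, 0.5]   # hypothetical intercepts for g = 3 categories
beta = 0.8             # hypothetical coefficient for one predictor

p_x0 = cumulative_logit_probs(alphas, beta, 0.0)
p_x1 = cumulative_logit_probs(alphas, beta, 1.0)

# Under proportional odds, the odds ratio is the same at every cut point.
odds_ratios = [cumulative_odds(p_x1, j) / cumulative_odds(p_x0, j)
               for j in range(len(alphas))]
print(p_x0, p_x1, odds_ratios)
```

Every entry of `odds_ratios` equals exp(beta), which is the single odds ratio the model reports for the predictor.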
Ordinal Logistic Regression • Odds ratios take on interpretation “% increase/decrease in the odds of being in a lower/higher category” • Subject to the “Proportional Odds” assumption
Extensions of Logistic Regression • Generalized logit model • Fits a logistic regression model with g-1 intercepts and g-1 model coefficients for each predictor for a g category outcome • Model captures the multinomial probability of being in a particular category using generalized logits
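A minimal sketch of the generalized logit model with one predictor, assuming the last category is the reference so that log(P(Y = j) / P(Y = g)) = a_j + b_j * x. The intercepts and slopes are made-up values for illustration.

```python
import math

def generalized_logit_probs(intercepts, slopes, x):
    """Multinomial probabilities for g categories (last one is the reference).

    intercepts, slopes: g-1 values each, one pair per non-reference category.
    """
    linear = [a + b * x for a, b in zip(intercepts, slopes)]
    linear.append(0.0)  # the reference category has linear predictor 0
    denom = sum(math.exp(v) for v in linear)
    return [math.exp(v) / denom for v in linear]

intercepts = [0.2, -0.5]   # hypothetical, g = 3 categories
slopes = [1.1, -0.3]       # hypothetical, one slope per non-reference category

p_x0 = generalized_logit_probs(intercepts, slopes, 0.0)
p_x1 = generalized_logit_probs(intercepts, slopes, 1.0)

# Odds ratio of category j versus the reference for a one-unit increase in x.
or_vs_reference = [(p_x1[j] / p_x1[-1]) / (p_x0[j] / p_x0[-1])
                   for j in range(len(slopes))]
print(p_x0, p_x1, or_vs_reference)
```

Each odds ratio equals exp(b_j) and compares category j with the reference category only, which is why the choice of reference matters when reading the output.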
Nominal Logistic Regression • Odds ratios have regular interpretation, just have to be careful with which comparisons are being made (reference category) • Does not assume “Proportional Odds”
Conditional logistic regression • Can use for matched data (e.g., case-control studies) • Provides unbiased estimates of odds ratios and CIs
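For the simplest matched design (1:1 matched pairs with a single binary exposure), the conditional maximum likelihood estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below illustrates that special case only; the pair counts are invented, and a Wald interval on the log scale is assumed for the CI.

```python
import math

def matched_pair_odds_ratio(pairs):
    """Conditional OR estimate from (case_exposed, control_exposed) pairs.

    Only discordant pairs contribute; both discordant counts must be nonzero.
    """
    case_only = sum(1 for case, ctrl in pairs if case == 1 and ctrl == 0)
    ctrl_only = sum(1 for case, ctrl in pairs if case == 0 and ctrl == 1)
    or_hat = case_only / ctrl_only
    # Wald 95% CI on the log odds ratio scale.
    se_log = math.sqrt(1.0 / case_only + 1.0 / ctrl_only)
    lower = math.exp(math.log(or_hat) - 1.96 * se_log)
    upper = math.exp(math.log(or_hat) + 1.96 * se_log)
    return or_hat, (lower, upper)

# Hypothetical data: 30 pairs with only the case exposed, 10 with only the
# control exposed, and 60 concordant pairs that carry no information.
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 25 + [(0, 0)] * 35
or_hat, ci = matched_pair_odds_ratio(pairs)
print(or_hat, ci)
```

With covariates or more than one control per case, the estimate no longer has this closed form and the conditional likelihood has to be maximized numerically.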
Extensions to Logistic Regression • Exact Logistic Regression • Small Sample Size • Adequate sample size but rare event (sparse data)
Fisher’s exact test • Exact test for RxC table where Chi-square test assumptions are doubtful • Why not always use Fisher’s exact test and Exact logistic regression?
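A minimal sketch of Fisher's exact test for the 2x2 special case of an RxC table, using only the standard library. It assumes the common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one, with margins held fixed); the example counts are invented.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

p_value = fisher_exact_2x2(3, 1, 1, 3)   # hypothetical 2x2 table
print(p_value)
```

The function enumerates every table compatible with the margins, so the amount of work grows with the sample size and, for general RxC tables, with the table dimensions.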
Extensions of Logistic Regression • Longitudinal data / repeated measures data / Clustered data with binary outcomes • Multilevel models (nested data structures) • GEE (Generalized Estimating Equations) • GLMM (Generalized Linear Mixed Models)
Two methods for handling clustered outcomes • Mixed models • Likelihood based • Use random effects to model clustered observations • Originally developed for continuous outcomes (now extended to categorical outcomes) • Generalized Estimating Equations (GEE) • Non-likelihood based • Can handle a large number of clusters • Categorical outcomes
GEE • GEE can be used in • Longitudinal studies • repeated measures of the same individual form a cluster • Community studies • subjects clustered by neighborhood • Familial studies • subjects clustered by family • Epidemiological studies • Different forms of clusters – e.g., pedigree
GEE • In general GEE has 3 sets of parameters to estimate: • Regression parameter (population-averaged effects) • Correlation parameter (cluster parameter) • Scale factor (not uncommon to assume =1)
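The three sets of parameters can be seen in the usual form of the estimating equation. The notation below is an assumption introduced for illustration (it does not appear on the slides): K clusters, response vector y_i with mean mu_i for cluster i, regression parameters beta, correlation parameters alpha, and scale factor phi.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}
```

Here A_i is the diagonal matrix of variance functions (mu(1 - mu) for a binary outcome) and R(alpha) is the working correlation matrix. Setting R to the identity matrix gives back the score equations of ordinary logistic regression.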
GEE • In its simplest form, GEE can be considered an extension of logistic regression for clustered data • Clustered data are common • Time: Longitudinal analysis with repeated measurements on individual (e.g., BL, 1m, 2m, 6m follow-up) • Individual: Cross-sectional analysis with multiple outcomes (e.g., left eye, right eye) • Background: Subjects clustered because of common geographical or social background (e.g., clinic)
Correlation structure • Correlation structure • Often called the working correlation structure in GEE • Specifies how the observations within a cluster are related • Often assumes correlation structure uniform throughout clusters
Unstructured • All correlation coefficients free to take any value • E.g.,
Exchangeable • Any two responses within the same cluster have the same correlation • Simple (1 parameter to estimate)
Autoregressive AR(1) • Correlation between responses depends on the interval of time between responses • Farther apart responses => weaker correlation • Only 1 parameter to estimate!
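A minimal sketch of the exchangeable and AR(1) working correlation matrices for a cluster of n responses. The value of rho is made up, and the AR(1) version assumes equally spaced measurement times.

```python
def exchangeable(n, rho):
    """Every pair of distinct responses in a cluster has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n, rho):
    """Correlation decays as rho ** |i - j| for equally spaced times."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

rho = 0.5   # hypothetical correlation parameter
exch = exchangeable(4, rho)
auto = ar1(4, rho)

for row in exch:
    print(row)
for row in auto:
    print(row)
```

Both structures use a single parameter, whereas an unstructured matrix for n responses has n(n-1)/2 free correlations.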
Correlation matrix • Selection of a “working correlation structure” is at the discretion of the researcher! • How does the correlation structure affect the results?
Properties of GEE estimators • What about the variance estimate (standard errors) of the regression coefficients if the “working” correlation matrix is not correctly specified? • Model-based estimate => not consistent • Empirical (robust) estimate => still consistent
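The model-based and empirical (robust, or "sandwich") estimates named above are usually written as follows. The notation is an assumption added for illustration: D_i is the derivative of the cluster mean mu_i with respect to beta, and V_i is the working covariance matrix of cluster i, including the scale factor.

```latex
B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}
    \bigl( y_i - \hat{\mu}_i \bigr) \bigl( y_i - \hat{\mu}_i \bigr)^{\top}
    V_i^{-1} D_i

\widehat{\mathrm{Cov}}_{\text{model}}(\hat{\beta}) = B^{-1},
\qquad
\widehat{\mathrm{Cov}}_{\text{robust}}(\hat{\beta}) = B^{-1} M B^{-1}
```

The model-based form relies on V_i being correct. The robust form replaces that assumption with the observed residual cross-products in M, which is why it remains consistent under a misspecified working correlation, provided the number of clusters K is large.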
Properties of GEE estimators • Even if the correlation structure is misspecified, the estimate of the logistic regression coefficients is still consistent • If the correlation is misspecified, the estimate is not as efficient (SE is larger) • This property contributes to the popularity of GEE • GEE works well with larger numbers of clusters