Analysis of Complex Survey Data

doli

Presentation Transcript


  1. Analysis of Complex Survey Data Day 3: Regression

  2. Today’s schedule • Part I: Basic review of common regressions and when to use them • Part II: Introduction to PROC REGRESS, PROC RLOGIST, PROC LOGLINK, and PROC MULTILOG

  3. Regression • Typically in epidemiologic research, our outcomes fall into four major types: • Continuous (normally distributed, skewed, or counts) • Binary • Ordinal • Nominal

  4. Continuous outcome, normally distributed • Linear regression

  5. Continuous outcome, right skewed • Poisson regression

  6. Counts • Poisson regression

  7. Binary outcome • Logistic regression

  8. Ordinal • Polytomous regression, cumulative logit link function • Likert scales • Ordered categorical scales (age, income) • The cumulative logit link function assumes that the effect of going from outcome level 1 to 2 is the same as the effect of going from level 2 to 3 (the proportional odds assumption)
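The proportional odds assumption on this slide can be illustrated with a minimal Python sketch. It is not SUDAAN output and does not account for the survey design; the cutpoints and slope are made-up values, and the parameterization logit P(Y <= j | x) = cutpoint_j - beta * x is one common convention assumed here. The point is that a single slope produces the same cumulative odds ratio at every cutpoint.

```python
import math


def inv_logit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_logit_probs(cutpoints, beta, x):
    """Category probabilities when logit P(Y <= j | x) = cutpoints[j] - beta * x."""
    cumulative = [inv_logit(c - beta * x) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


def cumulative_odds_ratios(cutpoints, beta, x0, x1):
    """Odds ratio of Y <= j comparing x1 with x0, one value per cutpoint."""
    ratios = []
    for c in cutpoints:
        p0 = inv_logit(c - beta * x0)
        p1 = inv_logit(c - beta * x1)
        ratios.append((p1 / (1.0 - p1)) / (p0 / (1.0 - p0)))
    return ratios


# Hypothetical values for a three-level ordinal outcome (two cutpoints).
cutpoints = [-1.0, 0.5]
beta = 0.7

probs = cumulative_logit_probs(cutpoints, beta, x=1.0)
ratios = cumulative_odds_ratios(cutpoints, beta, x0=0.0, x1=1.0)
print(probs)
print(ratios)  # the same odds ratio at both cutpoints
```

If the odds ratios estimated separately at each cutpoint differ substantially in real data, the cumulative logit model may not be appropriate.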

  9. Nominal • Polytomous regression, general logit link function • Race • Diagnosis (depression versus anxiety versus substance use disorder) • The general logit link function gives a different estimate for each comparison between outcome categories (e.g., category 2 versus 1 and category 3 versus 1) rather than assuming a single shared effect
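For contrast with the cumulative logit, the following Python sketch shows a general (generalized) logit model with one reference category. The intercepts and slopes are hypothetical, and the survey design is ignored; the sketch only shows that each non-reference category gets its own slope and therefore its own odds ratio.

```python
import math


def generalized_logit_probs(intercepts, slopes, x):
    """Category probabilities; the first category is the reference (linear predictor 0)."""
    etas = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
    shift = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - shift) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def odds_ratios_vs_reference(slopes):
    """Odds ratio per one-unit increase in x, each category versus the reference."""
    return [math.exp(b) for b in slopes]


# Hypothetical coefficients for a three-category nominal outcome.
intercepts = [-0.5, -1.0]
slopes = [0.4, 1.1]

probs_x0 = generalized_logit_probs(intercepts, slopes, x=0.0)
probs_x1 = generalized_logit_probs(intercepts, slopes, x=1.0)
category_ors = odds_ratios_vs_reference(slopes)
print(probs_x0, probs_x1)
print(category_ors)  # a different odds ratio for each category
```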

  10. Categorizing your exposure • Check assumptions regarding the functional form of the relationship between the exposure and the outcome • E.g., relationship between age and alcohol use disorders. We would not want to enter age as a continuous variable because we do not think age is linearly related to risk of alcohol use disorders • If you decide to categorize a continuous variable, cutpoints are best chosen from precedent in the literature • Relying on data-driven cutpoints will make your work hard to compare with other work in the literature • If there is no precedent: • Use quartiles or • Break up the exposure into small categories, and examine the relationship with the outcome in a regression model with no other predictors (on the log odds scale if using logistic regression).
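The quartile approach on this slide can be sketched in a few lines of Python. This is an unweighted illustration only: it ignores sampling weights and design variables, the ages and outcomes are fabricated for demonstration, and the 0.5 added to each cell is an assumed continuity correction to avoid taking the log of zero. The function names are hypothetical.

```python
import math
from statistics import quantiles


def quartile_category(value, cuts):
    """Return 0..3, the quartile group that value falls into."""
    return sum(value > c for c in cuts)


def log_odds_by_category(exposure, outcome):
    """Crude log odds of the outcome within each quartile of the exposure."""
    cuts = quantiles(exposure, n=4)  # three cutpoints
    counts = {k: [0, 0] for k in range(4)}  # [non-events, events]
    for x, y in zip(exposure, outcome):
        counts[quartile_category(x, cuts)][y] += 1
    log_odds = {}
    for k, (non_events, events) in counts.items():
        # 0.5 is an assumed continuity correction, not part of the slide
        log_odds[k] = math.log((events + 0.5) / (non_events + 0.5))
    return cuts, log_odds


# Fabricated data: ages 18 to 65, outcome concentrated in young adulthood.
ages = list(range(18, 66))
has_disorder = [1 if 21 <= a <= 34 and a % 3 != 0 else 0 for a in ages]

cuts, log_odds = log_odds_by_category(ages, has_disorder)
print(cuts)
for k in sorted(log_odds):
    print(k, round(log_odds[k], 2))
```

If the log odds do not rise or fall roughly evenly across categories, that is a sign the exposure should not be entered as a single linear term.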

  11. Choosing covariates • Most important: DO NOT SKIP THE GROUNDWORK! • Check associations with exposure and outcome • Check associations among covariates • Categorize the covariates appropriately • When should something be evaluated as a moderator, and when should it be a confounder/covariate? • Most of the time, it is clear: do you think that the relationship between exposure and outcome will be the same across levels of the third variable, or do you think it will be different? • If you do not have an a priori hypothesis and are just trying to build a solid statistical model, try it as a moderator first. If significant, leave it in as a moderator. • Because interaction terms are sometimes difficult to interpret on their own, consider fitting separate statistical models within subsets instead.
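The link between an interaction term and subset models can be shown with a small Python sketch using 2x2 tables. The counts and the stratifying variable (sex) are hypothetical, and no weighting or design adjustment is applied. With a binary exposure and a binary moderator, the exponentiated interaction coefficient from a logistic model containing both main effects and their product equals the ratio of the stratum-specific odds ratios computed here.

```python
def odds_ratio(table):
    """table = ((a, b), (c, d)): rows are exposed/unexposed, columns are outcome yes/no."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def stratum_odds_ratios(tables):
    """Exposure-outcome odds ratio within each level of the third variable."""
    return {stratum: odds_ratio(t) for stratum, t in tables.items()}


# Hypothetical counts, one 2x2 table per stratum.
tables = {
    "men": ((40, 60), (20, 80)),
    "women": ((30, 70), (25, 75)),
}

ors = stratum_odds_ratios(tables)
ratio_of_ors = ors["men"] / ors["women"]
print(ors)
print(ratio_of_ors)  # a value near 1 suggests little moderation
```

Reporting the stratum-specific odds ratios is often easier for readers to interpret than the interaction coefficient alone.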

  12. LAB 3: Regression in SUDAAN
