1 / 40

Data Are Ready – Now What? Statistical Approaches and Common Mistakes

Data Are Ready – Now What? Statistical Approaches and Common Mistakes. Lisa M Sullivan Professor and Chair, Department of Biostatistics BUSPH. Plan. Run Analysis Find Significance p < 0.05 Publish. Questions Before You Begin…. What is your primary research question?

fineen
Download Presentation

Data Are Ready – Now What? Statistical Approaches and Common Mistakes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Are Ready – Now What?Statistical Approaches and Common Mistakes Lisa M Sullivan Professor and Chair, Department of Biostatistics BUSPH

  2. Plan • Run Analysis • Find Significance p < 0.05 • Publish

  3. Questions Before You Begin… • What is your primary research question? • Are you interested primarily in a relationship between an outcome and a risk factor or exposure? • Are you interested in prediction of an outcome, using one or more risk factors or exposures?

  4. Questions Before You Begin… • What is the study design? (Cohort, Case/Control, RCT) • What kinds of inferences can be made – what are the limitations? • What are the outcome measures? Is there a primary outcome measure? • What are the risk factors and exposures?

  5. Questions Before You Begin… • What is the nature of the outcome measure – Continuous, Categorical, Dichotomous or Time to Event? • What is the primary effect measure? • Are data correlated?

  6. Example • Is BMI a significant risk factor for Hard CHD (MI, Coronary Heart Disease Death)? • Framingham Heart Study

  7. Framingham Study Risk Prediction Models • Predict likelihood that an individual will have an event (e.g., Hard CHD) over a specified period of time (e.g., the next 10 years) • Models designed to include risk factors that are readily available • Age, blood pressure, lipids, smoking, diabetes, treatment for hypertension & high cholesterol Wilson PWF, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation 1998; 97:1837-1847

  8. Analysis • Do we know enough to proceed? • What is the study design? • Prospective Follow Up • Correlated Data? • What is the outcome measure? Is there a primary outcome measure? • HCHD Incidence • What is the primary risk factor? • BMI

  9. Analysis • Analytic method? • Survival analysis • Nature of the association? • Linear? Non-linear? • Measure of effect? • Hazard ratio

  10. Analysis • Conduct Cox proportional hazards regression analysis relating BMI to Hard CHD HRBMI = 1.06 (95% CI: 1.04-1.08) p<0.0001 • Are you happy with this?

  11. First Pass? Overweight BMI=25.0-29.9 Obese BMI > 30 HROverwght = 1.90 (1.56-2.30), p<0.0001 HRObese = 2.23 (1.73-2.87), p<0.0001 • Done?

  12. Suggestion: Explore Your Data • Understand analytic sample - population at risk and outcome • Generate descriptive statistics on risk factor • Confounding/Adjustment?

  13. Suggestion: Explore Your Data • Understand analytic sample - population at risk and outcome • N=7589 participants in FHS Cohort and Offspring • Free of CVD, Age 35-74 • 12 Year Follow-up • 501 (6.6%) HCHD events

  14. Distribution of BMI Mean (SD) Min-Max Total Sample 24.5 (5.0) 12-55 Men 24.4 (5.1) 12-49 Women 24.5 (4.9) 13-55 35-44 22.5 (5.0) 13-55 45-54 24.5 (5.0) 12-51 55-64 26.2 (4.4) 15-53 65-74 25.9 (3.8) 16-45

  15. Distribution of BMI % Overweight % Obese Total Sample 30.6 11.9 Men 37.0 11.7 Women 25.0 12.0 35-44 19.0 7.2 45-54 30.0 11.9 55-64 39.2 17.3 65-74 44.8 12.4

  16. Other Risk Factors • Age • Sex • Blood Pressure • Cholesterol • Diabetes • Smoking

  17. Confounding • A distortion of the effect of the risk factor on outcome due to other factors • Confounder may account for part or all of observed effect, may mask effect • How do we examine confounding? • Evaluate association of confounder with outcome • Evaluate association of confounder with primary risk factor of interest

  18. Ways to Handle Confounding • Design • Randomization • Matching • Analysis • Stratification • Multivariable analysis/Statistical adjustment

  19. Multivariable Models: BMI HR (95% CI) p Unadjusted 1.06 (1.04-1.08) 0.0001 Age & Sex Adj 1.04 (1.02-1.06) 0.0001 Multivariable Adj* 1.02 (0.99-1.04) 0.1525 *Adjusted for age, sex, SBP, Trt for Htn, Total and HDL Cholesterol, Smoking, DM

  20. Multivariable Models: Overweight/Obesity Overweight Obese Unadjusted 1.90 (1.56-2.30) 2.23 (1.73-2.87) Age & Sex Adj 1.24 (1.02-1.52) 1.72 (1.33-2.33) Multivariable Adj* 1.11 (0.91-1.36) 1.28 (0.98-1.68) *Adjusted for age, sex, SBP, Trt for Htn, Total and HDL Cholesterol, Smoking, DM

  21. Confounding • Compare crude (unadjusted) measure of association with adjusted measure of association • If equal, then no confounding • Is confounding an issue here? Due primarily to which risk factor(s)?

  22. Multivariable Model for HCHD Risk Factor HR (95% CI) p BMI 1.015 (0.995-1.035) 0.1525 Age 1.042 (1.031-1.052) 0.0001 Male Sex 2.705 (2.186-3.346) 0.0001 SBP 1.012 (1.007-1.017) 0.0001 Trt for Htn 1.288 (1.009-1.643) 0.0420 Total Chol 1.007 (1.005-1.009) 0.0001 HDL 0.972 (0.964-0.979) 0.0001 DM 1.736 (1.308-2.304) 0.0001 Curr Smoker 2.154 (1.798-2.581) 0.0001

  23. What Tests to Use When • Outcome Variable • Continuous (means) • Discrete (proportions) • Time to Event (survival) • Number of Groups • One • Two • > Two • Independent or Dependent/Matched Groups

  24. What Tests to Use When • Continuous Outcome • 1 Group – CI, t Test for Mean • Historical control • 2 Independent Groups – CI, t Test for Difference in Means • 2 Dependent Groups – CI, t Test for Mean Difference (Post-Pre) • Focus on difference scores

  25. What Tests to Use When • Continuous Outcome • > 2 Independent Groups – ANOVA • Test for difference in means • Specific contrasts (2 at a time) but control for Type I error rate with multiple testing • > 2 Dependent Groups – Repeated Measures ANOVA • Repeated assessments over time • Multiple risk factors or exposures • Multivariable linear regression analysis

  26. What Tests to Use When • Dichotomous Outcome • 1 Group – CI, Z Test for Proportion • 2 Independent Groups – CI, Z Test for Difference in proportions • 2 Dependent Groups – McNemar’s test for differences in proportions • > 2 Independent Groups – Chi-Square Test • Multiple risk factors or exposures • Multivariable logistic regression analysis

  27. What Tests to Use When • Time to Event • 1 Group - Kaplan Meier Estimate of Survival • 2+ Independent Groups – Log Rank Test for Differences in Survival • Multiple risk factors or exposures • Cox proportional hazards regression analysis

  28. Common Mistakes • Inefficient Design • A badly designed study can never be retrieved, a poorly analyzed study can usually be re-analyzed! • Analytic Planning Issues • Interpretation Issues

  29. Which Design is Best • Depends on the study question • What is current knowledge on topic • How common is disease (and risk factors) • How long would study take, what are costs • Ethical issues

  30. Common Mistakes (cont’d) • Misclassification of Outcome • Continuous (means) • Discrete (proportions) • Ordered categories, unordered categories, dichotomous (success/failure) • Time to event (survival time)

  31. Common Mistakes (cont’d) • Unit of analysis • Observations are repeated on the same unit but treated as independent • Observations are clustered, need to take into account structure in data

  32. Common Mistakes (cont’d) • Missing data • Suppose required number are enrolled but 20% drop out over the course of follow-up; What if 40% of the treatment group drop out and 0% of control drop out? • Patterns of missing data • Do everything possible to avoid missing data!

  33. Common Mistakes (cont’d) • Multiple testing • Each test has an associated Type I error (error rate per comparison, e.g. 5%) • Familywise error rate (likelihood of a false positive result over all comparisons) • Multiple comparisons procedures control familywise Type I error rate (e.g., Tukey, Dunnett) • Bonferroni correction

  34. Common Mistakes (cont’d) • Correlation Vs Cause and Effect Design Observational studies – correlation Experimental studies – cause and effect Timing Does A cause B or vice versa

  35. Common Mistakes (cont’d) • Lack of significance • Failure to show statistical significance is not equivalence (non-inferiority) • Must provide evidence of power when study fails to show statistical significance (equality or study is too small?) • Determine sample size required BEFORE study launch

  36. Common Mistakes (cont’d) • Generalizabilty • Target population • Draw sample, analyze sample, make inferences back to target population

  37. Magnitude of Effect • Statistical significance (p<0.05) is only one way to interpret results • Always look at magnitude of effect • Consistency of effect in other studies • Biologically plausible effect • Dose-response relationship

  38. Summary • Now that your dataset is ready, what to do? • Do not jump to the final analysis! • Identify the types of variables you are evaluating • Identify the study design • Plan the appropriate analyses

  39. Summary • Generate descriptive statistics for all variables, especially outcome and primary risk factor • Obtain crude measures of association • Perform stratified and adjusted analyses • Does final result make sense, given all of the above and what you know from other studies? • What are the limitations of analysis/ inferences?

  40. BS775Advanced Concepts and Applications of Statistics in Clinical Research: Spring 2010 Tues 2:30-5 Topics to be covered include: Logistic regression Linear regression Meta-analysis Survival analysis Categorical Data analysis Clinical Trials Bayesian approaches And much more! Please contact Kimber Wukitschfor more information (kimberw@bu.edu)

More Related