
Thinking About Data: A Simple Principle to Help You Improve your Scientific Data Analysis

Scott A. Venners, Ph.D., MPH. November 13, 2003.

Presentation Transcript


  1. Thinking About Data: A Simple Principle to Help You Improve your Scientific Data Analysis. Scott A. Venners, Ph.D., MPH. November 13, 2003

  2. PowerPoint slides available at: www.artima.com/AMU/lecture.ppt (Try tomorrow)

  3. ? Classes First Data Set

  4. Y = Outcome Variable; X = Predictor of Interest; Cov1…N = Potential Confounders (Covariates). Y = X + Cov1 + Cov2 + Cov3 + … + Cov(n). X p-value <0.05? Yes - Write a paper.

  5. Simple Principle: • Your model only represents one possible explanation of data. • You must actively think of all possible alternative explanations and test them. • Those that are not testable define the uncertainty of your analysis.

  6. Possible Explanations of Data (Can Test) (Cannot Test)

  7. Possible Explanations of Data (Can Test) (Cannot Test)

  8. Possible Explanations of Data (Cannot Test) (Can Test)

  9. Possible Explanations of Data (Cannot Test) Model

  10. Do not stop here! (Can Test) (Cannot Test) Model

  11. Skills you need: • Thinking of possible explanations • Knowing how to test them.

  12. Example 1: Simple model. Skill: Visualizing Confounding

  13. Example 1: Does an inactive lifestyle increase the risk of low bone density? (Legend: Inactive Lifestyle; Active Lifestyle)

  14. [Figure panels: Active Lifestyle; Inactive Lifestyle]

  15. [Figure panels: Active Lifestyle; Inactive Lifestyle] (Legend: Active Lifestyle; Inactive Lifestyle; Low Bone Density)

  16. [Figure panels: Active Lifestyle; Inactive Lifestyle] What else could cause this result? Female, Smoking, Excessive Alcohol, Old Age…

  17. Active Lifestyle: Female 49%, Smoking 21%, Excessive Alcohol 1%, Old Age 30%. Inactive Lifestyle: Female 51%, Smoking 19%, Excessive Alcohol 1%, Old Age 50%.

  18. Active Lifestyle: 30% Old Age. Inactive Lifestyle: 50% Old Age. Is the association between inactive lifestyle and low bone density confounded by old age?

  19. Active Lifestyle: 30% Old Age. Inactive Lifestyle: 50% Old Age. Is the association between inactive lifestyle and low bone density confounded by old age? No

  20. [Figure: two panels, Older Age and Younger Age, showing % Low Bone Density for Active vs. Inactive; labeled values: 50%, 50%, 30%, 30%]

  21. Active Lifestyle: 30% Old Age. Inactive Lifestyle: 50% Old Age. Is the association between inactive lifestyle and low bone density confounded by old age? Yes

  22. [Figure: two panels, Older Age and Younger Age, showing % Low Bone Density for Active vs. Inactive; labeled values: 100%, 100%, 0%, 0%]
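The two answers on slides 19 and 21 can be made concrete with a little arithmetic. The following is a minimal sketch under stated assumptions: it reads the "Yes" figure as 100% low bone density among older people and 0% among younger people regardless of lifestyle, and it assumes (this is one possible reading of the flattened figure, not something the transcript states) that the "No" figure shows 30% for active and 50% for inactive in both age groups. The function and variable names are hypothetical.

```python
def crude_rate(pct_old, rate_old, rate_young):
    """Crude % with low bone density in a lifestyle group,
    computed as the age-weighted average of the stratum rates."""
    return (pct_old * rate_old + (100 - pct_old) * rate_young) / 100

# % old age in each lifestyle group (slide 18)
PCT_OLD = {"active": 30, "inactive": 50}

# "Yes" reading: age alone determines low bone density (assumed from slide 22)
crude_yes = {g: crude_rate(p, rate_old=100, rate_young=0)
             for g, p in PCT_OLD.items()}

# "No" reading: lifestyle alone determines it, same rate in both age groups
# (ASSUMPTION about how slide 20 maps its four labels to bars)
RATE_BY_LIFESTYLE = {"active": 30, "inactive": 50}
crude_no = {g: crude_rate(p, RATE_BY_LIFESTYLE[g], RATE_BY_LIFESTYLE[g])
            for g, p in PCT_OLD.items()}

print("crude %, age explains it:      ", crude_yes)
print("crude %, lifestyle explains it:", crude_no)
```

Under these assumptions both scenarios produce the same crude comparison, so the crude data alone cannot tell them apart; only the age-stratified view can.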

  23. Independent Effect(s). Inactive Only: Low Bone Density = 10 + 0(Old) + 20(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 10%, Inactive (1) 30%.

  24. Independent Effect(s). Older Age Only: Low Bone Density = 10 + 20(Old) + 0(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 10%. Older Age (1): Active (0) 30%, Inactive (1) 30%. Inactive Only: Low Bone Density = 10 + 0(Old) + 20(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 10%, Inactive (1) 30%.

  25. Independent Effect(s). Older Age Only: Low Bone Density = 10 + 20(Old) + 0(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 10%. Older Age (1): Active (0) 30%, Inactive (1) 30%. Inactive Only: Low Bone Density = 10 + 0(Old) + 20(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 10%, Inactive (1) 30%. Both Older Age and Inactive: Low Bone Density = 10 + 20(Old) + 20(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 30%, Inactive (1) 50%.

  26. Independent Effect(s). Both Older Age and Inactive: Low Bone Density = 10 + 20(Old) + 20(Inactive). Younger Age (0): Active (0) 10%, Inactive (1) 30%. Older Age (1): Active (0) 30%, Inactive (1) 50%.

  27. Independent Effect(s). Both Older Age and Inactive: Low Bone Density = 10 + 20(Old) + 20(Inactive). Younger Age (0): Active 10%, Inactive 30%. Older Age (1): Active 30%, Inactive 50%. Older Age and Inactive Interaction: Low Bone Density = 10 + 20(Old) + 20(Inactive) + 10(Old*Inactive). Younger Age (0): Active 10%, Inactive 30%. Older Age (1): Active 30%, Inactive 60%.
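The four models on slides 23 to 27 can be checked by plugging the 0/1 codes into each equation. This is a small sketch for illustration only; the function name, the coefficient names, and the dictionary of models are hypothetical, while the coefficients themselves are taken from the slides.

```python
def low_bone_density_pct(old, inactive, b0=10, b_old=0, b_inactive=0, b_interaction=0):
    """Predicted % low bone density for 0/1-coded age and lifestyle."""
    return b0 + b_old * old + b_inactive * inactive + b_interaction * old * inactive

# Coefficients as written on the slides (baseline 10% in every model)
MODELS = {
    "inactive only":          dict(b_inactive=20),
    "older age only":         dict(b_old=20),
    "both, independent":      dict(b_old=20, b_inactive=20),
    "both, with interaction": dict(b_old=20, b_inactive=20, b_interaction=10),
}

# Cell order: (young, active), (young, inactive), (old, active), (old, inactive)
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def cell_table(coefs):
    """Return the four predicted percentages in CELLS order."""
    return [low_bone_density_pct(old, inactive, **coefs) for old, inactive in CELLS]

for name, coefs in MODELS.items():
    print(f"{name:24s}", cell_table(coefs))
```

With independent effects, the old and inactive cell is the sum of the two separate increases (10 + 20 + 20 = 50); the interaction term adds the extra 10 points that appear only when both factors are present.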

  28. Example 2: Sometimes just putting potential confounders into the model is not correct.

  29. Example 2: Does passive smoking increase the risk of chronic cough? (Legend: Passive Smoking; No Passive Smoking)

  30. [Figure panels: No Passive Smoking; Passive Smoking]

  31. No Passive Smoking: 25% Cough. Passive Smoking: 25% Cough. (Legend: No Passive Smoking; Passive Smoking; Chronic Cough)

  32. No Passive Smoking: 25% Cough. Passive Smoking: 25% Cough. What else could cause this result? Active Smoking…

  33. No Passive Smoking: 45% Active Smoking. Passive Smoking: 17% Active Smoking. Is the association between passive smoking and cough confounded by active smoking?

  34. % Cough. No Active Smoking: No Passive 7%, Passive 20%. Active Smoking: No Passive 47%, Passive 47%.
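The stratified percentages on slide 34 can be combined with the active-smoking prevalences on slide 33 to see why the crude comparison shows no difference. The sketch below assumes the reading in which non-smokers cough at 7% (no passive) and 20% (passive), smokers cough at 47% in both groups, and active smoking is 45% in the no-passive group and 17% in the passive group. All names are hypothetical.

```python
# % active smokers within each passive-smoking group (slide 33)
PCT_ACTIVE = {"no passive": 45, "passive": 17}

# % chronic cough by (active smoking, passive smoking group) (slide 34)
COUGH = {
    ("non-smoker", "no passive"): 7,
    ("non-smoker", "passive"): 20,
    ("smoker", "no passive"): 47,
    ("smoker", "passive"): 47,
}

def crude_cough_pct(group):
    """Crude % cough in a passive-smoking group: weighted average
    of the smoker and non-smoker rates within that group."""
    p = PCT_ACTIVE[group]
    return (p * COUGH[("smoker", group)] + (100 - p) * COUGH[("non-smoker", group)]) / 100

crude = {g: crude_cough_pct(g) for g in PCT_ACTIVE}
print(crude)  # both values come out at roughly 25%
```

Under this reading, the higher share of active smokers in the no-passive group offsets the passive-smoking effect among non-smokers, which is how two groups with different stratum-specific risks can both show about 25% cough.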

  35. How to model? % Cough. No Active Smoking: No Passive 7%, Passive 20%. Active Smoking: No Passive 47%, Passive 47%.

  36. How to model? % Cough. No Active Smoking (0): No Passive (0) 7%, Passive (1) 20%. Active Smoking (1): No Passive (0) 47%, Passive (1) 47%. ? Cough% = 7 + 40(Smoke) + 13(Passive) - 13(Smoke*Passive) No
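As a quick check, the equation on slide 36 can be evaluated at each combination of the 0/1 codes. This is an illustrative sketch; the function name is hypothetical and the coefficients are copied from the slide.

```python
def cough_pct(smoke, passive):
    """Cough% = 7 + 40(Smoke) + 13(Passive) - 13(Smoke*Passive), with 0/1 codes."""
    return 7 + 40 * smoke + 13 * passive - 13 * smoke * passive

# Predicted % cough for every (active smoking, passive smoking) cell
cells = {(s, p): cough_pct(s, p) for s in (0, 1) for p in (0, 1)}
print(cells)

# Effect of passive smoking within each active-smoking stratum
effect_in_nonsmokers = cough_pct(0, 1) - cough_pct(0, 0)
effect_in_smokers = cough_pct(1, 1) - cough_pct(1, 0)
print(effect_in_nonsmokers, effect_in_smokers)
```

The negative interaction coefficient exactly cancels the passive-smoking term among active smokers, so the equation gives a 13-point increase for non-smokers and no increase for smokers.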

  37. Example 3: Sometimes explanations for data are not so clear.

  38. Odds ratios of early pregnancy loss by husband’s current smoking. Crude: None, Ref; <20 cigs/day, OR 1.19 (p = .429); >=20 cigs/day, OR 2.18 (p = .013). Adjusted*: None, Ref; <20 cigs/day, OR 1.04 (p = .854); >=20 cigs/day, OR 1.81 (p = .049). * Adjusted for husband and wife’s ages, education, stress, exposure to dust and noise, husband’s alcohol use, previous smoking, and exposure to toxins, and wife’s body-mass index.

  39. If husband’s education is removed from the model: Crude: None, Ref; <20 cigs/day, OR 1.19 (p = .429); >=20 cigs/day, OR 2.18 (p = .013). Adjusted*: None, Ref; <20 cigs/day, OR 1.14 (p = .576); >=20 cigs/day, OR 2.02 (p = .022).

  40. Husband’s Smoking: None, <20 cigs/day, >=20 cigs/day. High School: 79%, 59%, 50%.

  41. Husband’s Smoking: None, <20 cigs/day, >=20 cigs/day. % Early Pregnancy Loss: 44%, 29%, 30% and 20%, 22%, 21% (legend: < High School, >= High School).

  42. Husband’s Smoking: None, <20 cigs/day, >=20 cigs/day. High School: 79%, 59%, 50%. % Early Pregnancy Loss: 44%, 29%, 30% and 20%, 22%, 21% (legend: < High School, >= High School).

  43. Main Points: • No matter if you have good results or bad, always think beyond your preferred explanation for data. • Explore all possibilities before choosing your preferred model. • Acknowledge what you cannot test as your limitations.
