Biostat Didactic Seminar Series
Correlation and Regression, Part 2
Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH

Previous Biostat Didactics: Fall 2009 – Spring 2010
• Descriptive Statistics: Examining Your Data
    • Data types: Qualitative (Categorical), Ordinal, Quantitative
    • Mean, SD, medians, quartiles, IQR, skewness, histograms, boxplots
• Group Comparisons: Part 1
    • Normal dist (mean, SD: 68%, 95%, 99.7% interpretation)
    • t-dist, degrees of freedom (n-1)
    • Confidence interval for the mean
• Group Comparisons: Part 2
    • Comparing means: Two-sample independent t-test
    • Pooled and unequal variance (Satterthwaite) versions
    • Interpretation of p-values, type I (false positive) and type II (false negative) error

Previous Biostat Didactics: Fall 2009 – Spring 2010
• Group Comparisons Part 3: Nonparametric Tests, Chi-squares and Fisher Exact
    • Comparing groups having small sample sizes (< 20) or non-normal distributions
      => Use Wilcoxon Rank-Sum Test (nonparametric), which is based on rank order when sorted rather than on actual numeric values
    • Comparing groups in the % falling into different categories
      => Use Chi-square, or Fisher’s Exact (if any expected cell count < 5)

Previous Biostat Didactics: Fall 2009 – Spring 2010
• Correlation, Regression and Covariate-Adjusted Group Comparisons
    • Pearson vs Spearman correlation => linear vs monotone association
    • Regression: interpretation of beta coefficients
    • Standard errors, p-values
    • Continuous predictor => beta coeff is a slope
    • Dichotomous predictor (e.g. group “dummy” 0,1 valued variable) => beta coeff is the difference in mean response vs “referent”
        treatment_group = 1: knockout mouse
        treatment_group = 0: wild mouse (referent)
    • Adjusting for important covariates when comparing groups

Flow chart for group comparisons
Measurements to be compared:
• Continuous => Distribution approx normal or N ≥ 20?
    • Yes => T-tests
    • No => Non-parametrics
• Discrete (binary, nominal, ordinal with few values) => Chi-square, Fisher’s Exact

Flow chart for regression models (includes adjusted group comparisons)
Outcome variable continuous or dichotomous?
• Continuous => Predictor variable categorical?
    • No => Multiple linear regression
    • Yes (e.g. groups) => ANCOVA (multiple linear regression using dummy variable(s) for categorical var(s))
• Dichotomous => Time-to-event available (or relevant)?
    • No => Multiple logistic regression
    • Yes => Cox proportional hazards regression

Analysis From Last Didactic …
• In the Health, Aging and Body Composition (HABC) Knee-OA Substudy: examine the association between SxRxKOA (knee OA) and CRP, adjusted for BMI.
• Motivation: Sowers M, Hochberg M, et al. C-reactive protein as a biomarker of emergent osteoarthritis. Osteoarthritis and Cartilage, Volume 10, Issue 8, August 2002, Pages 595-601.
• Conclusion: “CRP is highly associated with Knee OA; however, its high correlation with obesity limits its utility as an exclusive marker for knee OA”

All White Females in HABC (N=844)
[includes SxRxKOA (n=93); also rest of parent study cohort]
Note: N=5 had CRP > 30 (max=63.2)

White Females
Difference in average logCRP: 0.76 - 0.43 = 0.33

Two-Group Unadjusted Comparison of Means Using Regression with Dummy-coded Groups

    * No OA is "referent" group (i.e. kneeOA=0);
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA;
      where female=1 and white=1;
    run;

    HABCID   logCRP     kneeOA   BMI
    1000      1.10972   0        22.5922
    1001      0.16526   0        22.2751
    1002      1.50988   0        26.1207
    1003     -0.62048   0        26.9536
    1014      0.65657   1        26.5266
    1017      0.82039   1        30.2526
    1033      0.84323   1        29.8458
    1048      1.67787   1        39.8597

White Females: 2-Group Comparison Using Dummy-coded Groups

    * No OA is "referent" group (KneeOA=0);
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA;
      where female=1 and white=1;
    run;

Reading the output:
• Intercept = “No OA” mean
• KneeOA coefficient = “kneeOA” mean difference from referent
• Same p-value as the equal-variance t-test
Note: The regression uses a dummy (0, 1) for the group variable (e.g. KneeOA=0,1). In regression, equal (pooled) variance is assumed.

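The statement that the regression p-value equals the equal-variance t-test p-value can be checked directly. The following is a minimal sketch, assuming the same dataset and variable names used in the slides; it is not part of the original deck.

```sas
* Two-sample t-test of logCRP by KneeOA group (same subset as the regression);
proc ttest data=kneeOA_vs_noOA;
  class KneeOA;              /* grouping variable: 0 = No OA, 1 = knee OA */
  var logCRP;                /* outcome being compared */
  where female=1 and white=1;
run;
```

In the PROC TTEST output, the "Pooled" row is the equal-variance test that corresponds to the KneeOA coefficient test in PROC REG; the "Satterthwaite" row is the unequal-variance version.
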
    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA;
      where female=1 and white=1;
    run;

Fitted model: logCRP = 0.42682 + 0.33091*KneeOA   (0.42682 is the intercept)
    KneeOA=0: logCRP = 0.42682 + 0.33091*0 = 0.42682
    KneeOA=1: logCRP = 0.42682 + 0.33091*1 = 0.75773

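The arithmetic above is an instance of the general dummy-coding identity for a single 0/1 predictor. As a sketch, writing $\bar{y}_0$ and $\bar{y}_1$ (notation assumed here, not on the slide) for the sample mean logCRP in the No OA and KneeOA groups:

```latex
\begin{align*}
\widehat{\text{logCRP}} &= b_0 + b_1\,\text{KneeOA} \\
\widehat{\text{logCRP}}\,\big|_{\text{KneeOA}=0} &= b_0 = \bar{y}_0 = 0.42682 \\
\widehat{\text{logCRP}}\,\big|_{\text{KneeOA}=1} &= b_0 + b_1 = \bar{y}_1 = 0.75773 \\
b_1 &= \bar{y}_1 - \bar{y}_0 = 0.75773 - 0.42682 = 0.33091
\end{align*}
```

This agrees, after rounding, with the difference in average logCRP of 0.76 - 0.43 = 0.33 reported for white females.
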
ANCOVA (Analysis of Covariance): Compare logCRP adjusted for BMI

    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA bmi;
      where female=1 and white=1;
    run;

• Unadjusted difference was 0.33; BMI partially “explains” this difference
Note: Equal BMI slopes in each group are being modeled

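The equal-slopes note can be made explicit by writing out the model that the PROC REG step fits. This is a sketch in generic notation; the symbols $\beta_0, \beta_1, \beta_2$ are assumptions for illustration, and the fitted values are not reproduced in this transcript.

```latex
\begin{align*}
\text{logCRP} &= \beta_0 + \beta_1\,\text{KneeOA} + \beta_2\,\text{BMI} + \varepsilon \\
\text{No OA (KneeOA}=0\text{):}\quad E[\text{logCRP}] &= \beta_0 + \beta_2\,\text{BMI} \\
\text{Knee OA (KneeOA}=1\text{):}\quad E[\text{logCRP}] &= (\beta_0 + \beta_1) + \beta_2\,\text{BMI} \\
\text{Difference at any fixed BMI} &= \beta_1
\end{align*}
```

Both groups share the slope $\beta_2$, so the two fitted lines are parallel and the BMI-adjusted group difference $\beta_1$ is the same at every BMI level.
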
Notice: At any BMI level, the mean logCRP difference between KneeOA vs Not is smaller than the unadjusted difference.
[Figure annotation: brace marking the unadjusted mean difference]

logCRP between KneeOA vs Not, Adjusted for BMI, Age and Anti-inflammatory Meds
Note: age is not significant (caveat: narrow HABC study age range: 69-80)

Pearson Correlation
Pearson correlation = a measure of linear association

Pearson vs Spearman Correlation
• Spearman:
    • A measure of rank order correlation
    • Works for any general trend that is increasing or decreasing and not necessarily linear
    • Equals Pearson correlation using the ranks of the observations instead of actual values
    • Heuristically: Spearman measures the degree that low goes with low, middle with middle, high with high

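The point that Spearman equals Pearson computed on ranks can be illustrated in SAS. The following is a minimal sketch, assuming the dataset and the logCRP and bmi variables from the regression slides; the output dataset name ranked and the variables rank_logCRP and rank_bmi are hypothetical names introduced here.

```sas
* Pearson and Spearman correlations of logCRP with BMI;
proc corr data=kneeOA_vs_noOA pearson spearman;
  var logCRP bmi;
  where female=1 and white=1 and not missing(logCRP) and not missing(bmi);
run;

* Spearman "by hand": replace each variable by its ranks (ties get the mean rank);
proc rank data=kneeOA_vs_noOA out=ranked ties=mean;
  where female=1 and white=1 and not missing(logCRP) and not missing(bmi);
  var logCRP bmi;
  ranks rank_logCRP rank_bmi;   /* hypothetical names for the rank variables */
run;

* Pearson correlation of the ranks should match the Spearman value above;
proc corr data=ranked pearson;
  var rank_logCRP rank_bmi;
run;
```

Restricting both steps to complete pairs keeps the ranks comparable, since PROC RANK ranks each variable over its own non-missing values.
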
Effect of Centering BMI at 25

    proc reg data=kneeOA_vs_noOA;
      model logCRP = bmi_minus25;
      where female=1 and white=1 and kneeOA=1;
    run;

logCRP = 0.58144 + 0.04699*(BMI-25)
       = 0.58144 at BMI=25 (see graphic)

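The slides use bmi_minus25 without showing how it was created. One plausible way to derive it, sketched here under the assumption that the dataset already holds a numeric bmi variable, is a short DATA step that rewrites the dataset in place:

```sas
* Add a BMI variable centered at 25 (assumed construction, not shown on the slide);
data kneeOA_vs_noOA;
  set kneeOA_vs_noOA;
  bmi_minus25 = bmi - 25;   /* 0 at BMI=25, so the intercept is the mean logCRP at BMI=25 */
run;
```

With this coding, the intercept of the model above is the predicted logCRP at BMI=25 rather than at the out-of-range value BMI=0.
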
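Effect of Centering BMI at 25
Model 2: logCRP = 0.58144 + 0.04699*(BMI-25)
                = (0.58144 - 25*0.04699) + 0.04699*BMI
                = -0.59337 + 0.04699*BMI
(With the rounded coefficients shown, the intercept works out to -0.59331; the reported -0.59337 presumably comes from the unrounded fit.)
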
[Figure: plot with brace marking the unadjusted mean difference]

ANCOVA (Analysis of Covariance): Centering BMI at 25

    proc reg data=kneeOA_vs_noOA;
      model logCRP = KneeOA bmi_minus25;
      where female=1 and white=1;
    run;

Note: Equal BMI slopes in each group are being modeled

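Centering changes only the intercept of the ANCOVA model, not the group difference or the BMI slope. A sketch of the algebra, using generic coefficient symbols (assumed notation; $\beta_0^{c}$ denotes the intercept of the centered model):

```latex
\begin{align*}
\beta_0 + \beta_1\,\text{KneeOA} + \beta_2\,\text{BMI}
  &= (\beta_0 + 25\,\beta_2) + \beta_1\,\text{KneeOA} + \beta_2\,(\text{BMI} - 25) \\
\beta_0^{c} &= \beta_0 + 25\,\beta_2 \\
\text{At BMI}=25:\quad E[\text{logCRP} \mid \text{No OA}] &= \beta_0^{c}, \qquad
E[\text{logCRP} \mid \text{Knee OA}] = \beta_0^{c} + \beta_1
\end{align*}
```

The intercept of the centered model is therefore the adjusted mean logCRP of the referent (No OA) group at BMI=25, while $\beta_1$ and $\beta_2$ keep the same values and p-values as in the uncentered model.
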
Check of ANCOVA Assumption: Equality of BMI slopes: KneeOA vs Not

    proc reg data=knee_vs_noOA;
      model logCRP = KneeOA bmi BMI_x_KneeOA;
      where female=1 and white=1;
    run;

BMI_x_KneeOA is the “interaction term”:

    HABCID   logCRP     kneeOA   BMI       BMI_x_KneeOA
    1000      1.10972   0        22.5922    0.0000
    1001      0.16526   0        22.2751    0.0000
    1002      1.50988   0        26.1207    0.0000
    1003     -0.62048   0        26.9536    0.0000
    1014      0.65657   1        26.5266   26.5266
    1017      0.82039   1        30.2526   30.2526
    1033      0.84323   1        29.8458   29.8458
    1048      1.67787   1        39.8597   39.8597

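The listing shows that BMI_x_KneeOA is 0 for the No OA group and equal to BMI for the KneeOA group, which is the product of the two variables. A minimal sketch of how such a column could be created, assuming the dataset name used on this slide (the DATA step itself is not shown in the deck):

```sas
* Build the interaction term as the product of BMI and the 0/1 group dummy;
data knee_vs_noOA;
  set knee_vs_noOA;
  BMI_x_KneeOA = bmi * KneeOA;   /* 0 when KneeOA=0, equals BMI when KneeOA=1 */
run;
```

PROC REG has no built-in syntax for product terms in the MODEL statement, so the interaction has to be computed as a variable beforehand, as here.
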
Check of ANCOVA Assumption: Equality of BMI slopes: KneeOA vs Not

    proc reg data=knee_vs_noOA;
      model logCRP = KneeOA bmi BMI_x_KneeOA;
      where female=1 and white=1;
    run;

The “BMI” slopes are not significantly different (p=0.8019) => no evidence against parallel slopes, so the equal-slopes ANCOVA model is reasonable

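The reported p-value is the test of the interaction coefficient. As a sketch in generic notation (the $\beta$ symbols are assumptions for illustration), the model with the interaction term gives each group its own BMI slope:

```latex
\begin{align*}
\text{logCRP} &= \beta_0 + \beta_1\,\text{KneeOA} + \beta_2\,\text{BMI}
                 + \beta_3\,(\text{BMI} \times \text{KneeOA}) + \varepsilon \\
\text{No OA:}\quad E[\text{logCRP}] &= \beta_0 + \beta_2\,\text{BMI} \\
\text{Knee OA:}\quad E[\text{logCRP}] &= (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,\text{BMI} \\
H_0 &: \beta_3 = 0 \quad \text{(equal slopes)}
\end{align*}
```

The slope difference between groups is $\beta_3$, so the test of the BMI_x_KneeOA coefficient (p=0.8019) is the test of the equal-slopes assumption.
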
Thank you
• Questions, comments, suggestions or insights?
• Remaining time: Open consultation …