Biostatistics Case Studies 2014

Biostatistics Case Studies 2014. Session 4 : Regression Models and Multivariate Analyses. Youngju Pak, PhD. Biostatistician ypak@labiomed.org. What and Why?. Multivariate analysis (MVA) techniques allow more than two variables to be analysed at once.

Presentation Transcript


  1. Biostatistics Case Studies 2014 Session 4: Regression Models and Multivariate Analyses Youngju Pak, PhD. Biostatistician ypak@labiomed.org

  2. What and Why? • Multivariate analysis (MVA) techniques allow more than two variables to be analysed at once, in contrast to univariate or bivariate analyses. • Richer data, made possible by advances in computational technology → data reduction or classification • e.g., Factor Analysis, Principal Component Analysis (PCA) • Several variables are potentially correlated to some degree → potential confounding → biased results • e.g., Analysis of Covariance (ANCOVA), Multiple Linear or Generalized Linear Regression Models

  3. What and Why? • Many variables are interrelated, with multiple dependent and independent variables • e.g., Multivariate Analysis of Variance (MANOVA), Path Models, Structural Equation Models (SEM), Partial Least Squares (PLS) Models. • This session will focus on multiple regression models.

  4. Why regression models? • To reduce “random noise” in the data => better variance estimates, obtained by adding sources of variability of the dependent variable to the model • e.g., ANCOVA • To determine an optimal set of predictors => predictive models • e.g., variable selection procedures for multiple regression models • To adjust for potential confounding effects • e.g., regression models with covariates

  5. Actual mathematical Models • ANOVA: $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$, where $Y_{ij}$ represents the jth observation (j = 1, 2, …, n) on the ith treatment (i = 1, 2, …, l levels). The errors $\epsilon_{ij}$ are assumed to be normally and independently distributed (NID), with mean zero and variance $\sigma^2$. • ANCOVA with k covariates: $Y_{ij} = \mu + \tau_i + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij} + \epsilon_{ij}$ • MANOVA (with p outcome variables): $Y_{(n \times p)} = X_{(n \times [q+1])} B_{([q+1] \times p)} + E_{(n \times p)}$

  6. Actual mathematical Models • Simple Linear Regression Models (SLR): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_0 + \beta_1 X_i = \mu_Y$ (the true mean value of Y) • $\varepsilon$ = “error” (random noise due to random sampling error), assumed to follow a normal distribution with mean = 0 and variance = $\sigma^2$ • $\beta_0$ & $\beta_1$ = intercept & slope → often called regression (or beta) coefficients • Y = Dependent Variable (DV) • X = Independent Variable (IV), e.g., Y = insulin sensitivity, X = fatty acid in percentage • Multiple Linear Regression Models (MLR) • Simple Logistic Models (SL) • Multiple Logistic Models (ML)
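As a minimal sketch of how the intercept and slope of the SLR model above can be estimated by ordinary least squares, the closed-form formulas fit in a few lines of plain Python. The function name `fit_slr` and the data values are illustrative assumptions; they are not the fatty acid data behind the SPSS output.

```python
def fit_slr(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 7.9, 11.2, 13.8, 17.0]

b0, b1 = fit_slr(x, y)
# Residuals are the observed values minus the fitted means
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Plotting `residuals` against `x` gives the kind of residual plot used to check the model assumptions.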

  7. SLR: Example SPSS output • Two-sided p-value = 0.002. Thus, there is statistically significant evidence (alpha = 0.05) to conclude that the true slope is not zero → Fatty Acid (%) is significantly related to insulin sensitivity. • Mean insulin sensitivity increases by 37.208 units for each one-percentage-point increase in Fatty Acid (%).

  8. SLR w/CI

  9. Checking the assumptions using a residual plot • The plot should look “RANDOM”: no special pattern should appear if the assumptions are met.

  10. Actual mathematical Models • Multiple Linear Regression Models (MLR): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$, where $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k = \mu_Y$ (the true mean value of Y) • Assumptions are the same as for SLR, with one addition: the Xs should not be highly correlated with one another. If they are, this is called “multicollinearity”, which makes the model very unstable. • Diagnosis for multicollinearity • Variance Inflation Factor (VIF) = 1 → OK • VIF < 5 → Tolerable • VIF > 5 → Problematic → Remove the variable that has a high VIF or do PCA • Multiple Linear Regression Models (MLR) • Simple Logistic Models (SL) • Multiple Logistic Models (ML)
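The VIF rule of thumb above can be illustrated with a small sketch. For predictor j, VIF is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors; with exactly two predictors this R² is the squared Pearson correlation between them, which is the only case handled here. The function names and data are hypothetical, and treating a VIF of exactly 5 as problematic is an assumption, since the slide leaves that boundary open.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


def vif_label(vif, cutoff=5.0):
    """Apply the slide's rule of thumb (boundary handling is an assumption)."""
    return "tolerable" if vif < cutoff else "problematic"


# Hypothetical predictor values, made up for illustration
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2), vif_label(vif))
```

With more than two predictors, each VIF needs its own auxiliary regression, which is usually left to a statistics package.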

  11. MLR: Example $\mu_Y = -56.935 + 1.634 X_1 + 0.249 X_2$ • 1.634 × Flexibility • For every 1-degree increase in flexibility, MEAN punt distance increases by 1.634 feet, adjusting for leg strength. • 0.249 × Strength • For every 1 lb increase in strength, MEAN punt distance increases by 0.249 feet, adjusting for flexibility.
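To make the “adjusting for” wording concrete, the fitted equation above can be evaluated directly. This is only a sketch: the function name and the input values (100 degrees, 150 lb) are hypothetical; the three coefficients are the ones shown on the slide.

```python
def mean_punt_distance(flexibility_deg, strength_lb):
    """Estimated mean punt distance (feet) from the fitted MLR equation."""
    return -56.935 + 1.634 * flexibility_deg + 0.249 * strength_lb


# Hypothetical punter: 100 degrees of flexibility, 150 lb leg strength
base = mean_punt_distance(100, 150)

# Raise flexibility by one degree while holding strength fixed:
# the estimated mean changes by the flexibility coefficient only.
one_more_degree = mean_punt_distance(101, 150)
print(round(one_more_degree - base, 3))
```

The difference equals the flexibility coefficient regardless of which strength value is held fixed, which is what “adjusting for leg strength” means in a model without interaction terms.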

  12. What do we mean by “adjusted for”? • What if the covariate is categorical? • e.g., • Mean % gain without adjustment for gender • Exercise & Diet: (20% × 10 + 10% × 40) / 50 = 12% • Exercise only: (15% × 40 + 5% × 10) / 50 = 13% • Mean % gain with adjustment for gender • Exercise & Diet: Male avg. × 0.5 + Female avg. × 0.5 = 20% × 0.5 + 10% × 0.5 = 15% • Exercise only: Male avg. × 0.5 + Female avg. × 0.5 = 15% × 0.5 + 5% × 0.5 = 10%
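The arithmetic on this slide can be reproduced with a short sketch. The numbers are the ones on the slide (mean % gain and group size for males and females); the function names are hypothetical. The unadjusted mean weights each gender by its sample size, while the adjusted mean gives both genders the same weight of 0.5.

```python
def crude_mean(groups):
    """Sample-size-weighted mean; groups is a list of (mean_gain, n)."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


def adjusted_mean(groups, weights):
    """Mean using fixed weights per stratum instead of the sample sizes."""
    return sum(mean * w for (mean, _), w in zip(groups, weights))


# (mean % gain, n) for males, then females, as given on the slide
exercise_diet = [(20.0, 10), (10.0, 40)]
exercise_only = [(15.0, 40), (5.0, 10)]
equal_weights = [0.5, 0.5]

print(crude_mean(exercise_diet), crude_mean(exercise_only))    # 12.0 13.0
print(adjusted_mean(exercise_diet, equal_weights),
      adjusted_mean(exercise_only, equal_weights))             # 15.0 10.0
```

The ranking of the two programmes reverses once gender is given equal weight in both groups.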

  13. Why different? • % gain for males is 10 percentage points higher than for females in both groups → potential confounding • However, the two groups are unbalanced in terms of gender, i.e., 80% male in the exercise-only group versus 20% male in the exercise & diet group → dilutes the “treatment effect” • For a continuous covariate such as baseline age, a similar adjustment is performed based on the correlation between % gain and baseline age.

  14. Graphical illustration: Adjusting for a continuous covariate * Changes in Adiponectin (a glucose-regulating protein) between two groups

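  15. Multiple Logistic Regression Models • The model: $\mathrm{logit}(\pi) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$, where $\pi = \Pr(\text{event} = 1)$ and $\mathrm{logit}(\pi) = \ln[\pi / (1 - \pi)]$ • or $\pi = e^{LP} / (1 + e^{LP})$, where $LP = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$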
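The two forms of the model above are inverses of each other, which a short sketch can confirm. The function names and the coefficient values are hypothetical, chosen only to show the mechanics of moving between the linear predictor and the probability.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(lp):
    """Probability from a linear predictor: e^LP / (1 + e^LP)."""
    return math.exp(lp) / (1.0 + math.exp(lp))


def linear_predictor(beta0, betas, xs):
    """LP = beta0 + beta1*x1 + ... + betak*xk."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


# Hypothetical coefficients and covariate values
lp = linear_predictor(-2.0, [0.5, 1.0], [2.0, 0.0])
prob = inv_logit(lp)
print(round(lp, 3), round(prob, 3))
```

Applying `logit` to `prob` recovers `lp`, so either form can be used to report the fitted model.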

  16. Interpretation of the coefficients in logistic regression models • For a continuous predictor, the exponentiated coefficient ($e^{\beta}$) represents the multiplicative change in the odds of Y = 1 for a one-unit change in X → the odds ratio for X+1 versus X. • Similarly, for a binary nominal predictor, $e^{\beta}$ represents the odds ratio for one group (X = 1) versus the other (X = 0). • Remember, a multiple logistic regression model has other covariates. Hence, the interpretation of one coefficient applies with the other covariates adjusted for.
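A quick numerical check of the odds-ratio interpretation: under a logistic model, the ratio of the odds at X+1 to the odds at X equals e^β at every starting value of X. The intercept and age coefficient below are hypothetical values, not estimates from the session's data.

```python
import math

# Hypothetical fitted coefficients for a single continuous predictor (age)
BETA0 = -5.0
BETA_AGE = 0.08


def odds_at(age):
    """Odds of the event at a given age: exp(LP), with LP = b0 + b1 * age."""
    return math.exp(BETA0 + BETA_AGE * age)


# The odds ratio for a one-year increase is the same at any starting age
or_50 = odds_at(51) / odds_at(50)
or_70 = odds_at(71) / odds_at(70)
print(round(or_50, 4), round(or_70, 4), round(math.exp(BETA_AGE), 4))
```

The odds ratio is constant, but the change in probability for one extra year is not; it depends on where on the curve the starting age falls.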

  17. Estimated Prob. Vs. Age

  18. Other Models • Ordinal logistic regression, for ordinal responses such as cancer stage I, II, III, IV: assumes a constant odds ratio across the category cut-points (the proportional odds assumption). • Poisson regression, when responses are count data such as the number of pregnancies: overdispersion is common, and sometimes a negative binomial distribution is used instead. • Mixed models: commonly used for repeated measures ANOVA or ANCOVA. Time is used as a within-subject factor and a random factor. Mixed models are also used for nested designs. • Cox proportional hazards models: multivariate models for survival data.
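For the Poisson bullet above, a rough first look at overdispersion is to compare the sample variance of the counts with their mean, since a Poisson distribution has variance equal to its mean. This sketch works on raw counts only, not on the residuals of a fitted regression, and the function name and data are hypothetical.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Hypothetical count data, made up for illustration
counts = [0, 1, 2, 3, 4]
print(dispersion_ratio(counts))
```

A ratio well above 1 suggests overdispersion, which is the situation where a negative binomial model is often considered.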

  19. General Linear Model vs. Generalized Linear Model (GLM) • A Linear Model → General Linear Model • e.g., ANOVA, ANCOVA, MANOVA, MANCOVA, linear regression, mixed model • A Non-Linear Model → Generalized Linear Model • e.g., Logistic, Ordinal Logistic, Poisson → All of these use a link function for the response variable (Y), such as a logit link (logistic) or a log link (Poisson). • GEE (Generalized Estimating Equation) models are an extension of GLM.

  20. Variable Selection Procedures • Forward • Add the new predictor that has the lowest p-value, and repeat this step until no more predictors can be added at the 0.05 alpha level • Backward • Start with a full model containing all predictors, eliminate the predictor with the highest p-value, and repeat this procedure until no more predictors can be eliminated at the 0.05 alpha level • Stepwise • Combination of Forward and Backward • Level of stay: 0.01 and level of entry: 0.05 are usually used • Many simulation studies show that Backward is the most recommendable.
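The backward procedure described above can be sketched as a loop around a model-fitting step. Everything here is an assumption made for illustration: `pvalues_for` is a hypothetical callback that refits the model on the given predictors and returns their p-values, and `fake_pvalues` is a stand-in with made-up numbers so the loop can run without a statistics package.

```python
def backward_eliminate(predictors, pvalues_for, alpha=0.05):
    """Drop the predictor with the highest p-value until all are <= alpha."""
    kept = list(predictors)
    while kept:
        pvals = pvalues_for(kept)          # refit on the current predictors
        worst = max(kept, key=lambda name: pvals[name])
        if pvals[worst] <= alpha:
            break                          # nothing left to eliminate
        kept.remove(worst)
    return kept


def fake_pvalues(names):
    """Stand-in for a real refit; the p-values are invented."""
    table = {"age": 0.001, "bmi": 0.03, "sex": 0.40, "smoker": 0.20}
    pvals = {name: table[name] for name in names}
    # Pretend that 'smoker' becomes significant once 'sex' is removed
    if "sex" not in names and "smoker" in names:
        pvals["smoker"] = 0.04
    return pvals


selected = backward_eliminate(["age", "bmi", "sex", "smoker"], fake_pvalues)
print(selected)
```

Because p-values change each time a predictor is dropped, the model is refitted inside the loop instead of ranking all predictors once from the full model.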

  21. Bariatric Surgery • Roux-en-Y gastric bypass, • Sleeve gastrectomy, • Gastric banding, • Biliopancreatic diversion.

  22. Table 1 Figure 1 Appendix ?

  23. Factors Associated with Achieving The Primary End Points at 3 Years
