
Statistics for clinicians

Learn the difference between correlation and linear regression, including how to compute correlation coefficients, predict values using linear regression, and understand assumptions and sources of variation in linear regression.


Presentation Transcript


  1. Statistics for clinicians. Biostatistics course by Kevin E. Kip, Ph.D., FAHA; Professor and Executive Director, Research Center, University of South Florida, College of Nursing; Professor, College of Public Health, Department of Epidemiology and Biostatistics; Associate Member, Byrd Alzheimer's Institute, Morsani College of Medicine; Tampa, FL, USA

  2. SECTION 6.1 Correlation versus linear regression

  3. Learning Outcome: Distinguish the relationship between correlation and linear regression

  4. Correlation and regression are both measures of association. Common terms for the "association" variables: Variable 1: "x" variable, independent variable, predictor variable, exposure variable. Variable 2: "y" variable, dependent variable, outcome variable.

  5. Correlation Coefficient Computation Form: The Pearson correlation ("r") measures co-variation: r = Σ(xi − x̄)(yi − ȳ) / [(n − 1) sx sy], where x̄ and ȳ are the sample means of X and Y, and sx and sy are the sample standard deviations of X and Y.
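The computation above can be sketched in plain Python (a minimal illustration with made-up data, not a dataset from the slides):

```python
def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their sample SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sample standard deviations (n - 1 in the denominator)
    sx = (sum((xi - mean_x) ** 2 for xi in x) / (n - 1)) ** 0.5
    sy = (sum((yi - mean_y) ** 2 for yi in y) / (n - 1)) ** 0.5
    # average co-variation of paired deviations from the means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (sx * sy)

pearson_r([1, 2, 3, 4], [2, 3, 5, 8])  # strong positive correlation
```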

  6. Introduction to Linear Regression. Like correlation, the data are pairs of independent (e.g., "X") and dependent (e.g., "Y") variables {(xi, yi): i = 1,…,n}. However, here we seek to predict values of Y from X. The fitted equation is written: ŷ = b0 + b1x, where ŷ is the predicted value of the response (e.g., blood pressure) obtained by using the equation. This equation of the line best represents the association between the independent variable and the dependent variable. The residuals are the differences between the observed and the predicted values: {(yi − ŷi): i = 1,…,n}.

  7. Introduction to Linear Regression. The best-fitting line minimizes the distance between predicted and actual values (scatterplot shown with r = 0.76).

  8. Introduction to Linear Regression. ŷ = b0 + b1x. ŷ = predicted value of the response (outcome) variable. b0 = constant: the intercept (the value of y when x = 0). b1 = constant: coefficient for the slope of the regression line, i.e., the expected change in y for a one-unit change in x. Note: unlike the correlation coefficient, b1 is unbounded. xi = value of the independent (predictor) variable for subject i.

  9. SECTION 6.2 Least squares regression and predicted values

  10. Learning Outcomes: Describe the theoretical basis of least squares regression Calculate and interpret predicted values from a linear regression model

  11. Introduction to Linear Regression. ŷ = b0 + b1x. In the above equation, the values of the slope (b1) and intercept (b0) represent the line that best predicts Y from X. More precisely, the goal of regression is to minimize the sum of the squares of the vertical distances of the points from the line, i.e., minimize ∑(yi − ŷi)². This is frequently done by the method of "least squares" regression.
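The least-squares criterion above has a closed-form solution. A minimal Python sketch that fits b0 and b1 from raw (x, y) pairs (illustrative data, not from the slides):

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x: minimizes sum((y - yhat)**2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)   # variation in x
    sxy = sum((xi - mean_x) * (yi - mean_y)     # co-variation of x and y
              for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept: the line passes through the means
    return b0, b1

fit_line([1, 2, 3], [1, 3, 5])  # → (-1.0, 2.0)
```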

  12. Least squares estimates: b1 = r(sy/sx); b0 = ȳ − b1x̄. Example: We wish to estimate total cholesterol level (y) from BMI (x). Assume rxy = 0.78; ȳ = 205.9, sy = 30.8; x̄ = 27.4, sx = 3.7. Then b1 = 0.78 × (30.8/3.7) = 6.49 and b0 = ȳ − b1x̄ = 205.9 − 6.49(27.4) = 28.07. The equation of the regression line is: ŷ = 28.07 + 6.49(BMI)
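The same estimates can be computed from the summary statistics alone; here is a quick Python check of the cholesterol-from-BMI arithmetic above (the slide's intercept 28.07 comes from carrying the rounded slope 6.49 into the second equation, so the unrounded intercept is slightly lower):

```python
def ls_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Least-squares estimates from summary statistics:
    b1 = r * (sy / sx), b0 = ybar - b1 * xbar."""
    b1 = r * sd_y / sd_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = ls_from_summary(0.78, 27.4, 3.7, 205.9, 30.8)
round(b1, 2)  # → 6.49
```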

  13. Least squares estimates (Practice): b1 = r(sy/sx); b0 = ȳ − b1x̄. Example: We wish to estimate systolic blood pressure (y) from BMI (x). Assume rxy = 0.46; ȳ = 133.8, sy = 18.4; x̄ = 26.6, sx = 3.5. b1 = ____; b0 = ____. The equation of the regression line is: ŷ = ____

  14. Least squares estimates (Practice): b1 = r(sy/sx); b0 = ȳ − b1x̄. Example: We wish to estimate systolic blood pressure (y) from BMI (x). Assume rxy = 0.46; ȳ = 133.8, sy = 18.4; x̄ = 26.6, sx = 3.5. Then b1 = 0.46 × (18.4/3.5) = 2.42 and b0 = ȳ − b1x̄ = 133.8 − 2.42(26.6) = 69.43. The equation of the regression line is: ŷ = 69.43 + 2.42(BMI)

  15. Least squares estimates (Practice): The equation of the regression line is: ŷ = 69.43 + 2.42(BMI). Predict systolic blood pressure for the following 3 individuals: Person 1 has BMI of 26.4 Person 2 has BMI of 28.9 Person 3 has BMI of 34.8 ŷ1 = ŷ2 = ŷ3 =

  16. Least squares estimates (Practice): The equation of the regression line is: ŷ = 69.43 + 2.42(BMI). Predict systolic blood pressure for the following 3 individuals: Person 1 has BMI of 26.4 Person 2 has BMI of 28.9 Person 3 has BMI of 34.8 ŷ1 = 69.43 + 2.42(26.4) = 133.3 ŷ2 = 69.43 + 2.42(28.9) = 139.4 ŷ3 = 69.43 + 2.42(34.8) = 153.6
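The three predictions can be checked directly in Python, using the fitted equation from the slide:

```python
def predict_sbp(bmi):
    """Predicted systolic BP from the fitted line y = 69.43 + 2.42*(BMI)."""
    return 69.43 + 2.42 * bmi

[round(predict_sbp(bmi), 1) for bmi in (26.4, 28.9, 34.8)]
# → [133.3, 139.4, 153.6]
```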

  17. SECTION 6.3 Assumptions and sources of variation in linear regression

  18. Learning Outcomes: Describe the assumptions required for valid use of the linear regression model Describe the partitioning of sum of squares in the linear regression model

  19. Introduction to Linear Regression. Some assumptions for linear regression: • The dependent variable Y has a linear relationship to the independent variable X (this includes checking whether the dependent variable is approximately normally distributed). • Independence of the errors (no serial correlation).

  20. Example scatterplot with fitted line: ŷ = 90.681 + 0.945(age), R = 0.597

  21. Fundamental Equations for Regression. Coefficient of determination (r²): the proportion of variation in Y "explained" by the regression on X. R² = explained variation / total variation = SSR/SST = 1 − SSE/SST

  22. Example: Fundamental Equations for Regression. Scatterplot of Y versus X with r = 0.42; fitted line: ŷ = b0 + b1x = 9.545 + 0.477(x)

  23. Example: Fundamental Equations for Regression. ŷ = 9.545 + 0.477(x). SST = 132, dfT = 11; SSR = 23, dfR = 1; SSE = 109, dfE = 10. R² = SSR/SST = 0.18
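The sum-of-squares partition can be sketched as follows (illustrative toy data, not the slide's dataset; for a least-squares fit, SST = SSR + SSE holds exactly):

```python
def sums_of_squares(x, y, b0, b1):
    """Partition the total variation in y around its mean: SST = SSR + SSE."""
    mean_y = sum(y) / len(y)
    yhat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)             # total variation
    ssr = sum((yh - mean_y) ** 2 for yh in yhat)          # explained by the regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual (error)
    return sst, ssr, sse

# The least-squares fit of this toy data is b0 = 0, b1 = 1.9
sst, ssr, sse = sums_of_squares([1, 2, 3, 4], [2, 4, 5, 8], 0, 1.9)
round(ssr / sst, 2)  # R-squared → 0.96
```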

  24. Practice: Fundamental Equations for Regression. ŷ = 17.17247 − 0.53707(x). Complete the entries in the table below to determine SST, SSR, SSE, and R²: SST = ____, dfT = ____; SSR = ____, dfR = ____; SSE = ____, dfE = ____. R² = SSR/SST = ____

  25. Practice: Fundamental Equations for Regression. ŷ = 17.17247 − 0.53707(x). SST = 80.5, dfT = 9; SSR = 19.1, dfR = 1; SSE = 61.4, dfE = 8. R² = SSR/SST = 0.24

  26. SECTION 6.4 Multiple linear regression model

  27. Learning Outcome: Calculate and interpret predicted values from the multiple regression model

  28. Multiple Linear Regression • Extension of simple linear regression to assess the association between two or more independent variables and a single continuous dependent variable. • The multiple linear regression equation is: ŷ = b0 + b1x1 + b2x2 + … + bpxp • Each regression coefficient represents the change in y relative to a one-unit change in the respective independent variable, holding the remaining independent variables constant. • The R² from the multiple linear regression model represents the percentage of variation in the dependent variable "explained" by the set of predictors.

  29. Multiple Linear Regression Example: Predictors of systolic blood pressure: ŷ = 68.15 + 0.58(BMI) + 0.65(age) + 0.94(male) + 6.44(treated for hypertension)

  30. Practice: Estimate systolic blood pressure for the following persons: Person 1: BMI=27.9; age=54; female; on treatment for hypertension Person 2: BMI=34.9; age=66; male; on treatment for hypertension Person 3: BMI=24.8; age=47; female; not on treatment for hypertension y1 = y2 = y3 =

  31. Practice: Estimate systolic blood pressure for the following persons: Person 1: BMI=27.9; age=54; female; on treatment for hypertension Person 2: BMI=34.9; age=66; male; on treatment for hypertension Person 3: BMI=24.8; age=47; female; not on treatment for hypertension y1 = 68.15 + 0.58(27.9) + 0.65(54) + 0.94(0) + 6.44(1) = 125.9 y2 = 68.15 + 0.58(34.9) + 0.65(66) + 0.94(1) + 6.44(1) = 138.7 y3 = 68.15 + 0.58(24.8) + 0.65(47) + 0.94(0) + 6.44(0) = 113.1
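The three worked answers above can be reproduced directly in Python (indicator variables coded 1/0):

```python
def predict_sbp(bmi, age, male, tx_htn):
    """Slide model: y = 68.15 + 0.58*(BMI) + 0.65*(age)
    + 0.94*(male) + 6.44*(treated for hypertension)."""
    return 68.15 + 0.58 * bmi + 0.65 * age + 0.94 * male + 6.44 * tx_htn

round(predict_sbp(27.9, 54, 0, 1), 1)  # Person 1 → 125.9
round(predict_sbp(34.9, 66, 1, 1), 1)  # Person 2 → 138.7
round(predict_sbp(24.8, 47, 0, 0), 1)  # Person 3 → 113.1
```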

  32. Framingham Risk Calculation (10-Year Risk): Dependent Variable: 10-year risk of CVD Independent Variables: Age, gender, total cholesterol, HDL cholesterol, smoker, systolic BP On medication for BP http://hp2010.nhlbihin.net/atpiii/calculator.asp

  33. SECTION 6.5 SPSS for linear regression analysis

  34. Learning Outcome: Analyze and interpret linear regression models using SPSS

  35. SPSS: Analyze → Regression → Linear. Specify the Dependent Variable and Independent Variable(s). Statistics: Estimates, Confidence intervals, Model fit, Partial correlations, Descriptives. Example: Dependent variable: HDL cholesterol; Independent variable: BMI

  36. y = 70.141 – 0.442(BMI)

  37. SPSS: Analyze → Regression → Linear. Specify the Dependent Variable and Independent Variable(s). Statistics: Estimates, Confidence intervals, Model fit, Partial correlations, Descriptives. Example: Dependent variable: HDL cholesterol; Independent variable(s): BMI, gender (1=male, 2=female)

  38. y = 53.494 – 0.481(BMI) + 10.663(female)

  39. SPSS: Analyze → Regression → Linear. Specify the Dependent Variable and Independent Variable(s). Statistics: Estimates, Confidence intervals, Model fit, Partial correlations, Descriptives. Example: Dependent variable: HDL cholesterol; Independent variable(s): BMI, gender, age

  40. y = 43.026 – 0.464(BMI) + 10.735(female) + 0.166(age)

  41. Practice: Estimate HDL cholesterol levels for the following persons: Person 1: BMI=25.7; female; age=60 Person 2: BMI=36.9; male; age=66 Person 3: BMI=31.8; female; age=51 y1 = y2 = y3 =

  42. Practice: Estimate HDL cholesterol levels for the following persons: Person 1: BMI=25.7; female; age=60 Person 2: BMI=36.9; male; age=66 Person 3: BMI=31.8; female; age=51 y1 = 43.026 – 0.464(25.7) + 10.735(1) + 0.166(60) = 51.8 y2 = 43.026 – 0.464(36.9) + 10.735(0) + 0.166(66) = 36.9 y3 = 43.026 – 0.464(31.8) + 10.735(1) + 0.166(51) = 47.5
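These answers follow from the three-predictor model on the previous slide (female coded 1/0); a quick Python check:

```python
def predict_hdl(bmi, female, age):
    """Slide model: y = 43.026 - 0.464*(BMI) + 10.735*(female) + 0.166*(age)."""
    return 43.026 - 0.464 * bmi + 10.735 * female + 0.166 * age

round(predict_hdl(25.7, 1, 60), 1)  # Person 1 → 51.8
round(predict_hdl(36.9, 0, 66), 1)  # Person 2 → 36.9
round(predict_hdl(31.8, 1, 51), 1)  # Person 3 → 47.5
```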
