
This Week

Learn how to analyze the association between two variables, estimate the size of the effect, and make predictions using linear regression.

bsims

Presentation Transcript


  1. This Week • Continue with linear regression • Begin multiple regression • Le 8.2 • C & S 9:A-E • Handout: Class examples and assignment 3

  2. Linear Regression • Investigate the relationship between two variables • Dependent variable • The variable that is being predicted or explained • Independent variable • The variable that is doing the predicting or explaining • Think of data in pairs (xi, yi)

  3. Linear Regression - Purpose • Is there an association between the two variables? • Is BP change related to weight change? • Estimation of impact • How much BP change occurs per pound of weight change? • Prediction • If a person loses 10 pounds, how much of a drop in blood pressure can be expected?

  4. Assumptions for Linear Regression • For each value of X there is a population of Y’s that are normally distributed • The population means form a straight line • Each population has the same variance $\sigma^2$ • Note: The X’s do not need to be normally distributed; in fact, the researcher can select them prior to data collection

  5. Simple Linear Regression Equation • The simple linear regression equation is: $\mu_y = \beta_0 + \beta_1 x$ • $\beta_0$ is the mean of y when x = 0 • The mean of y increases by $\beta_1$ for each increase of x by 1

  6. Simple Linear Regression Model • The equation that describes how individual y values relate to x and an error term is called the regression model: $y = \beta_0 + \beta_1 x + \varepsilon$ • $\varepsilon$ reflects how individuals deviate from others with the same value of x

  7. Estimated Simple Linear Regression Equation • The estimated simple linear regression equation is: $\hat{y} = b_0 + b_1 x$ • $b_0$ is the estimate for $\beta_0$ • $b_1$ is the estimate for $\beta_1$ • $\hat{y}$ is the estimated (predicted) value of y for a given x value. It is the estimated mean for that x.

  8. Least Squares Method • Least Squares Criterion: Choose $b_0$ and $b_1$ to minimize $S = \sum_i (y_i - b_0 - b_1 x_i)^2$ • Of all possible lines, pick the one that minimizes the sum of the squared vertical distances of each point from that line

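  9. The Least Squares Estimates • Slope: $b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ • Intercept: $b_0 = \bar{y} - b_1 \bar{x}$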
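
The least squares estimates can be sketched in a few lines of Python (used here instead of SAS so the arithmetic is visible). The function name `least_squares` is hypothetical, and the ten (x, y) pairs are an assumption: they do not appear in the slides, but they are the textbook pizza-sales data that reproduce the PROC REG output shown on slides 21 and 22. The slope is computed as the sum of cross-products of deviations divided by the sum of squared x deviations, and the intercept as the mean of y minus the slope times the mean of x.

```python
from statistics import mean


def least_squares(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1 * x_i) ** 2)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products of deviations and sum of squared x deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# ASSUMED data (not given in the slides): student population and quarterly sales.
student_pop = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
quar_sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = least_squares(student_pop, quar_sales)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")  # b0 = 60.0, b1 = 5.0
```

With these data the fitted line is the same QUARSALES = 60 + 5*STUDENTPOP equation reported on slide 22.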

  10. Estimating the Variance • An Estimate of $\sigma^2$ • The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used: $s^2 = \text{MSE} = \text{SSE}/(n-2)$, where $\text{SSE} = \sum_i (y_i - \hat{y}_i)^2$ • If points are close to the regression line then SSE will be small • If points are far from the regression line then SSE will be large

  11. Estimating $\sigma$ • An Estimate of $\sigma$ • To estimate $\sigma$ we take the square root of $s^2$: $s = \sqrt{\text{MSE}}$ • The resulting s is called the root mean square error.
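
As a quick check of the two formulas, the sketch below plugs in the SSE and sample size from the PROC REG output on slide 21 (SSE = 1530, corrected total df = 9, so n = 10). The variable names are illustrative only.

```python
import math

sse = 1530.0  # error sum of squares, from the PROC REG output (slide 21)
n = 10        # sample size: corrected total df (9) + 1

mse = sse / (n - 2)        # s^2, the estimate of sigma^2
root_mse = math.sqrt(mse)  # s, the estimate of sigma

print(f"MSE = {mse:.2f}, Root MSE = {root_mse:.5f}")  # MSE = 191.25, Root MSE = 13.82932
```

Both values agree with the Mean Square for Error and the Root MSE lines of the SAS output.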

  12. Hypothesis Testing for $\beta_1$ • $H_0$: $\beta_1 = 0$, no relation between x and y • $H_a$: $\beta_1 \neq 0$, relation between x and y • Test Statistic: $t = b_1 / SE(b_1)$ • $SE(b_1)$ depends on • Sample size • How well the estimated line fits the points • How spread out the x values are

  13. Testing for Significance: t Test • Rejection Rule: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom
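
A minimal sketch of the test, using the slope and standard error from the parameter estimates on slide 22. Two things are assumptions rather than content from the slides: the significance level of 0.05, and the critical value 2.306, which is the usual t-table entry for a two-sided 5% test with 8 degrees of freedom.

```python
b1 = 5.0          # estimated slope (slide 22)
se_b1 = 0.58027   # standard error of the slope (slide 22)
n = 10            # sample size, so df = n - 2 = 8

# ASSUMPTION: alpha = 0.05; 2.306 is the t-table cutoff for 8 df, two-sided.
t_crit = 2.306

t_stat = b1 / se_b1
reject = t_stat < -t_crit or t_stat > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject}")  # t = 8.62, reject H0: True
```

The statistic matches the t Value column of the SAS output, and since it is far beyond the cutoff the hypothesis of no relation is rejected.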

  14. Confidence Interval for 1 is cutoff value from t-distribution with n-2 df CLM option in SAS on model statement

  15. Estimating the Mean for a Particular X • Simply plug in your value of x in the estimated regression equation • Example: want to estimate the mean BP for persons aged 50 • Suppose $b_0 = 100$ and $b_1 = 0.80$ • Estimate = 100 + 0.80*50 = 140 mmHg • Can compute 95% CI for the estimate using SAS CLM option on model statement
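
The plug-in step and the interval around it can be sketched as follows. The helper names `mean_estimate` and `mean_ci` are hypothetical. The interval uses the standard formula for the mean response, estimate ± t × s × sqrt(1/n + (x − x̄)² / Σ(xᵢ − x̄)²). The blood pressure example has no data in the slides, so the interval is illustrated on the sales regression from slides 21 and 22 instead, with x̄ = 14 and Σ(xᵢ − x̄)² = 568 taken as assumptions (they come from the textbook data believed to underlie that output) and 2.306 as the assumed 95% t cutoff for 8 df.

```python
import math


def mean_estimate(b0, b1, x):
    """Estimated mean of y at a given x: plug x into the fitted line."""
    return b0 + b1 * x


def mean_ci(y_hat, s, n, x, x_bar, sxx, t_crit):
    """Confidence interval for the mean of y at x."""
    se_mean = s * math.sqrt(1.0 / n + (x - x_bar) ** 2 / sxx)
    return y_hat - t_crit * se_mean, y_hat + t_crit * se_mean


# Slide example: mean BP at age 50 with b0 = 100 and b1 = 0.80.
bp_at_50 = mean_estimate(100, 0.80, 50)
print(f"Estimated mean BP at age 50: {bp_at_50:.1f} mmHg")  # 140.0 mmHg

# Sales example at x = 10. ASSUMED: x_bar = 14, sxx = 568, t_crit = 2.306.
sales_at_10 = mean_estimate(60.0, 5.0, 10)
lo, hi = mean_ci(sales_at_10, s=13.82932, n=10, x=10,
                 x_bar=14.0, sxx=568.0, t_crit=2.306)
print(f"Estimated mean sales at x = 10: {sales_at_10:.1f} ({lo:.2f}, {hi:.2f})")
```

The interval widens as x moves away from x̄, which is why estimates near the middle of the observed x range are the most precise.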

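  16. The Coefficient of Determination • Relationship Among SST, SSR, SSE: $\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2$, that is, SST = SSR + SSE, where: SST = total sum of squares • SSR = sum of squares due to regression • SSE = sum of squares due to error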

  17. The Coefficient of Determination • The coefficient of determination is: $r^2 = \text{SSR}/\text{SST}$ where: SST = total sum of squares • SSR = sum of squares due to regression • $r^2$ = proportion of variability in y explained by x (must be between 0 and 1)
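
A short numeric check of the decomposition and the ratio, using the sums of squares from the PROC REG output on slide 21. Variable names are illustrative.

```python
ssr = 14200.0  # sum of squares due to regression (Model row, slide 21)
sse = 1530.0   # sum of squares due to error (Error row, slide 21)

sst = ssr + sse        # total sum of squares
r_squared = ssr / sst  # proportion of variability explained

print(f"SST = {sst:.0f}, r^2 = {r_squared:.4f}")  # SST = 15730, r^2 = 0.9027
```

About 90% of the variability in the dependent variable is explained by the regression, matching the R-Square line of the SAS output.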

  18. Residuals • How far off (distance) an individual point is from the estimated regression line • residual = observed value - predicted value = $y_i - \hat{y}_i$
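
The sketch below computes residuals under the convention residual = observed − predicted, which is the sign convention SAS reports. The fitted line is the one from slide 22; the ten data pairs are an assumption (they are not in the slides, but they are the textbook data that reproduce that output).

```python
# Fitted line from slide 22: QUARSALES = 60 + 5 * STUDENTPOP.
b0, b1 = 60.0, 5.0

# ASSUMED data (not given in the slides).
student_pop = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
quar_sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

predicted = [b0 + b1 * x for x in student_pop]
residuals = [y - y_hat for y, y_hat in zip(quar_sales, predicted)]

print(residuals)
print("sum of residuals:", sum(residuals))                  # 0.0
print("sum of squared residuals (SSE):",
      sum(e ** 2 for e in residuals))                       # 1530.0
```

Least squares residuals always sum to zero when the model has an intercept, and their squared sum is the SSE that appears in the Error row of the ANOVA table.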

  19. SAS CODE FOR REGRESSION
      PROC REG DATA=datasetname SIMPLE;
        MODEL depvar = indvar(s);
        PLOT depvar * indvar;
      RUN;
      Several options are available on the MODEL and PLOT statements.

  20. OPTIONS ON MODEL STATEMENT
      MODEL depvar = indvar(s) / options;
      Option   What it does
      clb      95% CI for $\beta_1$
      p        Predicted values
      r        Residuals
      clm      95% CI for the mean at value of x

  21. OUTPUT FROM PROC REG
      Dependent Variable: quarsales
      Analysis of Variance
                              Sum of        Mean
      Source             DF  Squares      Square   F Value   Pr > F
      Model               1    14200       14200     74.25   <.0001
      Error               8     1530   191.25000
      Corrected Total     9    15730

      Root MSE          13.82932    R-Square   0.9027
      Dependent Mean   130.00000
      Coeff Var         10.63794

      Annotations: Model sum of squares = SSR; Error sum of squares = SSE; Corrected Total sum of squares = SST; Error mean square = MSE; R-Square = coefficient of determination = 14200/15730
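
The remaining entries of this output follow from the sums of squares and degrees of freedom by simple arithmetic. The sketch below recomputes the mean squares, the F value, and the coefficient of variation (taken here as 100 × Root MSE / Dependent Mean, which matches the printed value); variable names are illustrative.

```python
import math

ssr, sse = 14200.0, 1530.0   # Model and Error sums of squares
df_model, df_error = 1, 8    # degrees of freedom
dependent_mean = 130.0       # mean of quarsales

msr = ssr / df_model         # mean square for the model
mse = sse / df_error         # mean square error
f_value = msr / mse          # F statistic for the overall regression
coeff_var = 100 * math.sqrt(mse) / dependent_mean

print(f"MSE = {mse:.2f}, F = {f_value:.2f}, Coeff Var = {coeff_var:.5f}")
# MSE = 191.25, F = 74.25, Coeff Var = 10.63794
```

In simple linear regression the F value equals the square of the slope's t statistic, so the F test and the two-sided t test for the slope give the same p-value.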

  22. Parameter Estimates
                        Parameter    Standard
      Variable     DF    Estimate       Error   t Value   Pr > |t|
      Intercept     1    60.00000     9.22603      6.50     0.0002
      studentpop    1     5.00000     0.58027      8.62     <.0001

      Annotations: $b_1$ = 5.00000 (studentpop estimate); $SE(b_1)$ = 0.58027
      REGRESSION EQUATION: Y = 60.0 + 5.0*X, that is, QUARSALES = 60 + 5*STUDENTPOP
