Learn how to analyze the association between two variables, estimate its impact, and make predictions using linear regression.
This Week • Continue with linear regression • Begin multiple regression • Le 8.2 • C & S 9:A-E • Handout: Class examples and assignment 3
Linear Regression • Investigate the relationship between two variables • Dependent variable • The variable that is being predicted or explained • Independent variable • The variable that is doing the predicting or explaining • Think of data in pairs (xi, yi)
Linear Regression - Purpose • Is there an association between the two variables? • Is BP change related to weight change? • Estimation of impact • How much BP change occurs per pound of weight change? • Prediction • If a person loses 10 pounds, how much of a drop in blood pressure can be expected?
Assumptions for Linear Regression • For each value of X there is a population of Y’s that are normally distributed • The population means form a straight line • Each population has the same variance σ² • Note: The X’s do not need to be normally distributed; in fact, the researcher can select these prior to data collection
Simple Linear Regression Equation • The simple linear regression equation is: μy = β0 + β1x • β0 is the mean of y when x = 0 • The mean of y increases by β1 for each increase of x by 1
Simple Linear Regression Model • The equation that describes how individual y values relate to x and an error term is called the regression model: y = β0 + β1x + ε • ε reflects how individuals deviate from others with the same value of x
Estimated Simple Linear Regression Equation • The estimated simple linear regression equation is: ŷ = b0 + b1x • b0 is the estimate for β0 • b1 is the estimate for β1 • ŷ is the estimated (predicted) value of y for a given x value. It is the estimated mean for that x.
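As a concrete tie-in, the restaurant example whose PROC REG output appears at the end of this handout gives:

ŷ = b0 + b1x = 60 + 5x   (quarterly sales predicted from student population)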
Least Squares Method • Least Squares Criterion: Choose b0 and b1 to minimize S = Σ(yi − b0 − b1xi)² • Of all possible lines, pick the one that minimizes the sum of the squared distances of the points from that line
The Least Squares Estimates • Slope: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² • Intercept: b0 = ȳ − b1x̄ • where x̄ and ȳ are the sample means of x and y
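A minimal SAS sketch of these formulas, assuming a hypothetical dataset named xy with variables x and y (these names are not from the handout); it computes b1 and b0 by hand so they can be checked against PROC REG:

* Step 1: sample means of x and y;
PROC MEANS DATA=xy NOPRINT;
  VAR x y;
  OUTPUT OUT=mstats MEAN=xbar ybar;
RUN;

* Step 2: cross-products and squared deviations;
DATA parts;
  IF _N_ = 1 THEN SET mstats;   * carries xbar and ybar onto every row;
  SET xy;
  sxy = (x - xbar) * (y - ybar);
  sxx = (x - xbar) ** 2;
RUN;

* Step 3: sum the pieces and apply the formulas;
PROC MEANS DATA=parts NOPRINT;
  VAR sxy sxx;
  OUTPUT OUT=sums SUM=ssxy ssxx;
RUN;

DATA ests;
  MERGE sums mstats;
  b1 = ssxy / ssxx;        * slope;
  b0 = ybar - b1 * xbar;   * intercept;
  PUT b1= b0=;             * prints the estimates to the SAS log;
RUN;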
Estimating the Variance • An Estimate of σ² • The mean square error (MSE) provides the estimate of σ², and the notation s² is also used: s² = MSE = SSE/(n − 2), where SSE = Σ(yi − ŷi)² • If points are close to the regression line then SSE will be small • If points are far from the regression line then SSE will be large
Estimating σ • An Estimate of σ • To estimate σ we take the square root of s². • The resulting s is called the root mean square error.
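Using the SSE from the PROC REG output at the end of the handout (SSE = 1530, n = 10):

s² = MSE = 1530/(10 − 2) = 191.25 and s = √191.25 ≈ 13.83, which matches the Root MSE line of the output.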
Hypothesis Testing for β1 • H0: β1 = 0 (no relation between x and y) • Ha: β1 ≠ 0 (relation between x and y) • Test Statistic: t = b1/SE(b1) • SE(b1) depends on • Sample size • How well the estimated line fits the points • How spread out the x values are
Testing for Significance: t Test • Rejection Rule: Reject H0 if t < −tα/2 or t > tα/2, where tα/2 is based on a t distribution with n − 2 degrees of freedom
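Worked example with the restaurant output at the end of the handout (b1 = 5.0, SE(b1) = 0.58027, n = 10):

t = 5.0/0.58027 ≈ 8.62; with n − 2 = 8 degrees of freedom and α = 0.05, t0.025 = 2.306, so 8.62 > 2.306 and H0: β1 = 0 is rejected (the output shows Pr > |t| < .0001).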
Confidence Interval for β1 • b1 ± tα/2 SE(b1) • tα/2 is the cutoff value from a t distribution with n − 2 df • CLB option in SAS on the model statement gives this interval (CLM gives the interval for the mean at a given x instead)
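Continuing the same worked example (b1 = 5.0, SE(b1) = 0.58027, t0.025 with 8 df = 2.306):

95% CI for β1: 5.0 ± 2.306(0.58027) ≈ 5.0 ± 1.34, i.e., roughly (3.66, 6.34).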
Estimating the Mean for a Particular X • Simply plug your value of x into the estimated regression equation • Example: want to estimate the mean BP for persons aged 50; suppose b0 = 100 and b1 = 0.80 • Estimate = 100 + 0.80*50 = 140 mmHg • Can compute a 95% CI for the estimate using the SAS CLM option on the model statement
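One common way to get this interval out of PROC REG is sketched below; the dataset bp and variables sbp and age are assumptions, not from the handout. The idea is to append a row with the x of interest and a missing y: PROC REG leaves it out of the fit but still reports its predicted value and CLM limits.

* Append one extra observation with age = 50 and missing blood pressure;
DATA bp_plus;
  SET bp END=last;
  OUTPUT;
  IF last THEN DO;
    age = 50;
    sbp = .;
    OUTPUT;
  END;
RUN;

PROC REG DATA=bp_plus;
  MODEL sbp = age / CLM;   * CLM: 95% CI for the mean response at each x;
RUN;
QUIT;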
The Coefficient of Determination • Relationship Among SST, SSR, SSE: SST = SSR + SSE • where SST = total sum of squares, SSR = sum of squares due to regression, SSE = sum of squares due to error
The Coefficient of Determination • The coefficient of determination is: r² = SSR/SST • where SST = total sum of squares, SSR = sum of squares due to regression • r² = proportion of variability explained by X (must be between 0 and 1)
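From the ANOVA table in the output at the end of the handout:

r² = SSR/SST = 14200/15730 ≈ 0.9027, matching the reported R-Square: about 90% of the variability in quarterly sales is explained by student population.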
Residuals • How far off (distance) an individual point is from the estimated regression line: residual = observed value − predicted value
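A purely hypothetical illustration using the fitted restaurant equation ŷ = 60 + 5x (the data values here are made up):

If a campus had studentpop = 20 and observed quarsales = 170, the predicted value would be 60 + 5(20) = 160 and the residual would be 170 − 160 = 10.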
SAS CODE FOR REGRESSION:

PROC REG DATA=datasetname SIMPLE;
  MODEL depvar = indvar(s);
  PLOT depvar * indvar;
RUN;

Several options are available on the MODEL and PLOT statements.
OPTIONS ON MODEL STATEMENT:

MODEL depvar = indvar(s) / options;

Option   What it does
clb      95% CI for b1
p        Predicted values
r        Residuals
clm      95% CI for the mean at each value of x
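Putting the pieces together for the restaurant example (the dataset name restaurants is an assumption; the variable names come from the output below):

PROC REG DATA=restaurants SIMPLE;
  MODEL quarsales = studentpop / CLB CLM P R;   * CIs for b1 and the mean, plus predicted values and residuals;
  PLOT quarsales * studentpop;
RUN;
QUIT;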
OUTPUT FROM PROC REG

Dependent Variable: quarsales

Analysis of Variance

                                  Sum of         Mean
Source              DF           Squares       Square    F Value    Pr > F
Model                1    (SSR)   14200         14200      74.25    <.0001
Error                8    (SSE)    1530     191.25000  (= MSE)
Corrected Total      9    (SST)   15730

Root MSE          13.82932    R-Square    0.9027
Dependent Mean   130.00000    Coeff Var  10.63794

R-Square = coefficient of determination = SSR/SST = 14200/15730
Parameter Estimates

                         Parameter     Standard
Variable           DF     Estimate        Error    t Value    Pr > |t|
Intercept           1     60.00000      9.22603       6.50      0.0002
studentpop          1      5.00000      0.58027       8.62      <.0001
                            (b1)       (SE(b1))

REGRESSION EQUATION:  Y = 60.0 + 5.0*X
QUARSALES = 60 + 5*STUDENTPOP
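A final hedged illustration of using the fitted equation for prediction (the studentpop value of 10 is made up, not from the handout):

Predicted quarterly sales when STUDENTPOP = 10: QUARSALES = 60 + 5*10 = 110.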