Simple Linear Regression

Example data: Beers and BAC.
Start by exploring the data
• Construct a scatterplot
• Does a linear relationship between variables exist?
• Is the relationship strong?
• How much variation can be explained by a linear relationship with the independent or explanatory variable?
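As an illustration (not part of the original slides), here is a minimal Python/matplotlib sketch of this first step. The beers-vs-BAC numbers below are made up for demonstration, not the textbook's actual data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical illustrative data (not the textbook's Beers/BAC values)
beers = np.array([1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9], dtype=float)
bac   = np.array([0.02, 0.03, 0.05, 0.07, 0.06, 0.08, 0.09, 0.11, 0.10, 0.13, 0.17])

# Scatterplot: does the relationship look linear? How tight is the scatter?
plt.scatter(beers, bac)
plt.xlabel("Beers consumed (explanatory variable x)")
plt.ylabel("Blood alcohol content (response y)")
plt.title("Exploring the data before fitting a line")
plt.show()
```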
Variance “Candy Bar”
[Diagram: total variance drawn as a bar split into an Explained portion and an Unexplained portion]
• The R-sq value estimates the percentage of variation explained by a linear relationship with the independent or explanatory variable. Unless this estimate is 100% (or very near), it is not sufficient on its own.
• The amounts of explained and unexplained information due to the model are measured by Sums of Squares.
Decomposition of information into explained and unexplained parts
Residuals
• A residual is the difference between an observed value of the dependent variable and the value predicted by the regression line.
• Residual = (observed y) − (predicted y) = y − ŷ
• They help us assess the fit of a regression line.
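A short sketch of the same idea in Python, assuming a least-squares fit via numpy.polyfit and the same made-up illustrative data:

```python
import numpy as np

# Hypothetical illustrative data (not the textbook's values)
x = np.array([1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9], dtype=float)
y = np.array([0.02, 0.03, 0.05, 0.07, 0.06, 0.08, 0.09, 0.11, 0.10, 0.13, 0.17])

# Least-squares fit: y_hat = b0 + b1 * x
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# Residual = observed y - predicted y
residuals = y - y_hat
print(residuals)
```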
Variance “Candy Bar”: Sums of Squares
[Diagram: SS Total drawn as a bar split into “SS explained by model” (Explained) and “SS Error” (Unexplained)]
Systematic SS + Random SS = Total SS
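In the usual notation (ŷᵢ for the fitted values, ȳ for the mean of y; these symbols are standard, not shown on the slide), the decomposition can be written as:

```latex
\underbrace{\sum_{i=1}^{n}(y_i - \bar{y})^2}_{\text{Total SS}}
\;=\;
\underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}_{\text{Systematic SS (explained by model)}}
\;+\;
\underbrace{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}_{\text{Random SS (SS Error)}}
```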
Model Assumptions about the residuals (ε)
• The distribution is NORMAL
• The mean is ZERO
• The variance is CONSTANT for all values of x (σ²)
• Errors associated with any two observations are independent
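One informal way to check these assumptions, sketched here with scipy and matplotlib on the same illustrative data (the slides use MINITAB instead), is to plot the residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical illustrative data (not the textbook's values)
x = np.array([1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9], dtype=float)
y = np.array([0.02, 0.03, 0.05, 0.07, 0.06, 0.08, 0.09, 0.11, 0.10, 0.13, 0.17])

b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
residuals = y - fitted

# Normality: a normal probability (Q-Q) plot of the residuals
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()

# Zero mean and constant variance: residuals vs. fitted values should be
# centered on zero with no visible pattern or funnel shape
plt.scatter(fitted, residuals)
plt.axhline(0)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```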
Assessing the utility of the model: model variance
• Variance is the variability of the random error (σ²)
• The higher the variability of the random error, the greater the error of prediction
• σ² is estimated with s² (often called the mean square for error, MSE)
• Variance: s² = SSE / (n − 2), where n − 2 is the degrees of freedom
• Standard error: s = √(SSE / (n − 2)) = √MSE
• This is like standard deviation; with the standard error, we are looking at deviation from the regression line
• Approximately 95% of observed y values will lie within 2s of their respective predicted values
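A minimal sketch of these quantities in Python, again on made-up data:

```python
import numpy as np

# Hypothetical illustrative data (not the textbook's values)
x = np.array([1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9], dtype=float)
y = np.array([0.02, 0.03, 0.05, 0.07, 0.06, 0.08, 0.09, 0.11, 0.10, 0.13, 0.17])

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

sse = np.sum(residuals**2)   # SS Error
mse = sse / (n - 2)          # s^2, the mean square for error
s = np.sqrt(mse)             # standard error of the regression

# Roughly 95% of observed y values should lie within 2s of their fitted values
print(f"s^2 (MSE) = {mse:.5f}, s = {s:.5f}, 2s = {2*s:.5f}")
```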
Assessing the utility of the model: Slope
• Does y change as x changes? Does x contribute information for the prediction of y?
• Test this with the t-statistic or its p-value (reject the hypothesis of zero slope when p < .05); these values are included in software output
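For example, scipy.stats.linregress reports the slope, its standard error, and the p-value for the test that the true slope is zero (a sketch on illustrative data, not the slides' MINITAB output):

```python
import numpy as np
from scipy import stats

# Hypothetical illustrative data (not the textbook's values)
x = np.array([1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9], dtype=float)
y = np.array([0.02, 0.03, 0.05, 0.07, 0.06, 0.08, 0.09, 0.11, 0.10, 0.13, 0.17])

result = stats.linregress(x, y)

# Two-sided test of H0: slope = 0 (x contributes no information about y)
t_stat = result.slope / result.stderr
print(f"slope = {result.slope:.5f}, t = {t_stat:.2f}, p-value = {result.pvalue:.4f}")
# Reject H0 at the 5% level when the p-value is below .05
```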
Assessing the utility of the model: Correlation Coefficient r
• A measure of the strength and direction of the linear relationship between x and y
• Always between −1 and +1
• High correlation does not imply causality
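For reference, the sample correlation coefficient can be written as (standard formula, not shown on the slide):

```latex
r \;=\; \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```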
Assessing the utility of the model: Coefficient of Determination (r²)
• The r-squared value is the percentage of the variation in y explained by the model.
• For linear regression, the higher the value, the better the model.
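Equivalently, in terms of the sums of squares from the earlier slide:

```latex
r^2 \;=\; \frac{\text{SS explained by model}}{\text{Total SS}}
    \;=\; 1 - \frac{\text{SS Error}}{\text{Total SS}}
```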
Using the model for estimation and prediction: Confidence interval for mean response
• For any specific value of x, the mean response is estimated by the fitted value ŷ from the regression line.
• A confidence interval for the mean response adds to this estimate a margin of error based on its standard error.
• Confidence intervals widen as the value of x moves farther from its mean.
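Written out with the usual symbols (x* is the chosen value of x, t* the critical value with n − 2 degrees of freedom, s the standard error of the regression; this notation is an assumption, not the slides'):

```latex
\hat{\mu}_y = b_0 + b_1 x^{\ast}, \qquad
\hat{\mu}_y \;\pm\; t^{\ast}_{\,n-2}\,\mathrm{SE}_{\hat{\mu}},
\qquad
\mathrm{SE}_{\hat{\mu}} \;=\; s\sqrt{\frac{1}{n} + \frac{(x^{\ast} - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
```

The (x* − x̄)² term in the standard error is why these intervals widen as x* moves away from the mean of x.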
Prediction interval for a future observation
• Similar to confidence interval for mean response
• Standard error used in prediction interval includes:
  • Variability due to the fact that the least-squares line is not exactly equal to the true regression line
  • Variability of the future response variable y around the subpopulation mean
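A hand-computed sketch of both intervals in Python (software such as MINITAB produces these directly; the data and the choice x* = 5 are purely illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical illustrative data (not the textbook's values)
x = np.array([1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9], dtype=float)
y = np.array([0.02, 0.03, 0.05, 0.07, 0.06, 0.08, 0.09, 0.11, 0.10, 0.13, 0.17])

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))   # standard error of regression
sxx = np.sum((x - x.mean())**2)
t_crit = stats.t.ppf(0.975, df=n - 2)                   # 95% two-sided critical value

x_star = 5.0                                            # illustrative value of x
y_hat = b0 + b1 * x_star

# CI for the mean response vs. PI for a single future observation:
# the PI's standard error has an extra "1 +" for the variability of y itself
se_mean = s * np.sqrt(1/n + (x_star - x.mean())**2 / sxx)
se_pred = s * np.sqrt(1 + 1/n + (x_star - x.mean())**2 / sxx)

print(f"95% CI for mean response: {y_hat - t_crit*se_mean:.4f} to {y_hat + t_crit*se_mean:.4f}")
print(f"95% prediction interval:  {y_hat - t_crit*se_pred:.4f} to {y_hat + t_crit*se_pred:.4f}")
```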
In the MINITAB regression window, you might want to…
• Set confidence levels in Options
• Enter a value for prediction in Options
• Store Residuals and Fits in Storage
• Display the full table of fits and residuals in Results (select the last bullet)
Beware of Extrapolation
• Extrapolation is the use of a regression line for prediction far outside the range of values of the independent variable x that you used to obtain the line. Such predictions are often not accurate.
Example from book: p. 138
• How can we tell if it is reasonable to fit a linear regression model?
• Let's run the analysis and interpret the results