Ch. 5 Regression Review
Correlation
• Symbol: r
• Called Pearson’s Correlation Coefficient
• Measure of association used only in LINEAR situations
• Sign of r is the same as the sign of the slope
• Strength of correlation:
  • |r| < 0.5: weak
  • 0.5 ≤ |r| < 0.8: moderate
  • 0.8 ≤ |r| < 1.0: strong
  • r = 1 or r = -1 indicates a perfect linear relationship (every point lies exactly on the line)
• Correlation does not imply causation
  • A third, extraneous variable could be affecting both
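As a quick illustration of the points above, here is a minimal sketch that computes r by hand and labels its strength with the cutoffs from this review. The function names `pearson_r` and `strength` and the hours/score data are made up for the example; they are not part of the original slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label |r| using the cutoffs listed in this review."""
    size = abs(r)
    if size < 0.5:
        return "weak"
    if size < 0.8:
        return "moderate"
    return "strong"

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = pearson_r(hours, score)
label = strength(r)
print(round(r, 3), label)
```

Because r and the slope share the numerator (the sum of products of deviations), a positive r always goes with a positive slope and a negative r with a negative slope.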
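Regression
• Formulas for slope and y-intercept of the LSRL ŷ = a + bx: b = r(s_y / s_x), a = ȳ - b·x̄
• Interpreting
  • Slope = predicted change in y for each 1-unit increase in x
  • Y-intercept = predicted value of y when x = 0; often not meaningful in context (x = 0 may be far outside the data)
• Symbols: ŷ (yhat) is the predicted value; y is the observed value
• Residual: y - ŷ (observed minus predicted)
  • Residuals from the LSRL always sum to 0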
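The ideas on this slide can be checked numerically. The sketch below fits a least squares line by hand, using the standard sums-of-deviations form of the slope (equivalent to b = r·s_y/s_x), and confirms that the residuals sum to 0. The helper `lsrl` and the data are hypothetical.

```python
def lsrl(xs, ys):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

a, b = lsrl(hours, score)
y_hat = [a + b * x for x in hours]                   # predicted values
residuals = [y - yh for y, yh in zip(score, y_hat)]  # observed minus predicted

print("intercept:", a, "slope:", b)
print("residuals:", residuals, "sum:", sum(residuals))
```

In context, the slope here would read: for each additional hour studied, the predicted score goes up by b points.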
Determining Model Fit
• Residual plot: want no pattern
  • A curved pattern in the residual plot indicates the relationship may be nonlinear
  • No fan shape (the spread of the residuals should stay roughly constant)
  • No other pattern
• Coefficient of determination (r²): the percentage of variability in y that can be explained by the approximately linear relationship between x and y (use CONTEXT)
• Standard deviation about the LSRL (s_e): the typical amount by which an observation deviates, in the y direction, from the least squares regression line
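A small worked example may help connect the two numerical measures to the residuals. This sketch assumes the usual definitions r² = 1 - SSResid/SSTotal and s_e = sqrt(SSResid/(n - 2)); the helper name and the data are invented for illustration.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
n = len(hours)

a, b = lsrl(hours, score)
residuals = [y - (a + b * x) for x, y in zip(hours, score)]

y_bar = sum(score) / n
ss_resid = sum(e ** 2 for e in residuals)         # variation left unexplained
ss_total = sum((y - y_bar) ** 2 for y in score)   # total variation in y

r_sq = 1 - ss_resid / ss_total        # coefficient of determination
s_e = math.sqrt(ss_resid / (n - 2))   # standard deviation about the LSRL

print(f"r^2 = {r_sq:.3f}, s_e = {s_e:.3f}")
```

Plotting `residuals` against `hours` would give the residual plot; with only five made-up points it is too small to judge a pattern, but the same list is what you would inspect for curvature or a fan shape.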
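Extrapolation
• Using the model to predict y for an x value outside the range used to create the LSRL
• The prediction may or may not be accurate: the data give no evidence that the linear pattern continues outside that range, so treat it with caution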
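To see why extrapolation is risky, consider a hypothetical case where the true relationship is curved (y = x²) but a line is fit over a narrow range of x. The data and names below are assumptions made for this sketch only.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data generated from y = x^2, observed only for x from 1 to 5.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = lsrl(xs, ys)

# Inside the observed range the line is a fair approximation...
pred_inside = a + b * 3
error_inside = abs(3 ** 2 - pred_inside)

# ...but far outside the range the line and the truth drift apart.
pred_outside = a + b * 10
error_outside = abs(10 ** 2 - pred_outside)

print("error at x = 3: ", error_inside)
print("error at x = 10:", error_outside)
```

With real data the true relationship is unknown, which is exactly why a prediction outside the observed range cannot be checked against anything.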
Regression Analysis: Profit versus Price, Price^2

The regression equation is
Profit = - 2701 + 7060 Price - 2851 Price^2

Predictor     Coef  SE Coef       T      P
Constant   -2700.6    346.5   -7.79  0.000
Price       7060.1    474.9   14.87  0.000
Price^2    -2851.3    157.7  -18.08  0.000

S = 83.4862   R-Sq = 97.6%   R-Sq(adj) = 97.4%

Analysis of Variance
Source          DF        SS       MS       F      P
Regression       2  10869753  5434877  779.76  0.000
Residual Error  39    271828     6970
Total           41  11141581

Source   DF   Seq SS
Price     1  8590119
Price^2   1  2279634

Unusual Observations
Obs  Price  Profit     Fit  SE Fit  Residual  St Resid
  8   1.15  1463.4  1647.7    20.3    -184.3    -2.28R
 15   1.35  1802.6  1634.1    18.0     168.6     2.07R

R denotes an observation with a large standardized residual.
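The numbers in this output can be cross-checked against each other. The sketch below plugs the reported coefficients into the quadratic equation to reproduce the Fit and Residual for observation 8, then recomputes R-Sq, R-Sq(adj), and S from the ANOVA table. All values come from the output above; the function name `predicted_profit` is made up, and small differences are due to rounding in the printed coefficients.

```python
import math

# Coefficients as printed in the Predictor table.
B0, B1, B2 = -2700.6, 7060.1, -2851.3

def predicted_profit(price):
    """Fitted value from Profit = B0 + B1*Price + B2*Price^2."""
    return B0 + B1 * price + B2 * price ** 2

# Observation 8: Price = 1.15, observed Profit = 1463.4.
fit_8 = predicted_profit(1.15)
resid_8 = 1463.4 - fit_8          # residual = observed - predicted

# Sums of squares and degrees of freedom from the ANOVA table.
ss_reg, ss_resid, ss_total = 10869753, 271828, 11141581
df_resid, df_total = 39, 41

r_sq = ss_reg / ss_total                                      # reported as 97.6%
r_sq_adj = 1 - (ss_resid / df_resid) / (ss_total / df_total)  # reported as 97.4%
s = math.sqrt(ss_resid / df_resid)                            # reported as 83.4862

print(f"Fit = {fit_8:.1f}, Residual = {resid_8:.1f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.4f}")
```

Note that the model is quadratic in Price but still fit by least squares, which is why the output has the same layout as a straight-line regression with one extra predictor row.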
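Process:
1. Scatterplot of (x, y).
2. Transform x, y, or both (if needed) using the ladder of powers.
3. Scatterplot of the transformed data.
4. Least squares regression on the transformed data.
5. Residual plot.
6. Acceptable? Yes: go to 7. No: go back to 2.
7. Solve the fitted equation for y (undo the transformation).
8. Plot the resulting equation (Y=) together with the original (x, y) data.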
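Steps 2, 4, and 7 can be sketched in code for one common case: data that grow exponentially, where taking the natural log of y straightens the scatterplot. The data below are hypothetical (generated from y = 2·3^x so the answer is known), and the log transform is only one choice from the ladder of powers.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical, clearly nonlinear data: y = 2 * 3^x.
xs = [0, 1, 2, 3, 4]
ys = [2, 6, 18, 54, 162]

# Step 2: transform y (natural log).
log_ys = [math.log(y) for y in ys]

# Step 4: least squares regression on (x, ln y): ln(y-hat) = a + b*x.
a, b = lsrl(xs, log_ys)

# Step 5: residuals on the transformed scale (should show no pattern).
residuals = [ly - (a + b * x) for x, ly in zip(xs, log_ys)]

# Step 7: solve for y by undoing the log: y-hat = e^a * (e^b)^x.
def predict_y(x):
    return math.exp(a + b * x)

print("multiplier e^a:", math.exp(a))
print("growth factor e^b:", math.exp(b))
print("prediction at x = 2.5:", predict_y(2.5))
```

For step 8, `predict_y` is the curve to graph over the original (x, y) scatterplot.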
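A Quick Note
• In any regression of y on x, the equation only predicts y from x: you may only plug in a value for x
• If you are asked to predict x from y, you have to redo the regression with the roles of x and y swapped; solving the original equation for x does not give the least squares prediction of x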
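A short numerical check shows why the regression has to be redone. The sketch fits y on x, solves that equation for x, and compares the result with a separate regression of x on y. The data and variable names are hypothetical.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

# Regression of score on hours (predicts score from hours).
a_yx, b_yx = lsrl(hours, score)

# Regression of hours on score (predicts hours from score): swap the roles.
a_xy, b_xy = lsrl(score, hours)

target_score = 70

# Shortcut to avoid: solve the score-on-hours equation for hours.
hours_by_inversion = (target_score - a_yx) / b_yx

# Proper approach: use the hours-on-score regression.
hours_by_refit = a_xy + b_xy * target_score

print("solving original equation for x:", hours_by_inversion)
print("regressing x on y:              ", hours_by_refit)
```

The two answers agree only when r = 1 or r = -1; otherwise the two least squares lines are different lines, because each minimizes deviations in a different direction.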