
Chapter 10, Part 2


Presentation Transcript


  1. Linear Regression Chapter 10, Part 2

  2. Predictions with Scatterplots • Last Time: A scatterplot gives a picture of the relationship between two quantitative variables. • One variable is explanatory, and the other is the response. • Today: If we know the value of the explanatory variable, can we predict the value of the response variable?

  3. The Regression Line • To make predictions, we’ll find a straight line that is the “best fit” for the points in the scatterplot. This is not so simple….

  4. Regression Line in JMP • Start by making a scatterplot. • Red Triangle menu -> “Fit Line.” • The equation of the regression line appears under the “Linear Fit” group. • JMP uses column headings as variable names (instead of x and y). • Example from the Cars 1993 file: • MaxPrice = 2.3139014 + 1.1435971*MinPrice
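
JMP produces this fit through menus rather than code. As a rough illustration outside of JMP, the sketch below fits the same kind of least-squares line in Python with scipy.stats.linregress, using a few made-up MinPrice/MaxPrice values (not the actual Cars 1993 data):

    import numpy as np
    from scipy import stats

    # Hypothetical (MinPrice, MaxPrice) pairs in thousands of dollars --
    # illustrative values only, not the real Cars 1993 data.
    min_price = np.array([ 8.4, 12.2, 14.9, 15.7, 17.5, 22.6, 29.5])
    max_price = np.array([10.0, 15.0, 18.8, 19.3, 21.5, 28.7, 38.0])

    # Ordinary least squares fit: MaxPrice = intercept + slope*MinPrice
    fit = stats.linregress(min_price, max_price)
    print(f"MaxPrice = {fit.intercept:.4f} + {fit.slope:.4f}*MinPrice")
    print(f"RSquare  = {fit.rvalue**2:.4f}")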

  5. Predicted Values • We use the equation of the regression line to make predictions about… • Individuals not in the original data set. • Later measurements of the same individuals. • Example: In 1994, a vehicle had a Min. Price of $15,000. Use the previous data to predict the Max. Price. • You can do this by hand from the equation: MaxPrice = 2.3139014 + 1.1435971*MinPrice • 2.3139014 + 1.1435971*(15) = 19.4678579, so the predicted Max. Price is about $19,470 (prices in the data are in thousands of dollars).
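
The prediction is just arithmetic: plug the new Min. Price into the fitted equation. A minimal sketch using the coefficients quoted above:

    # Coefficients from the fitted line quoted above (prices in $1000s)
    intercept, slope = 2.3139014, 1.1435971

    min_price_new = 15                        # Min. Price of $15,000
    print(intercept + slope * min_price_new)  # 19.4678579 -> about $19,470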

  6. Are the Predictions Useful? • In some cases, the regression line is more useful for making predictions than in others. Consider the following examples (from Cars 1993):

  7. Coefficient of Determination • If the scatterplot is well-approximated by a straight line, the regression equation is more useful for making predictions. • Correlation is one measure of this. • The square of the correlation has a more intuitive meaning: What proportion of variation in the Response Variable is explained by variation in the Explanatory Variable? • JMP: “RSquare” under “Summary of Fit”
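
For a simple one-predictor regression like this, the RSquare that JMP reports is the square of the correlation coefficient, and it equals the proportion of variation in the response explained by the line. A small Python check on made-up data:

    import numpy as np

    # Hypothetical x/y data (illustrative only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

    r = np.corrcoef(x, y)[0, 1]             # correlation
    print(r**2)                             # RSquare as (correlation)^2

    # Same number via "proportion of variation explained":
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = intercept + slope * x
    ss_resid = np.sum((y - y_hat) ** 2)     # variation left unexplained
    ss_total = np.sum((y - y.mean()) ** 2)  # total variation in y
    print(1 - ss_resid / ss_total)          # matches r**2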

  8. Coefficient of Determination • In predicting Max. Price from Min. Price, we had RSquare = 0.822202. • About 82% of the variation in Max. Price is explained by variation in Min. Price. • In predicting Highway MPG from Engine Size, we had RSquare = 0.392871. • Only 39% of the variation in Highway MPG is explained by variation in Engine Size.

  9. Coefficient of Determination • RSquare takes values from 0 to 1. • For values close to 0, the regression line is not very useful for predictions. • For values close to 1, the regression line is more useful for making predictions. • RSquare makes no distinction between positive and negative association of variables.

  10. Residuals • For each individual in the data set we can compute the difference (error) between the actual and predicted values of the response variable. This difference is called a residual: Residual = (actual value) – (predicted value) • In JMP: Click the red triangle by “Linear Fit” and select “Save Residuals” from the drop-down menu. You can also “Plot Residuals.”
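
Outside of JMP, the same residuals are easy to compute directly; a minimal sketch on made-up data:

    import numpy as np

    # Hypothetical data (illustrative only)
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y_actual = np.array([2.0, 4.1, 5.9, 8.2])

    slope, intercept = np.polyfit(x, y_actual, 1)  # fitted line
    y_predicted = intercept + slope * x

    residuals = y_actual - y_predicted   # (actual value) - (predicted value)
    print(residuals)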

  11. How does JMP find the Regression Line? • JMP uses the most popular method, Ordinary Least Squares (OLS). • To measure how a given line fits the data: • Compute all residuals, take the square of each. • Add up the results to get a “total error.” • The closer this total is to zero, the better the line fits the data. Choose the line with the smallest “total error.” • (Thankfully) JMP takes care of the details.
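
The sketch below spells out that "total error" (the sum of squared residuals) and checks that the least-squares line does at least as well as a couple of hand-picked alternative lines; the data are made up for illustration:

    import numpy as np

    def total_squared_error(intercept, slope, x, y):
        # Sum of squared residuals for the line y = intercept + slope*x
        residuals = y - (intercept + slope * x)
        return np.sum(residuals ** 2)

    # Hypothetical data (illustrative only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

    slope_ols, intercept_ols = np.polyfit(x, y, 1)   # least-squares line

    print(total_squared_error(intercept_ols, slope_ols, x, y))  # smallest
    print(total_squared_error(0.0, 2.0, x, y))                  # a guess
    print(total_squared_error(1.0, 1.8, x, y))                  # another guess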

  12. Limitations of Correlation and Linear Regression: • Both describe linear relationships only. • Both are sensitive to outliers. • Beware of extrapolation: predicting outside of the given range of the explanatory variable. • Beware of lurking variables: other factors that may explain a strong correlation. • Correlation does not imply causality!
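
On the "sensitive to outliers" point above, a quick illustration with made-up data: a single point far from the overall pattern can change the correlation substantially.

    import numpy as np

    # Hypothetical data with a clear linear trend (illustrative only)
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0])

    print(np.corrcoef(x, y)[0, 1])           # correlation near 1

    # Add one outlier far below the trend and recompute
    x_out = np.append(x, 9.0)
    y_out = np.append(y, 1.0)
    print(np.corrcoef(x_out, y_out)[0, 1])   # much weaker correlation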

  13. Beware Extrapolation! • A child’s height was plotted against her age... • Can you predict her height at age 8 (96 months)? • Can you predict her height at age 30 (360 months)?

  14. Beware Extrapolation! • Regression line: y = 71.95 + 0.383x • Height at 96 months? y ≈ 108.7cm (about 3' 7'') • Height at 360 months? y ≈ 209.8cm (about 6' 10'') • Height at birth (x = 0)? • y = 71.95cm (about 2' 4'')
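
A minimal sketch of the same calculations, using the regression line from the slide. Since the line was fit to measurements taken in early childhood, predictions far outside that age range should not be trusted:

    # Regression line from the slide: height (cm) = 71.95 + 0.383*age (months)
    def predicted_height(age_months):
        return 71.95 + 0.383 * age_months

    print(predicted_height(96))    # age 8:  ~108.7 cm -- plausible
    print(predicted_height(360))   # age 30: ~209.8 cm (~6'10") -- not plausible
    print(predicted_height(0))     # birth:  71.95 cm -- also not plausible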

  15. Beware Lurking Variables! • Although there may be a strong correlation (statistical relationship) between two variables, there might not be a direct practical (cause-and-effect) relationship. • A lurking variable is a third variable (not in the scatterplot) that might cause the apparent relationship between explanatory and response variables.

  16. Example: Pizza vs. Subway Fare • The fitted regression line shows a strong correlation (0.9878) between the cost of: • A slice of pizza • Subway fare • Q: Does the price of pizza affect the price of the subway?

  17. Caution: Correlation Does Not Imply Causation • In a study of emergency services, it was noted that larger fires tend to have more firefighters present. • Suppose we used: • Explanatory Variable: Number of firefighters • Response Variable: Size of the fire • We would expect a strong correlation. • But it’s ludicrous to conclude that having more firefighters present causes the fire to be larger.
