
Correlation and Linear Regression

By Arman Banimahd. From The Basic Practice of Statistics by David S. Moore.


Presentation Transcript


  1. Correlation and Linear Regression. By Arman Banimahd. From The Basic Practice of Statistics by David S. Moore.

  2. Response and Explanatory Variables A response variable measures an outcome of a study. • Also called a dependent variable. An explanatory variable may explain or influence changes in a response variable. • Also called an independent variable.

  3. Demonstration Suppose we have data on a large group of college students. For each pair of variables, find the response variable and the explanatory variable. • Amount of time spent studying for a statistics exam and grade on the exam. • Weight in kilograms and height in centimeters. • Hours per week of extracurricular activities and grade point average. • Score on the SAT Mathematics exam and score on the SAT Critical Reading exam.

  4. Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. Note: If there is an explanatory variable, always plot it on the horizontal axis of the scatterplot.

  5. Examining a Scatterplot In a graph, look for: • Overall pattern • Deviations • Direction • Form • Strength An outlier is an individual value that falls outside the overall pattern of the relationship.

  6. Positive and Negative Associations Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase. Ex. The relationship between your age and your father’s age. Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase. Ex. The relationship between the amount of gas left in your car’s tank and the number of miles you travel.

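  7. Correlation The correlation measures the direction and strength of the linear relationship between two quantitative variables (usually written as $r$). Formula: $r = \frac{1}{n-1}\left[\left(\frac{x_1-\bar{x}}{s_x}\right)\left(\frac{y_1-\bar{y}}{s_y}\right) + \cdots + \left(\frac{x_n-\bar{x}}{s_x}\right)\left(\frac{y_n-\bar{y}}{s_y}\right)\right]$, or more compactly, $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$.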
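
The correlation on this slide is the sum of products of standardized values, divided by $n - 1$. A minimal Python sketch of that calculation follows. The function name `correlation` and the data are invented here for illustration and do not come from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = correlation(hours, score)
print(round(r, 3))  # 0.775
```

Points that lie exactly on an upward-sloping line give $r = 1$, and points exactly on a downward-sloping line give $r = -1$, which matches the note that the correlation always lies between $-1$ and $1$.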

  8. Correlation Note: The correlation is always between $-1$ and $1$. Figure: positive correlation.

  9. Correlation Note: The correlation is always between $-1$ and $1$. Figure: negative correlation.

  10. Correlation Note: The correlation is always between $-1$ and $1$. Figure: no correlation.

  11. Regression Lines A regression line is a straight line that describes how a response variable $y$ changes as an explanatory variable $x$ changes. Note: We often use a regression line to predict the value of $y$ for a given value of $x$.

  12. Review of Straight Lines Suppose that $y$ is a response variable (plotted on the vertical axis) and $x$ is an explanatory variable (plotted on the horizontal axis). A straight line relating $y$ to $x$ has an equation of the form $y = a + bx$, where $b$ is the slope and $a$ is the $y$-intercept. Note: • The slope $b$ is the amount by which $y$ changes when $x$ increases by one unit. • The $y$-intercept $a$ is the value of $y$ when $x = 0$.

  13. Review of Straight Lines [Figure: a line through two points plotted on axes running from 1 to 4, with $y$-intercept 2. The coordinates of the two points, the slope, and the equation of the line are not preserved in the transcript.]

  14. Exercise 1 We expect a car’s highway gas mileage to be related to its city gas mileage. Data for all 1198 vehicles in the government’s 2008 Fuel Economy Guide give the regression line $\text{highway mpg} = 4.62 + 1.109 \times \text{city mpg}$ for predicting highway mileage from city mileage. • What is the slope of this line? Say in words what the numerical value of the slope tells you. • Slope = 1.109 • Highway mileage goes up by 1.109 mpg, on average, for each added mpg of city mileage.

  15. Exercise 1 (continued) • What is the intercept? Explain why the value of the intercept is not statistically meaningful. • Intercept = 4.62 • The intercept is the predicted highway mileage for a vehicle with a city mileage of 0 mpg. No vehicle gets 0 mpg in the city, so this prediction lies far outside the data and has no practical meaning.

  16. Exercise 1 (continued) • Find the predicted highway mileage for a car that gets 16 miles per gallon in the city. • With slope 1.109 and intercept 4.62, the prediction is $4.62 + 1.109 \times 16 = 22.364$, or about 22.4 mpg.
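
The arithmetic for Exercise 1 can be checked with a short Python sketch. The slope (1.109) and intercept (4.62) come from the slides above. The function name `predict_highway_mpg` is an illustrative choice.

```python
# Regression line from Exercise 1: highway mpg = 4.62 + 1.109 * city mpg
INTERCEPT = 4.62  # predicted highway mpg at city mpg = 0 (not meaningful on its own)
SLOPE = 1.109     # added highway mpg per added city mpg

def predict_highway_mpg(city_mpg):
    """Predicted highway mileage for a given city mileage."""
    return INTERCEPT + SLOPE * city_mpg

predicted = predict_highway_mpg(16)
print(f"{predicted:.2f}")  # 22.36
```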

  17. Exercise 2 You use the same bar of soap to shower each morning. The bar weighs 80 grams when it is new. Its weight goes down by 6 grams per day on average. What is the equation of the regression line for predicting weight from days of use? $\hat{y} = 80 - 6x$, where $\hat{y}$ represents the predicted weight in grams and $x$ represents the number of days used.

  18. Least-Squares Regression Line The least-squares regression line of $y$ on $x$ is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
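
Written as a formula, the least-squares criterion chooses the intercept and slope that minimize the sum of squared vertical distances. The symbols $a$ (intercept) and $b$ (slope) follow the straight-line review earlier in the deck. The index notation $(x_i, y_i)$ for the $n$ data points is an assumption made here for the sketch.

```latex
\[
  \text{choose } a \text{ and } b \text{ to minimize } \quad
  \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2
  \;=\; \sum_{i=1}^{n} \bigl( y_i - a - b\,x_i \bigr)^2 .
\]
```

Each term $y_i - \hat{y}_i$ is the vertical distance from a data point to the line, so points above and below the line both add to the total once squared.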

  19. Equation of the Least-Squares Regression Line Suppose that $y$ is a response variable and $x$ is the explanatory variable. Then the least-squares regression line is the line $\hat{y} = a + bx$, where the slope is $b = r\,\frac{s_y}{s_x}$ and the $y$-intercept is $a = \bar{y} - b\bar{x}$.
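
The slope and intercept can be computed from five summary numbers using the standard textbook formulas: slope $b = r\,s_y/s_x$ and intercept $a = \bar{y} - b\bar{x}$. The sketch below follows those formulas directly. The helper name `least_squares_line` and the data are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b * x, using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

For these made-up points the fitted line is $\hat{y} = 2.2 + 0.6x$.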

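  20. Some Review Formulas $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$, $s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}$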

  21. Demonstration An outbreak of the deadly Ebola virus in 2002 and 2003 killed 91 of the 95 gorillas in 7 home ranges in the Congo. To study the spread of the virus, measure “distance” by the number of home ranges separating a group of gorillas from the first group infected. Here are data on distance and number of days until deaths began in each later group.

  22. Demonstration Solution on Excel

  23. Facts About Least-Squares Regression • The distinction between explanatory and response variables is essential in regression. • The slope of the least-squares line and the correlation always have the same sign. (A change of one standard deviation in $x$ corresponds to a change of $r$ standard deviations in $\hat{y}$.) • The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$ on the graph of $y$ against $x$. • The square of the correlation, $r^2$, is the fraction of the variation in the values of $y$ that is explained by the least-squares regression of $y$ on $x$.
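
Three of these facts can be checked numerically: the line passes through the point of means, the slope and correlation share a sign, and $r^2$ equals the explained fraction of variation. The sketch below does this on a small data set that is invented for illustration. All variable names are hypothetical choices.

```python
from statistics import mean

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx                   # slope (algebraically equal to r * s_y / s_x)
a = y_bar - b * x_bar             # y-intercept
r = s_xy / (s_xx * s_yy) ** 0.5   # correlation

# Fact: the line passes through (x_bar, y_bar)
passes_through_means = abs((a + b * x_bar) - y_bar) < 1e-12

# Fact: the slope and the correlation have the same sign
same_sign = (b > 0) == (r > 0)

# Fact: r^2 is the fraction of the variation in y explained by the regression
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
explained_fraction = 1 - sse / s_yy

print(passes_through_means, same_sign)
print(round(r ** 2, 3), round(explained_fraction, 3))  # 0.6 0.6
```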

  24. Residual A residual is the difference between an observed value of the response variable and the value predicted by the regression line: $\text{residual} = y - \hat{y}$. Note: • The mean of the least-squares residuals is always zero. • A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.
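
As a sketch of how residuals are computed, the code below fits a least-squares line to a small invented data set and subtracts each predicted value from the observed one. The helper name `residuals` and the data are hypothetical. The printed pairs are the points a residual plot would display.

```python
from statistics import mean

def residuals(xs, ys):
    """Residuals (observed y minus predicted y) from the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys)

for x, e in zip(xs, res):
    print(x, round(e, 2))  # (x, residual) pairs for a residual plot

print(abs(mean(res)) < 1e-9)  # True: least-squares residuals average to zero
```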

  25. Exercise 3 Do heavier people burn more energy? We will use these data to illustrate influence. • Make a scatterplot of the data that is suitable for predicting metabolic rate from body mass, with two new points added. Point A: mass 42 kilograms, metabolic rate 1500 calories. Point B: mass 70 kilograms, metabolic rate 1400 calories.

  26. Exercise 3

  27. Exercise 3 Do heavier people burn more energy? We will use these data to illustrate influence. • Add three least-squares regression lines to your plot: for the original 12 women, for the original women plus Point A, and for the original women plus Point B. Which new point is more influential for the regression line? Explain in simple language why each new point moves the line in the way your graph shows.

  28. Exercise 3

  29. Exercise 3

  30. Exercise 3

  31. Influential Observation An observation is influential for a statistical calculation if removing it would noticeably change the result of the calculation. Ex. Points that are outliers in either the $x$ or $y$ direction of a scatterplot are often influential for the correlation.
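
A small experiment shows what influence looks like in practice: compute the correlation with and without a single outlying point. The data below are invented for illustration (they are not the metabolic-rate data from Exercise 3), and the helper name `correlation` is a hypothetical choice.

```python
from statistics import mean

def correlation(xs, ys):
    """Sample correlation r, computed from sums of squared deviations and cross products."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_without = correlation(xs, ys)
# Add one point that is an outlier in both the x and y directions
r_with = correlation(xs + [10], ys + [0])

print(round(r_without, 3))  # 0.775
print(round(r_with, 3))     # -0.553
```

In this made-up example one added point changes the sign of the correlation, which is why the deck recommends always plotting the data before trusting a summary number.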

  32. Cautions About Correlation & Regression • Correlation and regression lines describe only linear relationships. You can do the calculations for any relationship between two quantitative variables, but the results are useful only if the scatterplot shows a linear pattern. • Correlation and least-squares regression lines are not resistant. Always plot your data and look for observations that may be influential.

  33. Extrapolation Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable that were used to obtain the line. Such predictions are often not accurate. Ex. Predicting the height of a 25-year-old person from a regression line fitted to data on a child’s growth between 3 and 8 years of age (the predicted height is 8 feet).
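
The soap bar from Exercise 2 (80 grams when new, losing 6 grams per day on average) gives a quick illustration of why extrapolation fails. The sketch below evaluates that line inside and far outside a plausible range of days. The function name and the choice of 5 and 20 days are assumptions made for the example.

```python
def predict_soap_weight(days):
    """Predicted weight in grams after `days` of use: 80 - 6 * days (Exercise 2)."""
    return 80 - 6 * days

inside = predict_soap_weight(5)    # within the life of the bar
outside = predict_soap_weight(20)  # far beyond the point where the bar is used up

print(inside)   # 50
print(outside)  # -40, an impossible negative weight
```

The line describes the bar well while it lasts, but it keeps falling after the soap is gone, so predictions that far out are meaningless.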

  34. Remarks Beware of Lurking Variables: A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables. Association Does Not Imply Causation: An association between an explanatory variable $x$ and a response variable $y$, even if it is very strong, is not by itself good evidence that changes in $x$ actually cause changes in $y$.

  35. Exercise 4 How strongly do physical characteristics of sisters and brothers correlate? Here are data on the heights (in inches) of 11 adult pairs: • Find the correlation and the equation of the least-squares line for predicting sister’s height from brother’s height. Make a scatterplot of the data and add the regression line to your plot.

  36. Exercise 4

  37. Exercise 4

  38. Exercise 4

  39. Exercise 4 How strongly do physical characteristics of sisters and brothers correlate? Here are data on the heights (in inches) of 11 adult pairs: • Adam is 70 inches tall. Predict the height of his sister Kim. (The predicted height, in inches, is not preserved in the transcript.)

  40. Questions?

  41. Summary • Correlation • Regression Line • Least-squares Regression Line • High Correlation Does NOT Imply Causation • Beware of Lurking Variables and Avoid Extrapolation

  42. Exercise 5 (Attendance Quiz) Because elderly people may have difficulty standing to have their heights measured, a study looked at predicting overall height from height to the knee. Here are the data (in centimeters) for five elderly men. What is the equation of the least-squares regression line for predicting height from knee height?
