
Basic Statistics



Presentation Transcript


  1. Basic Statistics: Linear Regression

  2. Simple Linear Regression [Figure: diagram labeled Y and X]

  3. Predicting Y from X • Recall that when we looked at scatter plots in our discussion of correlation, we showed in general terms how to estimate Y for a given value of X when the correlation was not perfect. • We will now look at how to use our knowledge of the correlation to predict a value for Y when we know a value for X.

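  4. Scatter Plot of Y and X [Figure: scatter plot with Variable X on the horizontal axis and Variable Y on the vertical axis, each running from low to high, with an estimated Y value marked.] The GREEN line shows our prediction or regression line.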

  5. Prediction Equation • The green line in the previous slide showed us our prediction line. • We will use the mathematical formula for a straight line as the method for predicting a value for Y when we know the value for X. • The process is called “Linear Regression” because, in this class, we will only deal with relationships that can be fitted by a straight line. • The general formula for a straight line is: Ŷ = a_y + b_y X

  6. The Prediction Equation • a_y = the intercept, or where the prediction line crosses the Y-axis (the value of Y when X = 0) • b_y = the regression coefficient, which indicates the amount of change in Y when the value of X increases by one unit.

  7. A Simple Example • Suppose that a club charges a flat $25 to use its facilities. • It also charges a $10 fee per hour for using the tennis courts. • Now, assume that you want to play tennis for 2 hours at this club. How much would you have to pay? Ŷ = $25 + (2)($10) = $25 + $20 = $45 for two hours of tennis

  8. Linking the Simple Example to Regression Ŷ = $25 + (2)($10) = $25 + $20 = $45 for two hours of tennis • In our example: • $25 is a_y, the intercept. Even if we didn’t play any tennis (X = 0), it would cost $25 to use the club. • $10 is b_y, the regression coefficient (it costs $10 for each hour of tennis played). • In this case we predicted how much it would cost (Y) when we knew how long we wanted to play tennis (X).
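
The tennis example can be written as a tiny function to make the roles of the intercept and the regression coefficient concrete. This is only a sketch; the function and parameter names are illustrative and not from the slides.

```python
def predict_cost(hours, flat_fee=25.0, hourly_fee=10.0):
    """Predicted cost = intercept (flat fee) + slope (hourly fee) * hours."""
    return flat_fee + hourly_fee * hours


print(predict_cost(2))  # 45.0, two hours of tennis
print(predict_cost(0))  # 25.0, the intercept: cost with no tennis played
```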

  9. Formulae for Sums of Squares
     SS_x = ΣX² - (ΣX)²/N
     SS_y = ΣY² - (ΣY)²/N
     SP_xy = ΣXY - (ΣX)(ΣY)/N
     These were introduced in our discussion of correlation.

  10. Calculating the Regression Coefficient (b)
     b_y = SP_xy / SS_x
     or, written out in raw sums,
     b_y = [ΣXY - (ΣX)(ΣY)/N] / [ΣX² - (ΣX)²/N]

  11. Calculating the Intercept (a)
     a_y = Ȳ - b_y X̄ (where Ȳ and X̄ are the means of Y and X)
     Notice that you must calculate the regression coefficient (b) before you can calculate the intercept (a), since the calculation of a uses b.
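
As a rough sketch of the order of calculation (b first, then a), the function below fits a line to paired data using the sums-of-squares approach. The function name and the four data points are assumptions made for illustration: the points are generated from the tennis club's fee schedule, so the fit should recover the $25 intercept and $10 coefficient.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line Y-hat = a + b * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Sum of squares for X and sum of products for X and Y.
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = sp_xy / ss_x                   # regression coefficient comes first
    a = sum_y / n - b * (sum_x / n)    # the intercept uses b
    return a, b


# Hypothetical points that follow the tennis fee rule exactly.
hours = [0, 1, 2, 3]
cost = [25, 35, 45, 55]
a, b = fit_line(hours, cost)
print(a, b)  # 25.0 10.0
```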

  12. An Example • From our earlier example, suppose that our college statistics professor is interested in predicting how many errors students might make on the mid-term examination based on how many hours they studied. Specifically, the professor wants to know how many errors a student might make if the student studied for 5 hours.

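  13. The Stats Professor’s Data [Table not preserved. The totals used in the slides that follow are N = 10, ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429, where X is hours studied and Y is errors made.]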

  14. The Resulting Sums of Squares
     SS_x = ΣX² - (ΣX)²/N = 546 - 70²/10 = 546 - 490 = 56
     SS_y = ΣY² - (ΣY)²/N = 695 - 73²/10 = 695 - 532.9 = 162.1
     SP_xy = ΣXY - (ΣX)(ΣY)/N = 429 - (70)(73)/10 = 429 - 511 = -82
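
The arithmetic on this slide can be checked with a few lines of Python. The sketch starts from the totals that appear in the slide (the raw data table is not available), and the variable names are illustrative. The middle term for Y works out to 73²/10 = 532.9.

```python
# Totals taken from the slide: N, sum of X, sum of Y,
# sum of X squared, sum of Y squared, sum of X*Y.
n = 10
sum_x, sum_y = 70, 73
sum_x2, sum_y2, sum_xy = 546, 695, 429

ss_x = sum_x2 - sum_x ** 2 / n       # 546 - 490
ss_y = sum_y2 - sum_y ** 2 / n       # 695 - 532.9
sp_xy = sum_xy - sum_x * sum_y / n   # 429 - 511

print(round(ss_x, 1), round(ss_y, 1), round(sp_xy, 1))  # 56.0 162.1 -82.0
```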

  15. Calculating the Regression Coefficient (b)
     b_y = SP_xy / SS_x = -82 / 56 = -1.46
     This can be interpreted as the change in the value of Y (in our case, errors made on the mid-term) for a unit change in X, or for us, each additional hour studied! Thus, study for another hour and make 1.46 fewer mistakes (on average!).

  16. Calculating the Intercept (a)
     a_y = Ȳ - b_y X̄ = 7.3 - (-1.46)(7) = 7.3 + 10.25 = 17.55 (the product 10.25 uses the unrounded b = -1.4643)
     Therefore, our prediction equation is Ŷ = 17.55 + (-1.46) (X)

  17. Using Our Prediction Equation Ŷ = 17.55 + (-1.46) (X) If the professor wanted to predict the number of errors a student might make if the student had studied for 5 hours, then we would substitute 5 for X in the above equation and obtain: Ŷ = 17.55 + (-1.46) (5) = 17.55 + (-7.3) = 10.25 Thus, the professor would predict 10.25 errors for a student who had studied for 5 hours.
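
The same steps can be sketched in Python, starting from the sums computed earlier in the slides (SS_x = 56, SP = -82, mean of X = 7, mean of Y = 7.3). The names are illustrative. Because the code keeps b unrounded, the prediction comes out near 10.23 rather than the 10.25 obtained by hand with b rounded to -1.46.

```python
ss_x, sp_xy = 56.0, -82.0
mean_x, mean_y = 7.0, 7.3

b = sp_xy / ss_x            # regression coefficient
a = mean_y - b * mean_x     # intercept


def predict_errors(hours):
    """Predicted number of mid-term errors for a given number of hours studied."""
    return a + b * hours


print(round(b, 2), round(a, 2), round(predict_errors(5), 2))  # -1.46 17.55 10.23
```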

  18. Measuring Prediction Errors: The Standard Error of the Estimate
     Since we know that the estimate is not exact, as statisticians, we must report how much error we feel is in our estimate. The formula is:
     s_est = √[ Σ(Y - Ŷ)² / (N - 2) ]
     OR
     s_est = √[ (1 - r²)(SS_y) / (N - 2) ]

  19. Calculating the Standard Error of the Estimate
     With r² = SP_xy² / (SS_x · SS_y) = (-82)² / ((56)(162.1)) = .74:
     s_est = √[ (1 - .74)(162.1) / 8 ] = 2.29
     Thus, when we estimated 10.25 errors, we also would report that the Standard Error of the Estimate is 2.29.
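
A short Python check of this calculation, assuming the sums from the earlier slides (SS_x = 56, SS_y = 162.1, SP = -82, N = 10). The variable names are illustrative.

```python
import math

n = 10
ss_x, ss_y, sp_xy = 56.0, 162.1, -82.0

# Squared correlation from the sums of squares and the sum of products.
r_squared = sp_xy ** 2 / (ss_x * ss_y)

# Standard error of the estimate, with N - 2 degrees of freedom.
se_est = math.sqrt((1 - r_squared) * ss_y / (n - 2))

print(round(r_squared, 2), round(se_est, 2))  # 0.74 2.29
```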

  20. Summarizing Prediction Equations • The existence of a relationship between two variables allows us to use that knowledge to make predictions. • The prediction based on our equation will result in less error in prediction than using the mean of the dependent variable. • Two sums of squares are required to calculate the regression coefficient and the intercept.
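
The second point can be illustrated with the professor's example. Predicting the mean of Y for every student leaves a squared error equal to SS_y, while the regression line leaves SS_y - SP²/SS_x. The sketch below assumes the sums from the earlier slides; the names are illustrative.

```python
ss_x, ss_y, sp_xy = 56.0, 162.1, -82.0

# Squared error when every prediction is simply the mean of Y.
ss_total = ss_y

# Squared error left over after using the regression line.
ss_residual = ss_y - sp_xy ** 2 / ss_x

print(round(ss_total, 2), round(ss_residual, 2))  # 162.1 42.03
```

The drop from 162.1 to about 42.03 is the error removed by using hours studied to predict errors.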
