Simple Regression I

Presentation Transcript


  1. Simple Regression I Simple Regression

  2. Regression Analysis
     • Correlation tells us how strongly Y and X are related, but regression estimates the form of this relationship.
     • We’ll begin with simple regression, which assumes the form: Ŷi = b0 + b1·Xi

  3. Regression Notation
     • Y is the variable we want to predict
     • We believe X influences how Y behaves
     • Ŷi is the estimated value of Y at Xi
     • b0 is the Y-intercept in the equation
     • b1 is the slope of the regression line

  4. Fitting the Regression Line
     • Our goal: find the straight line that best fits the data we’ve collected.
     • The best equation will be the one that minimizes the error in fit.
     • The equation is: Ŷi = b0 + b1·Xi
     • The fit error is thus: ei = Yi − Ŷi = Yi − (b0 + b1·Xi)

  5. Obtaining the line (figure labels: + Errors, − Errors)

  6. Balancing out the errors
     • The fit error for the ith point on the scatterplot is: ei = Yi − Ŷi
     • We would like the sum of the + errors to equal, in magnitude, the sum of the − errors, so that the errors sum to zero.
     • However, many different lines can make this happen.
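
The point about many lines balancing the errors can be checked numerically. The sketch below assumes a small made-up data set (not from the slides) and relies on the fact that any line passing through the point of means (X̄, Ȳ) has errors that sum to zero, whatever its slope. The function name `sum_of_errors` is illustrative only.

```python
# Hypothetical toy data (not from the slides), used only to illustrate the point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def sum_of_errors(slope):
    """Sum of fit errors (Y - Y-hat) for the line with this slope through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys))


# Very different slopes, yet the + and - errors cancel every time.
for slope in (-1.0, 0.0, 2.0, 5.0):
    print(slope, abs(sum_of_errors(slope)) < 1e-9)
```

Balancing the errors therefore cannot pick out a single line, so a stricter criterion is needed.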

  7. Zero Error Lines

  8. The “Least Squares” Line
     • So, which of these zero-error lines is the best one?
     • Select the line with the minimum sum of squared error terms. This is called least-squares regression.
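
Written out, the least-squares criterion looks roughly as follows. This is a sketch in standard textbook notation; the symbol SSE for the sum of squared errors is the usual convention and is assumed here rather than taken from this slide.

```latex
\[
\mathrm{SSE}(b_0, b_1) \;=\; \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2
\;=\; \sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_i\bigr)^2
\]
% Setting the partial derivatives with respect to b_0 and b_1 to zero
% gives the two "normal equations":
\[
\sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_i\bigr) = 0,
\qquad
\sum_{i=1}^{n} X_i \bigl(Y_i - b_0 - b_1 X_i\bigr) = 0
\]
```

The first normal equation says that the least-squares line is itself one of the lines whose errors sum to zero. The second is the extra condition that singles it out.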

  9. The Least Squares Estimators
     • Intercept: b0 = Ȳ − b1·X̄
     • Slope: b1 = COVAR(X, Y) / VARP(X)
     • Note: COVAR here is Excel’s function, which computes the population covariance, not the sample covariance, so it should be paired with the population variance (VARP).
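
As a minimal sketch of these estimators outside Excel, the Python below computes the slope as population covariance divided by population variance, and the intercept from the two means. The data and the function names (`population_covariance`, `least_squares`) are hypothetical and not from the slides.

```python
def population_covariance(xs, ys):
    """Population covariance (divides by n), the same convention as Excel's COVAR."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def least_squares(xs, ys):
    """Return (intercept b0, slope b1) of the least-squares line."""
    # The covariance of X with itself is the population variance of X.
    slope = population_covariance(xs, ys) / population_covariance(xs, xs)
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return intercept, slope


# Hypothetical toy data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

Dividing both the covariance and the variance by n − 1 instead of n would give the same slope. What matters is that the two use the same convention.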

  10. Getting the Estimates in Excel
     • Some values can be calculated directly from the means, variances, and covariances.
     • For one-variable (simple) regression, you can add a trendline to a chart.
     • You can use the Regression option of the Data Analysis tool.
     • You can use the Excel function LINEST.

  11. Regression with mail data (figure: chart using Excel’s trendline feature)

  12. Output from Data Analysis Tool

  13. Output from LINEST
     The LINEST function must be entered as an array formula. For the example, highlight cells E3:F7, type the formula “=LINEST(Orders,Weight,1,1)”, then press CTRL+SHIFT+ENTER.
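
To make the 5-row by 2-column output range less opaque, here is a rough Python sketch of the statistics LINEST reports for a single X variable when its last two arguments are both 1. The function name `linest_stats` and the toy data are hypothetical, and this is an illustration of the layout rather than a reimplementation of Excel.

```python
import math


def linest_stats(ys, xs):
    """Simple-regression statistics arranged like LINEST's 5 x 2 output block."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in Y

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse                          # variation explained by the line
    df = n - 2                               # residual degrees of freedom
    se_y = math.sqrt(sse / df)               # standard error of the estimate
    se_slope = se_y / math.sqrt(sxx)
    se_intercept = se_y * math.sqrt(1 / n + x_bar ** 2 / sxx)

    return [
        [slope, intercept],          # row 1: coefficients
        [se_slope, se_intercept],    # row 2: their standard errors
        [ssr / sst, se_y],           # row 3: R squared, standard error of Y
        [ssr / (sse / df), df],      # row 4: F statistic, degrees of freedom
        [ssr, sse],                  # row 5: regression SS, residual SS
    ]


# Hypothetical toy data; the argument order (Y first, then X) mirrors LINEST.
orders = [2, 4, 5, 4, 5]
weight = [1, 2, 3, 4, 5]
table = linest_stats(orders, weight)
for row in table:
    print([round(value, 4) for value in row])
```

Note that the slope comes before the intercept in the first row, which is easy to misread when copying values out of the range.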

  14. Interpretation of Results
     • Remember that the variables are X = weight of mail in pounds and Y = orders in 1000s.
     • The estimated intercept (b0) tells us that if there were no mail, the model would still predict (.1912)(1000) or 191.2 orders per day.
     • The estimated slope (b1) tells us that each additional pound of mail tends to bring with it (.0297)(1000) or 29.7 orders.
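
The interpretation above amounts to plugging a weight into the fitted equation. A small sketch, using the intercept and slope quoted on the slide; the function name `predicted_orders` and the sample weights are illustrative assumptions.

```python
B0 = 0.1912  # estimated intercept, in thousands of orders (from the slide)
B1 = 0.0297  # estimated slope, thousands of orders per pound of mail (from the slide)


def predicted_orders(weight_lbs):
    """Predicted number of orders (not thousands) for a given weight of mail."""
    return (B0 + B1 * weight_lbs) * 1000


print(round(predicted_orders(0), 1))                          # 191.2
print(round(predicted_orders(101) - predicted_orders(100), 1))  # 29.7
```

The second line shows the slope at work: one extra pound of mail changes the prediction by the same amount at any starting weight.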

  15. How Good Is Our New Model?
     There are two standard ways to judge:
     • How much of the variation in the Y values (orders) can be attributed to the different values of X (weight of mail)?
     • In general, how small (or large) are the errors in fit?

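  16. R² – A Universal Measure of Fit
     • The Coefficient of Determination: R² = SSR / SST = 1 − SSE / SST, where SST is the total variation in Y, SSR is the variation explained by the regression, and SSE is the unexplained variation.
     • The R² value is always between 0 and 1.
     • It is the proportion of the variation in Y explained by the model, often quoted as a percentage.
     • It equals the square of the correlation between X and Y (for simple regression).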

  17. How is R² computed?
     • ANOVA table: total variation in the Y values is SST = 449.76
     • The amount of unexplained variation is SSE = 12.12
     • The difference is the variation explained by the regression equation: SSR = 449.76 − 12.12 = 437.64
     • The ratio of explained to total variation gives R² = 437.64 / 449.76 = .973
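
The arithmetic on this slide can be reproduced in a few lines. The sketch below uses only the SST and SSE values quoted above; the variable names are illustrative.

```python
import math

SST = 449.76  # total variation in Y (from the ANOVA table on the slide)
SSE = 12.12   # unexplained variation (from the slide)

SSR = SST - SSE          # variation explained by the regression
r_squared = SSR / SST    # coefficient of determination

print(round(SSR, 2), round(r_squared, 3))  # 437.64 0.973

# For simple regression, the magnitude of the correlation is the square root of R squared.
correlation_magnitude = math.sqrt(r_squared)
print(round(correlation_magnitude, 3))
```

The square root gives only the size of the correlation. Its sign is the sign of the slope, which is positive in this example.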

  18. Size of the Typical Error (S)
     • For every observation i, its error is given by: ei = Yi − Ŷi
     • To find the “typical error,” use this formula: S = √( Σ ei² / (n − 2) ) = √( SSE / (n − 2) )
     • This is the “Standard Error,” which is also √MSE.
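
A minimal sketch of the typical-error calculation, assuming the usual simple-regression divisor of n − 2 (one degree of freedom lost for each of b0 and b1). The data, fitted values, and the function name `standard_error` are hypothetical.

```python
import math


def standard_error(ys, fitted):
    """S = sqrt(SSE / (n - 2)) for a simple regression with intercept and slope."""
    residuals = [y - f for y, f in zip(ys, fitted)]
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (len(ys) - 2))


# Hypothetical toy data; the fitted values come from the line Y-hat = 2.2 + 0.6 X
# evaluated at X = 1, 2, 3, 4, 5.
ys = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
s = standard_error(ys, fitted)
print(round(s, 4))  # 0.8944
```

S is in the same units as Y, which is what makes it readable as a "typical" miss.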

  19. S in our example
     • The typical error (called the standard error of prediction) for our regression model is S = .7258.
     • This means that we typically misestimate the actual number of orders per day by (.7258)(1000) = 725.8.
     • That may sound like a lot, but we have between 5 and 20 thousand orders each day, with an average of (13.22)(1000) = 13,220, so the percentage error is only 725.8 / 13,220 = 5.5%.
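
The percentage-error argument is a short calculation, reproduced here as a sketch using only the S and average values quoted on the slide; the variable names are illustrative.

```python
S = 0.7258           # standard error, in thousands of orders (from the slide)
MEAN_ORDERS = 13.22  # average daily orders, in thousands (from the slide)

typical_error = S * 1000            # typical miss, in orders
mean_orders = MEAN_ORDERS * 1000    # average daily orders
relative_error = typical_error / mean_orders

print(round(typical_error, 1), round(mean_orders), round(100 * relative_error, 1))
# 725.8 13220 5.5
```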

  20. Sales Data

  21. Sales Data Manual

  22. Sales Data Graphical

  23. Sales Data Tools

  24. Sales Data LINEST
