
Decomposition of Sum of Squares



  1. Decomposition of Sum of Squares • The total sum of squares (SS) in the response variable is $SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$. • The total SS can be decomposed into two main sources: error SS and regression SS. • The error SS is $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. • The regression SS is $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. It is the amount of variation in the $Y_i$'s that is explained by the linear relationship of Y with X.
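A minimal numerical sketch of this decomposition (the data and variable names below are illustrative, not from the course):

```python
import numpy as np

# Toy data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(y)

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x

# Sum-of-squares decomposition
SSTO = np.sum((y - ybar) ** 2)    # total SS
SSE = np.sum((y - yhat) ** 2)     # error SS
SSR = np.sum((yhat - ybar) ** 2)  # regression SS

print(SSTO, SSR + SSE)            # the two agree up to rounding error
print("R^2 =", SSR / SSTO)        # proportion of variation explained
```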

  2. Claims • First, SSTO = SSR + SSE, that is, $\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2$. • Proof: … • An alternative decomposition is … • Proof: Exercises.
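A sketch of the standard argument behind the first claim (not the slide's own proof, which is omitted from the transcript):

$$
\sum_i (Y_i - \bar{Y})^2
= \sum_i \big[(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\big]^2
= \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2 + 2\sum_i (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}),
$$

and the cross term vanishes because the least-squares residuals $e_i = Y_i - \hat{Y}_i$ satisfy the normal equations $\sum_i e_i = 0$ and $\sum_i e_i X_i = 0$, hence $\sum_i e_i \hat{Y}_i = 0$.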

  3. Analysis of Variance Table • The decomposition of SS discussed above is usually summarized in an analysis of variance (ANOVA) table as follows: • Note that the MSE is $s^2$, our estimate of $\sigma^2$.
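The slide's table itself is not reproduced in this transcript; for reference, the standard ANOVA table layout for simple linear regression that it summarizes is:

$$
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F \\
\hline
\text{Regression} & 1 & SSR & MSR = SSR/1 & MSR/MSE \\
\text{Error} & n-2 & SSE & MSE = SSE/(n-2) & \\
\text{Total} & n-1 & SSTO & &
\end{array}
$$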

  4. Coefficient of Determination • The coefficient of determination is $R^2 = SSR/SSTO = 1 - SSE/SSTO$. • It must satisfy $0 \le R^2 \le 1$. • $R^2$ gives the proportion of variation in the $Y_i$'s that is explained by the regression line.

  5. Claim • $R^2 = r^2$, that is, the coefficient of determination is the square of the sample correlation coefficient between X and Y. • Proof: …
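One way the claim is usually shown (a sketch, not the slide's proof): in simple linear regression $\hat{Y}_i - \bar{Y} = b_1 (X_i - \bar{X})$ with $b_1 = S_{xy}/S_{xx}$, so

$$
SSR = b_1^2 \sum_i (X_i - \bar{X})^2 = \frac{S_{xy}^2}{S_{xx}},
\qquad
R^2 = \frac{SSR}{SSTO} = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = r^2,
$$

where $S_{xy} = \sum_i (X_i - \bar{X})(Y_i - \bar{Y})$, $S_{xx} = \sum_i (X_i - \bar{X})^2$, and $S_{yy} = SSTO$.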

  6. Important Comments about R2 • It is a useful measure but… • There is no absolute rule about how big it should be. • It is not resistant to outliers. • It is not meaningful for models with no intercept. • It is not useful for comparing models unless they have the same Y and one set of predictors is a subset of the other.

  7. ANOVA F Test • The ANOVA table gives us another test of $H_0: \beta_1 = 0$. • The test statistic is $F = MSR/MSE$, which under $H_0$ follows an $F$ distribution with 1 and $n-2$ degrees of freedom. • Derivations: …
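As an illustration (toy numbers, not the course data), the F statistic and its p-value can be computed directly from the sum-of-squares decomposition:

```python
import numpy as np
from scipy import stats

# Toy data (illustrative only), same setup as the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SSR = np.sum((yhat - y.mean()) ** 2)
SSE = np.sum((y - yhat) ** 2)
MSR = SSR / 1        # regression df = 1
MSE = SSE / (n - 2)  # error df = n - 2

F = MSR / MSE
p_value = stats.f.sf(F, 1, n - 2)  # P(F_{1, n-2} > observed F)
print(F, p_value)
```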

  8. Prediction of Mean Response • Very often, we want to use the estimated regression line to make a prediction about the mean of the response for a particular X value (assumed to be fixed). • We know that the least squares line $\hat{Y} = b_0 + b_1 X$ is an estimate of $E(Y \mid X) = \beta_0 + \beta_1 X$. • Now, we can pick a point $X = x^*$ (within the range of X values used to fit the regression line); then $b_0 + b_1 x^*$ is an estimate of $E(Y \mid X = x^*)$. • Claim: $Var(b_0 + b_1 x^*) = \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)$. • Proof: … • This is the variance of the estimate of $E(Y \mid X = x^*)$.
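A short version of the standard argument behind the claim (the slide's own proof is omitted): write the fitted value at $x^*$ as $b_0 + b_1 x^* = \bar{Y} + b_1 (x^* - \bar{x})$. Since $Var(\bar{Y}) = \sigma^2/n$, $Var(b_1) = \sigma^2/S_{xx}$, and $\bar{Y}$ and $b_1$ are uncorrelated,

$$
Var(b_0 + b_1 x^*) = \frac{\sigma^2}{n} + (x^* - \bar{x})^2 \frac{\sigma^2}{S_{xx}}
= \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right).
$$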

  9. Confidence Interval for E(Y | X = x*) • For a given value $x^*$ of X, a 100(1−α)% CI for the mean value of Y is $b_0 + b_1 x^* \pm t_{n-2,\,\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$, where $s = \sqrt{MSE}$ and $t_{n-2,\,\alpha/2}$ is the upper $\alpha/2$ critical value of the t distribution with $n-2$ degrees of freedom.

  10. Example • Consider the smoking and cancer data. • Suppose we wish to predict the mean mortality index when the smoking index is 101, that is, when x* = 101…

  11. Prediction of New Observation • Suppose we want to predict a particular value $Y^*$ when $X = x^*$. • The predicted value of a new point measured when $X = x^*$ is $\hat{Y}^* = b_0 + b_1 x^*$. • Note, this predicted value is the same as the estimate of $E(Y \mid X = x^*)$. • The predicted value has two sources of variability. One is due to the regression line being estimated by $b_0 + b_1 X$. The second is due to $\varepsilon^*$, i.e., points do not fall exactly on the line. • To calculate the variance of the prediction error we look at the difference $Y^* - \hat{Y}^*$.
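The calculation the last bullet sets up goes, in the usual notation: since the new observation $Y^*$ is independent of the data used to fit the line,

$$
Var(Y^* - \hat{Y}^*) = Var(Y^*) + Var(\hat{Y}^*)
= \sigma^2 + \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)
= \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right).
$$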

  12. Prediction Interval for New Observation • A 100(1−α)% prediction interval for $Y^*$ when $X = x^*$ is $b_0 + b_1 x^* \pm t_{n-2,\,\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$. • This is not a confidence interval; CIs are for parameters, and here we are predicting the value of a random variable. • The prediction interval is wider than the CI for $E(Y \mid X = x^*)$.
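A minimal sketch computing both the mean-response CI (slide 9) and the prediction interval above at a hypothetical $x^*$ (toy data and the choice x_star = 3.5 are illustrative, not from the course):

```python
import numpy as np
from scipy import stats

# Toy data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(y)
alpha = 0.05
x_star = 3.5  # hypothetical point of interest

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

s = np.sqrt(np.sum((y - yhat) ** 2) / (n - 2))  # s = sqrt(MSE)
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
fit = b0 + b1 * x_star

se_mean = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / Sxx)      # CI for E(Y | X = x*)
se_pred = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / Sxx)  # PI for a new Y*

print("CI:", (fit - t_crit * se_mean, fit + t_crit * se_mean))
print("PI:", (fit - t_crit * se_pred, fit + t_crit * se_pred))  # always wider than the CI
```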

  13. Dummy Variable Regression • A dummy or indicator variable takes two values: 0 or 1. • It indicates which category an observation is in. • Example: … • Interpretation of the regression coefficients in a dummy variable regression: …
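A brief illustration of the interpretation (standard reasoning, not the slide's own example): with a single dummy variable $D$, the model $Y = \beta_0 + \beta_1 D + \varepsilon$ gives

$$
E(Y \mid D = 0) = \beta_0, \qquad E(Y \mid D = 1) = \beta_0 + \beta_1,
$$

so $\beta_0$ is the mean response in the baseline category and $\beta_1$ is the difference in mean response between the two categories; the least-squares estimates $b_0$ and $b_1$ are the corresponding sample mean and difference of sample means.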
