Regression Analysis Week 9 EXTRA SUMS OF SQUARES • Extra sums of squares • Decomposition of SSR • Usage of extra sums of squares • Coefficient of partial determination
Extra sums of squares
• A different way to look at the comparison of models.
• An extra sum of squares measures the marginal reduction in the error sum of squares when one or several independent variables are added to the regression model, given that other independent variables are already in the model.
• Equivalently, it measures the marginal increase in the regression sum of squares when one or several independent variables are added to the model.
Extra sums of squares (2)
• Look at the difference
• in SSE
• in SSR
• Because SSR + SSE = SST, these two ways are equivalent
• The models we compare are hierarchical, in the sense that one includes all of the explanatory variables of the other
Extra sums of squares (3)
• We can compare models with different explanatory variables
• X1, X2 vs X1
• X1, X2, X3, X4, X5 vs X1, X2, X3
• Note that the first model includes all the Xs of the second
• We will get an F test that compares the two models
• We are testing the null hypothesis that the regression coefficients for the extra variables are all zero
Extra sums of squares (4)
• For X1, X2, X3, X4, X5 vs X1, X2, X3
• H0: β4 = β5 = 0
• H1: β4 and β5 are not both 0
• The degrees of freedom for the F statistic are the number of extra variables and the dfE for the model with the larger number of explanatory variables
• Suppose n = 100 and we compare models with X1, X2, X3, X4, X5 vs X1, X2, X3
• Numerator df is 2
• Denominator df is n - 6 = 94
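A minimal sketch of this comparison in Python, on simulated data (all names and the data-generating model here are hypothetical, not from the slides):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame(rng.normal(size=(n, 5)),
                  columns=["x1", "x2", "x3", "x4", "x5"])
df["y"] = 2 + df.x1 + 0.5 * df.x2 + rng.normal(size=n)  # x4, x5 truly irrelevant

reduced = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
full = smf.ols("y ~ x1 + x2 + x3 + x4 + x5", data=df).fit()

# F test of H0: beta4 = beta5 = 0; numerator df = 2, denominator df = n - 6 = 94
print(anova_lm(reduced, full))
```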
Examples • Predict bone density using age, weight and height; does diet add any useful information? • Predict faculty salaries using highest degree (categorical), rank (categorical), time in rank, and department (categorical); does race (gender) add any useful information?
Examples (2) • Predict GPA using 3 HS grade variables; do SAT scores add any useful information? • Predict yield of an industrial process using temperature and pH; does the supplier of the raw material (categorical) add any useful information?
Decomposition of SSR
Suppose we have the case of two X variables.
• When only X1 is in the model, the total sum of squares is given by: SST = SSR(X1) + SSE(X1)
• When we add X2 to the model while X1 is already in it, SST becomes: SST = SSR(X1) + SSR(X2|X1) + SSE(X1,X2)
• Equivalently: SST = SSR(X1,X2) + SSE(X1,X2)
Decomposition of SSR (2)
Hence the decomposition of the regression sum of squares SSR(X1,X2) into two marginal components is:
• SSR(X1), measuring the contribution of including X1 alone in the model, and
• SSR(X2|X1), measuring the additional contribution when X2 is included, given that X1 is already in the model.
Decomposition of SSR (3)
Therefore, when the regression model contains three X variables, a variety of decompositions of SSR(X1,X2,X3) can be obtained, such as:
• SSR(X1,X2,X3) = SSR(X1) + SSR(X2|X1) + SSR(X3|X1,X2)
• SSR(X1,X2,X3) = SSR(X2) + SSR(X3|X2) + SSR(X1|X2,X3)
• SSR(X1,X2,X3) = SSR(X1) + SSR(X2,X3|X1)
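A sketch of computing these extra sums of squares by differencing the SSEs of nested fits, continuing the simulated data from the earlier sketch (note that statsmodels' `.ssr` attribute is the residual sum of squares, i.e. SSE):

```python
import statsmodels.formula.api as smf

def sse(formula):
    # .ssr in statsmodels is the residual (error) sum of squares, SSE
    return smf.ols(formula, data=df).fit().ssr

sst = sse("y ~ 1")  # SSE of the intercept-only model equals SST

ssr_x1 = sst - sse("y ~ x1")                                   # SSR(X1)
ssr_x2_g_x1 = sse("y ~ x1") - sse("y ~ x1 + x2")               # SSR(X2|X1)
ssr_x3_g_x1x2 = sse("y ~ x1 + x2") - sse("y ~ x1 + x2 + x3")   # SSR(X3|X1,X2)

# The pieces add up: SSR(X1,X2,X3) = SSR(X1) + SSR(X2|X1) + SSR(X3|X1,X2)
assert abs((sst - sse("y ~ x1 + x2 + x3"))
           - (ssr_x1 + ssr_x2_g_x1 + ssr_x3_g_x1x2)) < 1e-8
```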
Decomposition of SSR (4)
An ANOVA table containing the decomposition of SSR is as follows:
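The slide's table is not reproduced in the text; the standard layout of this decomposition (following NKW, for n observations) is:

Source of Variation   SS               df      MS
Regression            SSR(X1,X2,X3)    3       MSR(X1,X2,X3)
  X1                  SSR(X1)          1       MSR(X1)
  X2|X1               SSR(X2|X1)       1       MSR(X2|X1)
  X3|X1,X2            SSR(X3|X1,X2)    1       MSR(X3|X1,X2)
Error                 SSE(X1,X2,X3)    n - 4   MSE(X1,X2,X3)
Total                 SST              n - 1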
Usage of Extra Sums of Squares
• Test whether a single βk = 0
Suppose we have a regression with 3 independent variables. The full model is:
Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + εi
with SSE(F) = SSE(X1,X2,X3) and df = n - 4.
To test whether β2 = 0, we have the alternative model:
Yi = β0 + β1Xi1 + β3Xi3 + εi
as the reduced model, with SSE(R) = SSE(X1,X3) and df = n - 3.
Usage of Extra Sums of Squares (2)
The general linear test statistic is given by:
F* = {[SSE(R) - SSE(F)] / (dfR - dfF)} / [SSE(F) / dfF]
   = [SSR(X2|X1,X3) / 1] / [SSE(X1,X2,X3) / (n - 4)]
Usage of Extra Sums of Squares (3)
• Test whether several βk = 0
Suppose we have a regression with 3 independent variables. The full model is:
Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + εi
with SSE(F) = SSE(X1,X2,X3) and df = n - 4.
To test whether β2 = β3 = 0, we have the alternative model:
Yi = β0 + β1Xi1 + εi
as the reduced model, with SSE(R) = SSE(X1) and df = n - 2.
Usage of Extra Sums of Squares (4)
The general linear test statistic is:
F* = {[SSE(R) - SSE(F)] / (dfR - dfF)} / [SSE(F) / dfF]
   = [SSR(X2,X3|X1) / 2] / [SSE(X1,X2,X3) / (n - 4)]
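A sketch of computing F* directly from SSE(R) and SSE(F), continuing the hypothetical simulated data from the earlier sketch:

```python
from scipy import stats
import statsmodels.formula.api as smf

full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
reduced = smf.ols("y ~ x1", data=df).fit()

num_df = full.df_model - reduced.df_model          # 2 extra coefficients
f_star = ((reduced.ssr - full.ssr) / num_df) / (full.ssr / full.df_resid)
p_value = stats.f.sf(f_star, num_df, full.df_resid)
```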
Matrix Formulation of the General Linear Test
• The full model with p - 1 input variables is given by: Y = Xβ + ε
• The least squares estimator is bF = (X'X)-1X'Y, and the error sum of squares is given by: SSE(F) = Y'Y - bF'X'Y
• The reduced model, with a single or several βk = 0, is given by: Y = Xβ + ε subject to Cβ = h, where C is an s×p matrix of rank s and h is a specified s×1 vector.
Matrix Formulation of the General Linear Test (2)
• Example 1: To test whether β2 = 0 in a regression model containing 2 independent variables, take C = [0 0 1] and h = [0], so that Cβ = β2.
• Example 2: To test whether β1 = β2 = 0 in a regression model containing 2 independent variables, take
C = [0 1 0; 0 0 1] and h = [0 0]'
Matrix Formulation of the General Linear Test (3)
• so that Cβ = (β1, β2)'.
• The least squares estimator for the reduced model is: bR = bF - (X'X)-1C'(C(X'X)-1C')-1(CbF - h)
and the error sum of squares is given by: SSE(R) = (Y - XbR)'(Y - XbR)
which has associated with it dfR = n - (p - s).
Matrix Formulation of the General Linear Test (4)
• Test statistic:
F* = {[SSE(R) - SSE(F)] / s} / [SSE(F) / (n - p)]
where SSE(R) - SSE(F) = (CbF - h)'(C(X'X)-1C')-1(CbF - h)
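A sketch of the matrix formulation in plain numpy (a hypothetical helper, not from the slides; X is the full design matrix including the intercept column):

```python
import numpy as np

def general_linear_test(X, y, C, h):
    """F* for H0: C @ beta = h, using the formulas above (sketch)."""
    n, p = X.shape
    s = C.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b_f = XtX_inv @ X.T @ y                 # full-model estimator
    sse_f = y @ y - b_f @ X.T @ y           # SSE(F), df = n - p
    M = np.linalg.inv(C @ XtX_inv @ C.T)
    b_r = b_f - XtX_inv @ C.T @ M @ (C @ b_f - h)   # reduced estimator
    resid = y - X @ b_r
    sse_r = resid @ resid                   # SSE(R), df = n - (p - s)
    f_star = ((sse_r - sse_f) / s) / (sse_f / (n - p))
    return b_r, f_star
```

For Example 1 above (two independent variables plus intercept, so p = 3), C = np.array([[0., 0., 1.]]) and h = np.zeros(1).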
Example in NWK pp. 271-273
A study of the relation of amount of body fat (Y) to several possible explanatory, independent variables, based on a sample of 20 healthy females aged 25-34. The possible independent variables are triceps skinfold thickness (X1), thigh circumference (X2) and midarm circumference (X3). Underwater weighing is the accurate but cumbersome alternative way of measuring body fat directly.
Output: full model
Suppose we fit a model with all three X variables. The ANOVA table and the least squares estimates of the regression coefficients are as shown in the output (not reproduced here).
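A sketch of how this fit might be reproduced (the file name and the column names fat, skinfold, thigh, midarm are hypothetical stand-ins for the NWK data, not the book's own file):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical: load the NWK body fat data from a local CSV
bodyfat = pd.read_csv("bodyfat.csv")

fit = smf.ols("fat ~ skinfold + thigh + midarm", data=bodyfat).fit()
print(fit.summary())   # overall F test, coefficient estimates, p-values
```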
Interpretation
• The P value for F(3, 16) is <.0001
• But the P values for the individual regression coefficients are 0.1699, 0.2849, and 0.1896
• None of these is near our standard of 0.05
• What is the explanation? (The explanatory variables are highly correlated with one another: multicollinearity.)
Look at Extra SS

Var        Type I SS    Type II SS
skinfold   352.26       12.70
thigh      33.16        7.52
midarm     11.54        11.54
Total      495.38

• Fact: the Type I and Type II SS are very different
• If we reorder the variables in the model statement we will get
• Different Type I SS
• The same Type II SS
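A sketch of obtaining both kinds of SS from the fit in the earlier body-fat sketch:

```python
from statsmodels.stats.anova import anova_lm

# Type I SS are sequential and change if the formula order changes;
# Type II SS are partial and do not.
print(anova_lm(fit, typ=1))
print(anova_lm(fit, typ=2))
```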
Run additional models
Rerun with skinfold as the only explanatory variable; the ANOVA table and the least squares estimates of the regression coefficients are as shown in the output (not reproduced here).
General linear test on thigh and midarm
Test whether thigh and midarm add useful information, given that skinfold is already in the model:
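Using the Type I SS from the table above (and taking its Total row, 495.38, as SST for n = 20), the test works out as follows; treat this as a sketch, since the slide's own output is not reproduced:

SSR(thigh, midarm | skinfold) = 33.16 + 11.54 = 44.70
SSE(full) = 495.38 - (352.26 + 33.16 + 11.54) = 98.42, with df = 20 - 4 = 16
F* = (44.70 / 2) / (98.42 / 16) = 22.35 / 6.15 ≈ 3.63

Since F(0.95; 2, 16) ≈ 3.63, the test is borderline at the 5% level.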
Coefficients of Partial Determination
• The coefficient of multiple determination, R2, measures the proportionate reduction in the variation of Y achieved by the introduction of the entire set of X variables considered in the model.
• A coefficient of partial determination, r2, measures the marginal contribution of one X variable when all the others are already included in the model.
Coefficients of Partial Determination (2)
• Suppose we have a regression with 2 independent variables. The full model is:
Yi = β0 + β1Xi1 + β2Xi2 + εi
The coefficient of partial determination between Y and X1, given that X2 is in the model, is measured by:
r2Y1|2 = SSR(X1|X2) / SSE(X2)
Coefficients of Partial Determination (3)
The coefficient of partial determination between Y and X2, given that X1 is in the model, is measured by:
r2Y2|1 = SSR(X2|X1) / SSE(X1)
Coefficients of Partial Determination (4)
General case: the extension of the coefficient of partial determination to 3 or more independent variables in the model is immediate. For instance:
r2Y1|23 = SSR(X1|X2,X3) / SSE(X2,X3)
r2Y2|13 = SSR(X2|X1,X3) / SSE(X1,X3)
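A short sketch of r2Y1|2 computed from nested SSEs (same hypothetical simulated data as in the earlier sketches):

```python
import statsmodels.formula.api as smf

sse_x2 = smf.ols("y ~ x2", data=df).fit().ssr
sse_x1x2 = smf.ols("y ~ x1 + x2", data=df).fit().ssr
r2_y1_given_2 = (sse_x2 - sse_x1x2) / sse_x2   # SSR(X1|X2) / SSE(X2)
```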
Coefficients of Partial Correlation
A coefficient of partial correlation is the square root of the corresponding coefficient of partial determination; its sign is the same as that of the corresponding regression coefficient. The coefficients of partial determination can be expressed in terms of simple or other partial correlations. For example:
r^2_{Y1|2} = (r_{Y1} - r_{Y2} r_{12})^2 / [(1 - r^2_{Y2})(1 - r^2_{12})]
Standardized Regression Model • Can help reduce round off errors in calculations • Puts regression coefficients in common units • Units for the usual coefficients are units for Y divided by units for X
Standardized Regression Model (2)
• Standardized coefficients can be obtained from the usual ones by multiplying by the ratio of the standard deviation of X to the standard deviation of Y
• The interpretation is that a one-standard-deviation increase in X corresponds to an increase of 'standardized beta' standard deviations in Y
Standardized Regression Model (3)
• Y = … + βX + …
•   = … + β(sX/sY)(sY/sX)X + …
•   = … + [β(sX/sY)] [(sY/sX)X] + …
•   = … + [β(sX/sY)] sY (X/sX) + …
Standardized Regression Model (4)
• Standardize Y and all X's (subtract the mean and divide by the standard deviation)
• Then divide by sqrt(n - 1) (the correlation transformation)
• The regression coefficients for variables transformed in this way are the standardized regression coefficients
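A sketch checking the two routes to standardized coefficients on the hypothetical simulated data from earlier (the extra 1/sqrt(n - 1) factor of the correlation transformation scales Y and the X's alike, so it leaves the slopes unchanged):

```python
import numpy as np
import statsmodels.api as sm

X = df[["x1", "x2"]].to_numpy()
y = df["y"].to_numpy()

b = sm.OLS(y, sm.add_constant(X)).fit().params[1:]      # ordinary slopes
beta_star = b * X.std(axis=0, ddof=1) / y.std(ddof=1)   # rescaled by sX/sY

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)       # standardized X
ys = (y - y.mean()) / y.std(ddof=1)                     # standardized Y
print(np.allclose(beta_star, sm.OLS(ys, Xs).fit().params))  # True
```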
Last Slide • Reading NKMW 8.1 to 8.3 • Exercise NKMW page 308-312 no 8.3, 8.12 • Homework NKMW page 308-312 no 8.25, 8.31