Analisa Regresi Week 7 The Multiple Linear Regression Model • Key ideas from case study • Model description and assumptions • The general linear model and the least squares procedure • Inference for multiple regression
Key ideas from case study (1) • First, look at graphical and numerical summaries for one variable at a time • Then, look at relationships between pairs of variables with graphical and numerical summaries • Use plots and correlations
Key ideas from case study (2) • The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model • A variable can be a significant (P < 0.05) predictor alone and not significant (P > 0.05) when other X's are in the model
Key ideas from case study (3) • Regression coefficients, standard errors and the results of significance tests depend on what other explanatory variables are in the model
Key ideas from case study (4) • Significance tests (P values) do not tell the whole story • The squared multiple correlation (the proportion of variation in the response variable explained by the explanatory variables) can give a different view • We often express R2 as a percent
Key ideas from case study (5) • You can fully understand the theory in terms of Y = Xβ + ε • To use this methodology effectively in practice you need to understand how the data were collected, the nature of the variables, and how they relate to each other
Model Description and Assumptions Consider an experiment in which data are generated of the type (Yi, Xi1, Xi2, … , Xik), i = 1, … , n, that is, a response measurement together with the values of k regressor variables for each of n cases
Model Description and Assumptions (2) If the experimenter is willing to assume that, in the region of the X's defined by the data, Yi is related approximately linearly to the regressor variables, then the model formulation is: Yi = β0 + β1Xi1 + β2Xi2 + … + βkXik + εi (1) where • Yi is the response variable for the ith case • Xi1, Xi2, … , Xik are k explanatory variables for cases i = 1 to n and n ≥ k + 1 • εi is a model error • β0 is the intercept • β1, β2, … , βk are the regression coefficients for the explanatory variables
Model Description and Assumptions (3) What is a linear model? A linear model is defined as a model that is linear in the parameters, i.e., linear in the coefficients, the β's in eq. (1). For example: • A model quadratic in X: Yi = β0 + β1Xi + β2Xi^2 + εi • A linear model with interaction: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi1Xi2 + εi Both are linear in the β's even though they are not linear in the X's, so they can be fitted by least squares, as sketched below
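A minimal numpy sketch (not part of the original slides) illustrating that both examples stay linear in the β's: the squared and interaction terms simply become extra columns of the design matrix, so ordinary least squares still applies. The data arrays here are hypothetical placeholders.

```python
import numpy as np

# Hypothetical data, only to make the example self-contained
rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 0.5 * x1 + 0.1 * x1**2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

# Model quadratic in X: columns 1, X, X^2 -- still linear in the coefficients
X_quad = np.column_stack([np.ones(n), x1, x1**2])
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

# Linear model with interaction: columns 1, X1, X2, X1*X2 -- also linear in the coefficients
X_inter = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b_inter, *_ = np.linalg.lstsq(X_inter, y, rcond=None)

print(b_quad)
print(b_inter)
```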
Model Description and Assumptions (4) What is the meaning of the regression coefficients? • β0 is the Y intercept of the regression plane. If the scope of the model includes Xi1 = 0, … , Xik = 0, it gives the mean response at Xi1 = 0, … , Xik = 0. • The parameter β1 indicates the change in the mean response E(Y) per unit change in X1 when the other X's are held constant. • The β's are often called partial regression coefficients because they reflect the partial effect of one independent variable when the other variables are included in the model and are held constant.
Model Description and Assumptions (5) Assumptions • The εi are independent, normally distributed random errors with mean 0 and variance σ2 • The Xij are not random and are measured with negligible error
The GLM and the Least Squares Procedure In matrix terms: Y = Xβ + ε where: Y is an n × 1 vector of responses, β is a p × 1 vector of parameters, X is an n × p matrix of constants, and ε is an n × 1 vector of model errors with ε ~ N(0, σ2I). Consequently, Y ~ N(Xβ, σ2I)
The GLM and the Least Squares Procedure (2) Least squares procedure: minimize e'e = (Y − Xb)'(Y − Xb) The least squares normal equations: (X'X)b = X'Y Assuming X is of full column rank, b = (X'X)-1X'Y, as sketched below
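A short numpy sketch (an illustration, not the course's own code) of solving the normal equations; X is assumed to be a full-column-rank design matrix with a leading column of ones and y the response vector.

```python
import numpy as np

def ols_coefficients(X, y):
    """Solve (X'X) b = X'y, which equals b = (X'X)^-1 X'y when X has full column rank."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the linear system is numerically preferable to forming the inverse explicitly
    return np.linalg.solve(XtX, Xty)
```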
The GLM and the Least Squares Procedure (3) The fitted (predicted) values are Ŷ = Xb = X(X'X)-1X'Y = HY, where H = X(X'X)-1X' is the hat matrix And the residuals are e = Y − Ŷ or e = (I − H)Y The variance-covariance matrix of the residuals is σ2{e} = σ2(I − H), which is estimated by s2{e} = MSE(I − H)
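A companion numpy sketch (again an assumption-laden illustration, not the slides' code) that computes the hat matrix, fitted values, residuals, and the estimated residual covariance MSE(I − H).

```python
import numpy as np

def fit_summary(X, y):
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
    y_hat = H @ y                          # fitted values, equal to X b
    e = y - y_hat                          # residuals, (I - H) y
    mse = (e @ e) / (n - p)                # MSE = SSE / (n - p)
    s2_e = mse * (np.eye(n) - H)           # estimated var-cov matrix of the residuals
    return y_hat, e, mse, s2_e
```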
ANOVA Table • Organizes the arithmetic • Sources of variation are • Model (SAS) or Regression (NKNW) • Error (SAS, NKNW) or Residual • Total • SS and df add: SSM + SSE = SST and dfM + dfE = dfT
SS (2) The sums of squares for ANOVA in matrix terms are: SST = Y'Y − (1/n)Y'JY SSE = e'e = (Y − Xb)'(Y − Xb) = Y'Y − b'X'Y SSM = SST − SSE = b'X'Y − (1/n)Y'JY where J is the n × n matrix of all ones
ANOVA Table
Source   SS    df            MS
Model    SSM   dfM = p - 1   MSM = SSM/dfM
Error    SSE   dfE = n - p   MSE = SSE/dfE
Total    SST   dfT = n - 1   (MST = SST/dfT)
F = MSM/MSE
ANOVA F test • H0: β1 = β2 = … = βp-1 = 0 • Ha: βk ≠ 0 for at least one k = 1, … , p-1 • Another form of the null hypothesis is • H0: β1 = 0, and β2 = 0, … , and βp-1 = 0 • Under H0, F ~ F(p-1, n-p) • Reject H0 if F is large; use the P value
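A numpy/scipy sketch (illustrative only, under the same design-matrix assumptions as above) of the ANOVA decomposition and the overall F test.

```python
import numpy as np
from scipy import stats

def anova_f_test(X, y):
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ b
    sst = np.sum((y - y.mean()) ** 2)       # total SS, df = n - 1
    sse = np.sum((y - y_hat) ** 2)          # error SS, df = n - p
    ssm = sst - sse                         # model SS, df = p - 1
    f = (ssm / (p - 1)) / (sse / (n - p))   # F = MSM / MSE
    p_value = stats.f.sf(f, p - 1, n - p)   # upper-tail probability of F(p-1, n-p)
    return f, p_value
```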
Example NKNW p 249 • The Zartan company sells a special skin cream through fashion stores exclusively in 15 districts • Y is sales • X1 is target population • X2 is per capita discretionary income • n = 15 districts
Hypothesis Tested by F • H0: β1 = β2 = … = βp-1 = 0 • F = MSM/MSE • Reject H0 if the P value is ≤ 0.05
ANOVA Table What do we conclude?
R2 • The squared multiple regression correlation (R2) gives the proportion of variation in the response variable explained by the explanatory variables included in the model • It is usually expressed as a percent • It is sometimes called the coefficient of multiple determination (NKNW p 230)
R2 (2) • R2 = SSM/SST, the proportion of variation explained • R2 = 1 − (SSE/SST), i.e., one minus the proportion of variation not explained • H0: β1 = β2 = … = βp-1 = 0 is equivalent to H0: the population R2 is zero • F = [ (R2)/(p-1) ] / [ (1 − R2)/(n-p) ]
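A tiny helper (a sketch, not from the slides) showing that the F statistic computed from R2 is algebraically the same as MSM/MSE.

```python
def r_squared_f(ssm, sse, n, p):
    sst = ssm + sse
    r2 = ssm / sst                               # proportion of variation explained
    f = (r2 / (p - 1)) / ((1 - r2) / (n - p))    # identical in value to MSM/MSE
    return r2, f
```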
What and Why • At this point we have examined the distribution of the explanatory variables (and the response variable if that is appropriate) and we have taken remedial measures where needed • We have looked at plots and numerical summaries
What and Why (2) • The P value for the F significance test tells us one of the following: • there is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model (P > 0.05) • one or more of the explanatory variables in our model is potentially useful for predicting the response variable in a linear model (P ≤ 0.05)
R2 output: R-Sq = 0.999, Adj R-Sq = 0.999, Coeff Var = 6.0
Inference for individual regression coefficients • b ~ N(β, σ2(X'X)-1); the estimated variance-covariance matrix is s2{b} = MSE(X'X)-1 • s2(bi) is the ith diagonal element of s2{b}, and s(bi, bk) are the off-diagonal elements • CI: bi ± t* s(bi), with t* = t(1 – α/2; n – p) • Significance test for H0i: βi = 0 uses the test statistic t = bi/s(bi), df = dfE = n – p, and the P value computed from the t(n-p) distribution
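A numpy/scipy sketch (an illustration under the assumptions above, not the course's code) of the coefficient standard errors, t statistics, and confidence intervals.

```python
import numpy as np
from scipy import stats

def coefficient_inference(X, y, alpha=0.05):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    mse = np.sum((y - X @ b) ** 2) / (n - p)
    se = np.sqrt(mse * np.diag(XtX_inv))          # s(b_i): sqrt of diagonal of MSE (X'X)^-1
    t = b / se                                    # test statistic for H0: beta_i = 0
    p_values = 2 * stats.t.sf(np.abs(t), n - p)   # two-sided P values
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    ci = np.column_stack([b - t_crit * se, b + t_crit * se])
    return b, se, t, p_values, ci
```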
Regression coefficient estimates
Variable    Parameter Est.   Std. Error   t        P
Intercept   3.453            2.431        1.420    0.181
Pop         0.496            0.006        81.924   <.0001
income      0.009            0.001        9.502    <.0001
Estimation by Doolittle General format: [ X'X | X'Y | I ] By applying row transformations until the left block becomes the identity, we obtain [ I | b | (X'X)-1 ] Example: see worksheet 1; a sketch follows below
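A numpy sketch of the augmented-matrix idea (a plain Gauss-Jordan reduction without pivoting, offered as an illustration of the layout rather than the exact worksheet procedure; it assumes X'X is nonsingular and well conditioned).

```python
import numpy as np

def doolittle_estimates(X, y):
    XtX = X.T @ X
    Xty = (X.T @ y).reshape(-1, 1)
    k = XtX.shape[0]
    A = np.hstack([XtX, Xty, np.eye(k)]).astype(float)   # [X'X | X'Y | I]
    for i in range(k):
        A[i] = A[i] / A[i, i]                  # scale the pivot row
        for j in range(k):
            if j != i:
                A[j] = A[j] - A[j, i] * A[i]   # clear column i in the other rows
    b = A[:, k]                                # middle block: coefficient estimates
    XtX_inv = A[:, k + 1:]                     # right block: (X'X)^-1
    return b, XtX_inv
```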
Estimation of E(Yh) • Xh is now a vector • Xh = (1, Xh1, Xh2, … , Xhk)' • We want a point estimate and a confidence interval for the subpopulation mean corresponding to Xh
Estimation of E(Yh) (2) • The mean response to be estimated is E(Yh) = Xh'β • The estimated mean response corresponding to Xh is Ŷh = Xh'b • This estimator is unbiased, E(Ŷh) = E(Yh) • and its variance is σ2(Ŷh) = σ2 Xh'(X'X)-1Xh
Estimation of E(Yh) (3) • The estimated variance is s2(Ŷh) = MSE Xh'(X'X)-1Xh • The 1 – α confidence limits for E(Yh) are Ŷh ± t(1 – α/2; n – p) s(Ŷh) • Example (in class); a sketch follows below
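A numpy/scipy sketch (illustrative, under the same assumptions as the earlier snippets) of the point estimate and confidence interval for E(Yh); x_h is assumed to be the vector (1, Xh1, ..., Xhk).

```python
import numpy as np
from scipy import stats

def mean_response_ci(X, y, x_h, alpha=0.05):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    mse = np.sum((y - X @ b) ** 2) / (n - p)
    y_hat_h = x_h @ b                          # estimated mean response at x_h
    s2_yhat = mse * (x_h @ XtX_inv @ x_h)      # s^2{Yhat_h}
    half_width = stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(s2_yhat)
    return y_hat_h, (y_hat_h - half_width, y_hat_h + half_width)
```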
F Test for Lack of Fit • It is used when a data set contains repeat observations • Repeat observations in multiple regression are replicate observations on Y corresponding to levels of each of the X variables that are constant from trial to trial • With two independent variables, repeat observations require that X1 and X2 each remain at given levels from trial to trial
F Test for Lack of Fit (2) • SSE is decomposed into a pure error and a lack-of-fit component • SSE = SSPE + SSLF • The pure error sum of squares SSPE is obtained by first calculating, for each replicate group, the sum of squared deviations of the Y observations around the group mean, where a replicate group has the same values for each of the X variables, and then summing over the groups
F Test for Lack of Fit (3) • If the linear regression function is appropriate, then the group means Ȳj will be near the fitted values Ŷij calculated from the estimated linear regression function, and SSLF will be small. See the illustration in NKNW page 138. • df(SSPE) = n – c and df(SSLF) = (n-p) – (n-c) = c – p, where c is the number of replicate groups
F Test for Lack of Fit (4) • The hypotheses are H0: E(Y) = β0 + β1X1 + … + βkXk (the linear regression function is appropriate) against Ha: E(Y) ≠ β0 + β1X1 + … + βkXk • The appropriate test statistic is F* = [SSLF/(c – p)] / [SSPE/(n – c)] = MSLF/MSPE • The appropriate decision rule is: conclude H0 if F* ≤ F(1 – α; c – p, n – c), otherwise conclude Ha • A sketch of the computation follows below
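A numpy/scipy sketch of the lack-of-fit computation (an illustration, not the textbook's code); it groups rows whose predictor values repeat exactly and assumes there are c > p replicate groups.

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(X, y):
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sse = np.sum((y - X @ b) ** 2)
    # Pure error: deviations of Y around its replicate-group mean
    # (a group is the set of rows of X with exactly identical predictor values)
    groups = {}
    for i in range(n):
        groups.setdefault(tuple(X[i]), []).append(y[i])
    sspe = sum(np.sum((np.array(ys) - np.mean(ys)) ** 2) for ys in groups.values())
    c = len(groups)                              # number of replicate groups
    sslf = sse - sspe                            # lack-of-fit sum of squares
    f = (sslf / (c - p)) / (sspe / (n - c))      # MSLF / MSPE
    p_value = stats.f.sf(f, c - p, n - c)
    return f, p_value
```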
Prediction of new observation Yh(new) • Xh is now a vector • Xh = (1, Xh1, Xh2, … , Xhk)' • We want a prediction for Yh with an interval that expresses the uncertainty in our prediction
Prediction of new observation Yh(new) (2) • The 1 – α prediction limits for a new observation Yh(new) corresponding to Xh are Ŷh ± t(1 – α/2; n – p) s(pred) • where s2(pred) = MSE (1 + Xh'(X'X)-1Xh) • Example (in class)
Prediction of new observation Yh(new) (3) • When m new observations at Xh are to be drawn and their mean Ȳh(new) is to be predicted, the 1 – α prediction limits are Ŷh ± t(1 – α/2; n – p) s(predmean) • where s2(predmean) = MSE (1/m + Xh'(X'X)-1Xh) • Example (in class); a sketch follows below
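A numpy/scipy sketch (illustrative, same assumptions as before) of both prediction intervals: a single new observation and the mean of m new observations at x_h.

```python
import numpy as np
from scipy import stats

def prediction_intervals(X, y, x_h, m=1, alpha=0.05):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    mse = np.sum((y - X @ b) ** 2) / (n - p)
    y_hat_h = x_h @ b
    leverage = x_h @ XtX_inv @ x_h                   # Xh'(X'X)^-1 Xh
    s_pred = np.sqrt(mse * (1 + leverage))           # single new observation
    s_predmean = np.sqrt(mse * (1 / m + leverage))   # mean of m new observations
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    pi_new = (y_hat_h - t_crit * s_pred, y_hat_h + t_crit * s_pred)
    pi_mean = (y_hat_h - t_crit * s_predmean, y_hat_h + t_crit * s_predmean)
    return pi_new, pi_mean
```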
Last slide • Reading: NKNW 7.1 to 7.8 • Exercises: NKNW page 264, no. 7.8-7.11 • Homework: NKNW pages 264-267, no. 7.12-7.19