330 likes | 488 Views
Topic 6: Estimation and Prediction of Y h. Outline. Estimation and inference of E(Y h ) Prediction of a new observation Construction of a confidence band for the entire regression line. Estimation of E(Y h ).
E N D
Outline • Estimation and inference of E(Yh) • Prediction of a new observation • Construction of a confidence band for the entire regression line
Estimation of E(Yh) • E(Yh) = μh =β0+ β1Xh, the mean value of Y for the subpopulation with X=Xh • We will estimate E(Yh) by • KNNL use for this estimate, see equation (2.28) on pp 52
Theory for Estimation of E(Yh) • is Normal with mean μh and variance • The Normality is a consequence of the fact that b0 + b1Xh is a linear combination of Yi’s • See KNNL pp 52-54 for details
Application of the Theory • We estimate σ2( ) by • It then follows that • Details for confidence intervals and significance tests are consequences
95% Confidence Interval for E(Yh) • ± tcs( ) where tc = t(.975, n-2) • NOTE: significance tests can be constructed but they are rarely used in practice
Toluca Company Example (pg 19) • Manufactures refrigeration equipment • One replacement part manufactured in lots of varying sizes • Company wants to determine the optimum lot size • To do this, company needs to first describe the relationship between work hours and lot size
SAS CODE ***Generating the data set***; data toluca; infile ‘../data/CH01TA01.txt'; input lotsize hours; data other; size=65; output; size=100; output; data toluca1; set toluca other; proc print data=toluca1; run;
SAS CODE ***Generating the confidence intervals for all values of X in the data set***; proc reg data=toluca1; model hours=size/clm; id lotsize; run; clm option generates confidence intervals for the mean
Notes • Standard error affected by how far Xh is from (see Figure 2.6) • Recall teeter-totter idea…a change in the slope has bigger impact on Y as you move away from
Prediction of Yh(new) • Want to predict value for a new observation at X=Xh • Model: Yh(new) = β0 + β1Xh + • Since E(e)=0 same value as for E(Yh) • Prediction interval, however, relies heavily on assumption that e are Normally distributed Note!!
Prediction of Yh(new) • Var(Yh(new))=Var( )+Var(ξ ) • Then follows that
Notes • Procedure can be modified for the mean of m observations at X=Xh (see 2.39a and 239b on page 60) • Standard error affected by how far Xh is from (see Figure 2.6)
SAS CODE ***Generating the prediction intervals for all values of X in data set***; proc reg data=toluca1; model hours=lotsize/cli; id lotsize; run; cli option generates prediction interval for a new observation
These are wrong…same as before. Does not include variability about regression line
Notes • The standard error(Std Error Mean Predict)given in this output is the standard error of not s(pred) • The prediction interval is correct and wider than the previous confidence interval
Notes • To get correct standard error need to add the variance about the regression line
Confidence band for regression line • ± Ws( ) where W2=2F(1-α; 2, n-2) • This gives combined “confidence intervals” for all Xh • Boundary values of confidence bands define a hyperbola • Will be wider at Xh than single CI
Confidence band for regression line • Theory comes from the joint confidence region for (β0, β1 ) which is an ellipse (Stat 524) • We can find an alpha for tc that gives the same results • We find W2 and then find the alpha for tc that will give W = tc
SAS CODE data a1; n=25; alpha=.10; dfn=2; dfd=n-2; tsingle=tinv(1-alpha/2,dfd); w2=2*finv(1-alpha,dfn,dfd); w=sqrt(w2); alphat=2*(1-probt(w,dfd)); t_c=tinv(1-alphat/2,dfd); output; proc print data=a1; run;
SAS OUTPUT Used for 90% confidence band Used for single 90% CI
SAS CODE symbol1 v=circle i=rlclm97; proc gplot data=toluca; plot hours*lotsize; run;
Estimation of E(Yh) and Prediction of Yh = b0 + b1Xh
SAS CODE symbol1 v=circle i=rlclm95; proc gplot data=toluca; plot hours*lotsize; symbol1 v=circle i=rlcli95; proc gplot data=toluca; plot hours*lotsize; run; Confidence intervals Prediction intervals
Background Reading • Program topic6.sas has the code for the various plots and calculations; • Sections 2.7, 2.8, and 2.9