Lecture 1b: Inferences in Regression and Correlation Analysis 732G21/732A35/732G28
Normal error linear model
Formal statement: Yi = β0 + β1Xi + εi
• Yi is the i-th response value
• β0, β1 are the model (regression) parameters: intercept and slope
• Xi is the i-th predictor value
• εi are i.i.d. normally distributed random variables with expectation zero and variance σ²
Overview
Inference about regression coefficients and response:
• Interval estimates and tests concerning coefficients
• Confidence interval for the mean response E(Y)
• Prediction interval for Y
• ANOVA table
Inferences about slope
• After fitting the data, we may obtain a regression line with a very small slope, for example 0.00005
• Is 0.00005 significantly different from zero, or is it only random variation (in which case there is no linear dependence between Y and X)?
• How to decide?
• Use hypothesis testing (later)
• Derive a confidence interval for β1. If 0 does not fall within this interval, there is linear dependence
Inferences about slope
• The estimated slope b1 is a random variable (look at the formula)
Properties of b1
• Normally distributed (show)
• E(b1) = β1
• Variance: σ²{b1} = σ² / Σ(Xi − X̄)², estimated by s²{b1} = MSE / Σ(Xi − X̄)²
Further: the test statistic (b1 − β1) / s{b1} is distributed as t(n−2)
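The properties of b1 above can be checked numerically. The sketch below is only an illustration: the five data points are made up (they are not the Salary dataset), and the helper name `slope_inference` is hypothetical. It uses the standard formulas b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and s²{b1} = MSE / Σ(Xi − X̄)².

```python
import math

# Hypothetical toy data, NOT the Salary dataset from the lecture.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def slope_inference(x, y):
    """Return (b0, b1, MSE, s{b1}) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)            # estimate of sigma^2, n - 2 degrees of freedom
    s_b1 = math.sqrt(mse / sxx)    # estimated standard error of b1
    return b0, b1, mse, s_b1

b0, b1, mse, s_b1 = slope_inference(X, Y)
t_star = b1 / s_b1                 # test statistic for H0: beta1 = 0
print(f"b1 = {b1:.3f}, s{{b1}} = {s_b1:.4f}, t* = {t_star:.3f}")
```

With only n = 5 points the statistic has n − 2 = 3 degrees of freedom, so it must be compared with t(3), not with the normal distribution.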
t statistics
• See Table B.2 (p. 1317)
• Example: one-sided 95% interval, 15 observations, n − 2 = 13 degrees of freedom: t(0.95; 13) = 1.771
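When no table is at hand, the tabulated value can be approximated numerically. The following is a rough sketch using only the Python standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The function names (`t_pdf`, `t_cdf`, `t_quantile`) are hypothetical, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus 0.5 from symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 <= p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

q = t_quantile(0.95, 13)
print(f"t(0.95; 13) is approximately {q:.3f}")  # should agree with the table value 1.771
```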
Inferences about slope
• Confidence interval for β1 (show…)
• If the variance in the data is unknown, σ² is replaced by its estimate MSE: b1 ± t(1−α/2; n−2) s{b1}
Example
Compute a confidence interval for the slope, Salary dataset
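A confidence interval of the form b1 ± t(1−α/2; n−2) s{b1} can be sketched as follows. The data are hypothetical toy values (not the Salary dataset), the helper `slope_ci` is an assumed name, and the critical value t(0.975; 3) = 3.182 is read from a t table rather than computed.

```python
import math

# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t(0.975; 3) from a t table, for a 95% interval with n - 2 = 3

def slope_ci(x, y, t_crit):
    """Two-sided confidence interval for beta1 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    half_width = t_crit * math.sqrt(mse / sxx)
    return b1 - half_width, b1 + half_width

lower, upper = slope_ci(X, Y, T_CRIT)
contains_zero = lower <= 0.0 <= upper
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f}); contains 0: {contains_zero}")
```

For these toy values the interval covers zero, so the data would not show a linear dependence at the 5% level.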
From previous lecture
About hypothesis testing
• Often, we have a sample and we test H0 against H1 at some significance level α
How to do it?
• Step 1: Find and compute an appropriate test function T = T(sample, λ0)
• Step 2: Plot the test function's distribution and mark a critical area that depends on α
• Step 3: If T is in the critical area, reject H0 (accept H1); otherwise do not reject H0
Inferences about slope
• Test H0: β1 = 0 against Ha: β1 ≠ 0
• Step 1: Compute t* = b1 / s{b1}
• Step 2: Plot the t(n−2) distribution, mark the points ±t(1−α/2; n−2) and the critical area
• Step 3: Locate t* and reject H0 if it is in the critical area
Example
Test the hypothesis for the Salary dataset:
• Manually; also compute P-values
• By Minitab
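The P-value mentioned in the example can be approximated without a statistics package. This is a sketch under stated assumptions: the values b1 = 0.6, s{b1} = √0.08 and n = 5 are hypothetical toy numbers, the function names are made up, and the tail probability is obtained by Simpson integration of the t density.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Hypothetical toy values, NOT results from the Salary dataset.
b1, s_b1, n = 0.6, math.sqrt(0.08), 5
alpha = 0.05

t_star = b1 / s_b1                                # Step 1: test statistic
p_value = 2 * t_upper_tail(abs(t_star), n - 2)    # two-sided P-value
reject_h0 = p_value < alpha                       # same decision as the critical-area rule
print(f"t* = {t_star:.3f}, P-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Comparing the P-value with α gives the same decision as checking whether t* falls in the critical area.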
Inferences about intercept
• Sometimes we need to know whether β0 = 0
Do confidence intervals and hypothesis testing in the same way, using the formulas below!
Properties of b0
• Normally distributed (show)
• E(b0) = β0
• Variance (show…): σ²{b0} = σ² [1/n + X̄² / Σ(Xi − X̄)²]
Further: the test statistic (b0 − β0) / s{b0} is distributed as t(n−2)
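The same steps for the intercept can be sketched as below, using s²{b0} = MSE [1/n + X̄² / Σ(Xi − X̄)²]. The data are hypothetical toy values, `intercept_inference` is an assumed helper name, and t(0.975; 3) = 3.182 is taken from a t table.

```python
import math

# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t(0.975; 3) from a t table

def intercept_inference(x, y):
    """Return (b0, s{b0}) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return b0, s_b0

b0, s_b0 = intercept_inference(X, Y)
t_star = b0 / s_b0                              # test statistic for H0: beta0 = 0
lower, upper = b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0
reject_h0 = abs(t_star) > T_CRIT
print(f"b0 = {b0:.3f}, t* = {t_star:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```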
Inference about model parameters
• If the error distribution is not normal: a slight departure is OK; otherwise the results hold only asymptotically (for large samples)
• Spacing of the X values affects the variance (larger spacing, smaller variance)
Example
Test β0 = 0 for the Salary data
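Expected response
• Estimate of E(Yh) at X = Xh (Xh is any value): Ŷh = b0 + b1Xh
Properties of Ŷh
• Normally distributed (show)
• Variance: σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Further: the test statistic (Ŷh − E(Yh)) / s{Ŷh} is distributed as t(n−2)
Confidence interval: Ŷh ± t(1−α/2; n−2) s{Ŷh}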
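A confidence interval for the mean response at a chosen Xh can be sketched like this, with s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]. Everything specific here is an assumption for illustration: the toy data, the choice Xh = 4, the helper name `mean_response_ci`, and the table value t(0.975; 3) = 3.182.

```python
import math

# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t(0.975; 3) from a t table

def mean_response_ci(x, y, x_h, t_crit):
    """Return (Y_hat_h, lower, upper) for the mean response E(Yh) at X = x_h."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x_h
    s_yhat = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat

y_hat, lower, upper = mean_response_ci(X, Y, 4.0, T_CRIT)
print(f"Y_hat = {y_hat:.3f}, 95% CI for E(Yh): ({lower:.3f}, {upper:.3f})")
```

The interval is narrowest at Xh = X̄ and widens as Xh moves away from the mean of the predictor values.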
Prediction of new observation
• Make a plot…
POINT ESTIMATE
CONFIDENCE INTERVAL: we estimate the position of the mean in the population with X = Xh
PREDICTION INTERVAL: we estimate the position of an individual observation in the population with X = Xh
Prediction of new observation
• When the parameters are unknown, the location of the mean E(Yh) is itself uncertain
• New observation = mean + random error, so the prediction interval should be wider than the confidence interval
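Prediction of new observation
Further: the test statistic (Yh(new) − Ŷh) / s{pred} is distributed as t(n−2)
Prediction interval: Ŷh ± t(1−α/2; n−2) s{pred}
• How to estimate s{pred}? The new observation is b0 + b1Xh + ε. Hence σ²{pred} = σ²{Ŷh} + σ²
• Standard error (show): s²{pred} = MSE [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]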
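To see how much wider the prediction interval is than the confidence interval, both half-widths can be computed side by side. This is a sketch on hypothetical toy data with an assumed helper name `interval_half_widths`; it uses s²{pred} = MSE + s²{Ŷh} and the table value t(0.975; 3) = 3.182.

```python
import math

# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t(0.975; 3) from a t table

def interval_half_widths(x, y, x_h, t_crit):
    """Return (Y_hat_h, CI half-width, PI half-width) at X = x_h."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    var_yhat = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # s^2{Y_hat_h}
    var_pred = mse + var_yhat                            # adds the new error term
    return b0 + b1 * x_h, t_crit * math.sqrt(var_yhat), t_crit * math.sqrt(var_pred)

y_hat, ci_half, pi_half = interval_half_widths(X, Y, 4.0, T_CRIT)
print(f"CI: {y_hat:.3f} +/- {ci_half:.3f}")
print(f"PI: {y_hat:.3f} +/- {pi_half:.3f}")
```

Because s²{pred} always contains the extra MSE term, the prediction interval stays wide even when n is large, while the confidence interval shrinks.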
Prediction of new observation
Example
• Calculate confidence and prediction intervals for a 35-year-old person
• Compare with the output in Minitab
Analysis of Variance approach
• Total sum of squares: SSTO = Σ(Yi − Ȳ)²
• Error sum of squares: SSE = Σ(Yi − Ŷi)²
• Regression sum of squares: SSR = Σ(Ŷi − Ȳ)²
Degrees of freedom
• SSTO has n−1 (the deviations Yi − Ȳ sum to zero)
• SSE has n−2 (2 model parameters are estimated)
• SSR has 1 (the fitted values lie on the regression line: 2 degrees of freedom, minus 1 because the deviations Ŷi − Ȳ sum to zero)
n−1 = (n−2) + 1
SSTO = SSE + SSR
Important: MSxx = SSxx / degrees_of_freedom
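Analysis of Variance table
• ANOVA table
Source of variation | SS | df | MS
Regression | SSR | 1 | MSR = SSR/1
Error | SSE | n−2 | MSE = SSE/(n−2)
Total | SSTO | n−1 |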
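The entries of an ANOVA table can be computed directly from the definitions SSTO = Σ(Yi − Ȳ)², SSE = Σ(Yi − Ŷi)² and SSR = Σ(Ŷi − Ȳ)². The sketch below uses hypothetical toy data (not the Salary dataset), and the printed layout is only one possible arrangement, not Minitab's output format.

```python
# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in X]

ssto = sum((yi - y_bar) ** 2 for yi in Y)               # total, df = n - 1
sse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))  # error, df = n - 2
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression, df = 1
msr = ssr / 1
mse = sse / (n - 2)

print(f"{'Source':<12}{'SS':>8}{'df':>4}{'MS':>8}")
print(f"{'Regression':<12}{ssr:>8.3f}{1:>4}{msr:>8.3f}")
print(f"{'Error':<12}{sse:>8.3f}{n - 2:>4}{mse:>8.3f}")
print(f"{'Total':<12}{ssto:>8.3f}{n - 1:>4}")
```

Checking that SSR + SSE reproduces SSTO is a quick sanity test when composing the table by hand.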
Analysis of Variance approach
Expected mean squares: E(MSE) = σ², E(MSR) = σ² + β1² Σ(Xi − X̄)²
• E(MSE) does not depend on the slope; it equals σ² whether or not β1 = 0
• E(MSR) = E(MSE) when the slope is zero
• Hence: if MSR is much larger than MSE, the slope is not zero; if they are approximately equal, the slope can be zero
Hypothesis testing using ANOVA
• Test statistic F* = MSR/MSE, use F(1, n−2) (see p. 1320)
Decision rules:
• If F* > F(1−α; 1, n−2), conclude Ha
• If F* ≤ F(1−α; 1, n−2), conclude H0
Note: the F test and the two-sided t test about β1 are equivalent, since F* = (t*)²
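The equivalence of the F test and the t test can be verified numerically: F* = MSR/MSE equals (t*)² with t* = b1 / s{b1}. The sketch below uses hypothetical toy data, and the critical value F(0.95; 1, 3) = 10.13 is taken from an F table (it is the square of t(0.975; 3) = 3.182, up to rounding).

```python
import math

# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
F_CRIT = 10.13  # F(0.95; 1, 3) from an F table

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(X, Y))
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in X)
mse = sse / (n - 2)
msr = ssr / 1

f_star = msr / mse                    # ANOVA test statistic
t_star = b1 / math.sqrt(mse / sxx)    # t statistic for H0: beta1 = 0
conclude_ha = f_star > F_CRIT         # decision rule from the slide

print(f"F* = {f_star:.3f}, (t*)^2 = {t_star ** 2:.3f}, conclude Ha: {conclude_ha}")
```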
Hypothesis testing using ANOVA
• General approach
• Full model: Yi = β0 + β1Xi + εi (linear)
• Reduced model: Yi = β0 + εi (constant)
Hypothesis testing using ANOVA
• It is known (why?..) that SSE(F) ≤ SSE(R). A large difference means the models differ; a small difference means they can be the same
• Test statistic: F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] / [SSE(F) / dfF]
• For the simple linear regression model, this is equivalent to F* = MSR/MSE
• Under H0, F* follows the F(dfR − dfF, dfF) distribution (plot critical area..)
• Test rule: if F* > F(1−α; dfR − dfF, dfF), reject H0
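The general approach can be sketched by fitting the full and the reduced model separately and comparing their error sums of squares. The data are hypothetical toy values, and `general_linear_test` is an assumed helper name. For the constant model the least-squares fit is Ȳ, so SSE(R) = SSTO with dfR = n − 1.

```python
# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def general_linear_test(sse_full, df_full, sse_reduced, df_reduced):
    """F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF]."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    return numerator / (sse_full / df_full)

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar

# Full model: straight line, n - 2 error degrees of freedom.
sse_full = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(X, Y))
# Reduced model: constant (fitted by the mean), n - 1 error degrees of freedom.
sse_reduced = sum((yi - y_bar) ** 2 for yi in Y)

f_star = general_linear_test(sse_full, n - 2, sse_reduced, n - 1)
msr_over_mse = (sse_reduced - sse_full) / (sse_full / (n - 2))
print(f"SSE(F) = {sse_full:.3f}, SSE(R) = {sse_reduced:.3f}, F* = {f_star:.3f}")
```

The same function applies unchanged to other pairs of nested models, which is the point of stating the test in this general form.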
Hypothesis testing using ANOVA
Example
For the Salary dataset:
• Compose the ANOVA table and compare with Minitab
• Perform the F test and compare with Minitab
Measures of linear association
• Coefficient of determination: R² = SSR/SSTO = 1 − SSE/SSTO
• Coefficient of correlation: r = ±√R², with the sign of b1
Limitations:
• A high R² does not mean a good fit
• A low R² does not mean that X and Y are not related
Example: For the Salary dataset, compute R² and compare with Minitab
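Both measures can be computed from the sums of squares, and r can be cross-checked against the usual product-moment formula Σ(Xi − X̄)(Yi − Ȳ) / √(Σ(Xi − X̄)² Σ(Yi − Ȳ)²). The sketch uses hypothetical toy data, not the Salary dataset.

```python
import math

# Hypothetical toy data (n = 5), NOT the Salary dataset.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

ssto = sum((yi - y_bar) ** 2 for yi in Y)
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(X, Y))

r_squared = 1 - sse / ssto                          # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)         # sign follows the slope
r_direct = sxy / math.sqrt(sxx * ssto)              # product-moment correlation

print(f"R^2 = {r_squared:.3f}, r = {r:.4f}, direct r = {r_direct:.4f}")
```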
Reading
• Chapter 2 up to page 78