Lecture 1b: Inferences in Regression and Correlation Analysis

ianthe

Presentation Transcript


  1. Lecture 1b: Inferences in Regression and Correlation Analysis 732G21/732A35/732G28

  2. Normal error linear model Formal statement: Yi = β0 + β1Xi + εi • Yi is the i-th response value • β0, β1 are the model (regression) parameters: intercept and slope • Xi is the i-th predictor value • εi are i.i.d. normally distributed random variables with expectation zero and variance σ²

  3. Overview Inference about regression coefficients and response: • Interval estimates and tests concerning coefficients • Confidence interval for Y • Prediction interval for Y • ANOVA table

  4. Inferences about slope • After fitting the data, we may obtain a regression line • Is the slope 0.00005 significant, or is it just due to random variation (hence, no linear dependence between Y and X)? • How to decide? • Use hypothesis testing (later) • Derive a confidence interval for β1. If 0 does not fall within this interval, there is dependence

  5. Inferences about slope • Estimated slope b1 is a random variable (look at formula) Properties of b1 • Normally distributed (show) • E(b1) = β1 • Variance σ²{b1} = σ² / Σ(Xi − X̄)² Further: the test statistic (b1 − β1) / s{b1} is distributed as t(n-2), where s²{b1} = MSE / Σ(Xi − X̄)²
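The claim that b1 is a random variable with mean β1 and variance σ² / Σ(Xi − X̄)² can be checked by simulation. The sketch below is only an illustration: the parameter values, the X values and the helper names `fit_slope` and `simulate_slopes` are assumptions, not part of the lecture.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx

def simulate_slopes(beta0, beta1, sigma, xs, reps, seed=1):
    """Refit the line on `reps` synthetic samples from the normal error model."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return slopes

# Hypothetical settings, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
slopes = simulate_slopes(beta0=2.0, beta1=0.5, sigma=1.0, xs=xs, reps=5000)

mean_b1 = sum(slopes) / len(slopes)
var_b1 = sum((b - mean_b1) ** 2 for b in slopes) / (len(slopes) - 1)
theoretical_var = 1.0 ** 2 / sum((x - 3.0) ** 2 for x in xs)  # sigma^2 / Sxx

print(mean_b1, var_b1, theoretical_var)
```

The empirical mean and variance of the simulated slopes should land close to β1 = 0.5 and σ² / Σ(Xi − X̄)² = 0.1, up to Monte Carlo error.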

  6. t statistics • See Table B.2 (p. 1317) • Example: one-sided interval at 95%, 15 observations, so n − 2 = 13 degrees of freedom: t(0.95; 13) = 1.771
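When the printed table is not at hand, the same percentile can be approximated numerically. The following is a rough standard-library sketch that integrates the t density with Simpson's rule and inverts the CDF by bisection; the function names are hypothetical, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Invert the CDF by bisection on [-100, 100]."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 15 observations -> 13 degrees of freedom, one-sided 95%.
q = t_quantile(0.95, 13)
print(round(q, 3))
```

For 13 degrees of freedom this should agree with the tabulated 1.771 to three decimals.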

  7. Inferences about slope • Confidence interval for β1 (show…): b1 ± t(1-α/2; n-2) · s{b1} • If the variance σ² is unknown, estimate it by MSE, so that s²{b1} = MSE / Σ(Xi − X̄)² Example: Compute a confidence interval for the slope, Salary dataset
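A small worked sketch of the slope interval b1 ± t(1-α/2; n-2) · s{b1} follows. The Salary dataset is not reproduced in this transcript, so the five data points are made up for illustration, and the critical value t(0.975; 3) = 3.182 is passed in as if read from the t table.

```python
import math

def slope_ci(xs, ys, t_crit):
    """Confidence interval for beta1 with sigma^2 estimated by MSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)            # estimate of sigma^2
    s_b1 = math.sqrt(mse / sxx)    # estimated standard deviation of b1
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

# Toy data (an assumption, not the Salary dataset).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
low, high = slope_ci(xs, ys, t_crit=3.182)
print(round(low, 3), round(high, 3))
```

Here b1 = 0.6 and the 95% interval is roughly (−0.3, 1.5). Because it contains 0, these toy data give no evidence of linear dependence at that level.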

  8. From previous lecture

  9. About hypothesis testing • Often, we have a sample and we test at some significance level α How to do it? • Step 1: Find and compute an appropriate test function T = T(sample, λ0) • Step 2: Plot the test function's distribution and mark a critical area that depends on α • If T is in the critical area, reject H0 (accept H1); otherwise do not reject H0

  10. Inferences about slope • Test H0: β1 = 0 against Ha: β1 ≠ 0 • Step 1: compute t* = b1 / s{b1} • Step 2: Plot the distribution t(n-2), mark the points ±t(1-α/2; n-2) and the critical area. • Step 3: locate t* and reject H0 if it is in the critical area Example Test the hypothesis for the Salary dataset: • Manually, compute also P-values • By Minitab
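The manual steps can be sketched as below, assuming the two-sided hypothesis H0: β1 = 0. The inputs b1 = 0.6, s{b1} = √0.08 and n = 5 are hypothetical summary values, not taken from the Salary data, and the P-value uses a simple numerical integration of the t density in place of a table.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def slope_t_test(b1, s_b1, n, t_crit):
    """Two-sided test of H0: beta1 = 0. Returns (t*, P-value, reject H0?)."""
    t_star = b1 / s_b1
    p_value = 2 * (1 - t_cdf(abs(t_star), n - 2))
    return t_star, p_value, abs(t_star) > t_crit

# Hypothetical summary values; t(0.975; 3) = 3.182 is read from the t table.
t_star, p_value, reject = slope_t_test(b1=0.6, s_b1=math.sqrt(0.08), n=5, t_crit=3.182)
print(round(t_star, 3), round(p_value, 3), reject)
```

With these numbers t* is about 2.12, which lies outside the critical area, so H0 is not rejected; the P-value of about 0.12 leads to the same decision at α = 0.05.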

  11. Inferences about intercept • Sometimes, we need to know whether β0 = 0. Do confidence intervals and hypothesis testing in the same way, using the formulas below! Properties of b0 • Normally distributed (show) • E(b0) = β0 • Variance σ²{b0} = σ² [1/n + X̄² / Σ(Xi − X̄)²] (show..) Further: the test statistic (b0 − β0) / s{b0} is distributed as t(n-2)

  12. Inference about model parameters • If the error distribution is not normal: a slight departure is OK; otherwise the results hold only asymptotically • Spacing of the X values affects the variance (larger spacing, smaller variance) Example: Test β0 = 0 for the Salary data

  13. Expected response • Estimate at X = Xh (Xh: any value): Ŷh = b0 + b1Xh Properties of Ŷh • Normally distributed (show) • Variance σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ(Xi − X̄)²] Further: the test statistic (Ŷh − E(Yh)) / s{Ŷh} is distributed as t(n-2) Confidence interval: Ŷh ± t(1-α/2; n-2) · s{Ŷh}
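As an illustration of the interval for the mean response, the sketch below fits a line to five made-up points (not the Salary dataset) and evaluates Ŷh ± t · s{Ŷh} at a hypothetical Xh = 4, with the critical value t(0.975; 3) = 3.182 supplied from the t table.

```python
import math

def mean_response_ci(xs, ys, x_h, t_crit):
    """Point estimate and confidence interval for E(Yh) at X = x_h."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y_hat = b0 + b1 * x_h
    # s^2{Yhat_h} = MSE * (1/n + (Xh - Xbar)^2 / Sxx)
    s_yhat = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat

# Toy data and Xh are assumptions made for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, low, high = mean_response_ci(xs, ys, x_h=4, t_crit=3.182)
print(round(y_hat, 3), round(low, 3), round(high, 3))
```

The interval is narrowest at Xh = X̄ and widens as Xh moves away from the centre of the data, because of the (Xh − X̄)² term.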

  14. Prediction of new observation • Make a plot… • Point estimate • Confidence interval: we estimate the position of the mean in the population with X = Xh • Prediction interval: we estimate the position of the individual observation in the population with X = Xh

  15. Prediction of new observation • When the parameters are unknown, the location of the mean E(Yh) is itself uncertain • New observation = mean + random error, so the prediction interval should be wider than the confidence interval for the mean

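  16. Prediction of new observation Further: the test statistic (Yh(new) − Ŷh) / s{pred} is distributed as t(n-2) Prediction interval: Ŷh ± t(1-α/2; n-2) · s{pred} • How to estimate s{pred}? The new observation is b0 + b1Xh + ε. Hence σ²{pred} = σ²{Ŷh} + σ² • Standard error (show): s²{pred} = MSE [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]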
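A short numerical sketch of s{pred} and the resulting interval is given here. It works from summary quantities rather than raw data; all the numbers (Ŷh = 4.6, MSE = 0.8, n = 5, Xh = 4, X̄ = 3, Σ(Xi − X̄)² = 10) are hypothetical and serve only to show the arithmetic.

```python
import math

def prediction_interval(y_hat, mse, n, x_h, x_bar, sxx, t_crit):
    """Prediction interval for a new observation at X = x_h."""
    s_yhat_sq = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # s^2{Yhat_h}
    s_pred = math.sqrt(mse + s_yhat_sq)                   # s^2{pred} = MSE + s^2{Yhat_h}
    return y_hat - t_crit * s_pred, y_hat + t_crit * s_pred, s_pred

# Hypothetical summary values; t(0.975; 3) = 3.182 is read from the t table.
low, high, s_pred = prediction_interval(
    y_hat=4.6, mse=0.8, n=5, x_h=4, x_bar=3, sxx=10, t_crit=3.182
)
print(round(low, 3), round(high, 3), round(s_pred, 3))
```

Because s²{pred} always exceeds s²{Ŷh} by MSE, the prediction interval is wider than the confidence interval for the mean at the same Xh, however large the sample.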

  17. Prediction of new observation Example • Calculate confidence and prediction intervals for a 35-year-old person • Compare with the output in Minitab

  18. Analysis of Variance approach • Total sum of squares: SSTO = Σ(Yi − Ȳ)² • Error sum of squares: SSE = Σ(Yi − Ŷi)² • Regression sum of squares: SSR = Σ(Ŷi − Ȳ)²

  19. Degrees of freedom • SSTO has n-1 (the deviations Yi − Ȳ sum to zero) • SSE has n-2 (2 model parameters are estimated) • SSR has 1 (fitted values lie on the regression line, which gives 2 degrees; the deviations Ŷi − Ȳ sum to zero, which removes 1) n-1 = n-2 + 1 SSTO = SSE + SSR Important: MSxx = SSxx / degrees_of_freedom

  20. Analysis of Variance table • ANOVA table
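The table itself did not survive in this transcript. As a stand-in, the sketch below builds the usual rows (source, SS, df, MS) for simple linear regression from five made-up data points; the layout and the function name `anova_table` are assumptions, not the slide's original table.

```python
def anova_table(xs, ys):
    """Rows (source, SS, df, MS) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ssto = sum((y - y_bar) ** 2 for y in ys)
    return [
        ("Regression", ssr, 1, ssr / 1),
        ("Error", sse, n - 2, sse / (n - 2)),
        ("Total", ssto, n - 1, None),  # no mean square is reported for the total
    ]

# Toy data (an assumption, not the Salary dataset).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
table = anova_table(xs, ys)
for source, ss, df, ms in table:
    ms_text = f"{ms:8.3f}" if ms is not None else " " * 8
    print(f"{source:<12}{ss:8.3f}{df:4d}{ms_text}")
```

For these data SSR = 3.6 and SSE = 2.4 add up to SSTO = 6.0, and the degrees of freedom 1 + 3 add up to 4.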

  21. Analysis of Variance approach Expected mean squares: E(MSE) = σ², E(MSR) = σ² + β1² Σ(Xi − X̄)² • E(MSE) does not depend on the slope, even when it is zero • E(MSR) = E(MSE) when the slope is zero • If MSR is much larger than MSE, the slope is not zero; if they are approximately the same, the slope can be zero

  22. Hypothesis testing using ANOVA • Test statistic F* = MSR/MSE, use F(1, n-2) (see p. 1320) Decision rules: • If F* > F(1-α; 1, n-2) conclude Ha • If F* ≤ F(1-α; 1, n-2) conclude H0 Note: the F test and the two-sided t test about β1 are equivalent
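The equivalence of the two tests rests on F* = (t*)², which is easy to confirm numerically. The sketch uses five made-up data points, and the critical value F(0.95; 1, 3) = 10.13 is taken as given from an F table.

```python
import math

def f_and_t(xs, ys):
    """Return (F*, t*) for the slope of a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    msr = sum((f - y_bar) ** 2 for f in fitted) / 1
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - 2)
    return msr / mse, b1 / math.sqrt(mse / sxx)

# Toy data (an assumption, not the Salary dataset).
f_star, t_star = f_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
f_crit = 10.13  # F(0.95; 1, 3) from an F table
print(round(f_star, 3), round(t_star ** 2, 3), f_star > f_crit)
```

Both numbers come out as 4.5, and since 4.5 ≤ 10.13 the decision rule concludes H0, matching the t test on the same data.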

  23. Hypothesis testing using ANOVA • General approach • Full model: Yi = β0 + β1Xi + εi (linear) • Reduced model: Yi = β0 + εi (constant)

  24. Hypothesis testing using ANOVA • It is known (why?..) that SSE(F) ≤ SSE(R). A large difference suggests different models; a small difference means they can be the same • Test statistic F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] / [SSE(F) / dfF] • For the univariate linear model, equivalent to F* = MSR/MSE • Under H0, F* follows the F(dfR-dfF, dfF) distribution (plot critical area..) • Test rule: F* > F(1-α; dfR-dfF, dfF) → reject H0
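A minimal sketch of the general linear test follows, assuming the full model is the straight line and the reduced model is the constant-mean model. The data are made up for illustration and `general_linear_test` is a hypothetical helper name.

```python
def general_linear_test(xs, ys):
    """F* comparing the constant (reduced) model with the linear (full) model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Reduced model Yi = beta0 + eps_i: the least-squares fit is the sample mean.
    sse_r = sum((y - y_bar) ** 2 for y in ys)
    df_r = n - 1

    # Full model Yi = beta0 + beta1 * Xi + eps_i.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse_f = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df_f = n - 2

    f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    return f_star, df_r - df_f, df_f

# Toy data (an assumption, not the Salary dataset).
f_star, df_num, df_den = general_linear_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(f_star, 3), df_num, df_den)
```

Here SSE(R) equals SSTO and SSE(R) − SSE(F) equals SSR, so F* reduces to MSR/MSE = 4.5 with (1, 3) degrees of freedom.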

  25. Hypothesis testing using ANOVA Example For the Salary dataset • Compose the ANOVA table and compare with MINITAB • Perform the F-test and compare with MINITAB

  26. Measures of linear association • Coefficient of determination: R² = SSR/SSTO = 1 − SSE/SSTO • Coefficient of correlation: r = ±√R², with the sign of b1 Limitations: • High R does not mean a good fit • Low R does not mean that X and Y are not related Example: For the Salary dataset, compute R² and compare with MINITAB
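Both measures can be computed directly from the sums of squares. The sketch below uses five made-up points rather than the Salary dataset, and relies on the identity SSR = b1 · Σ(Xi − X̄)(Yi − Ȳ), which holds for the least-squares line.

```python
import math

def r_squared_and_r(xs, ys):
    """Coefficient of determination R^2 and correlation r (sign taken from b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    ssto = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    ssr = b1 * sxy  # equals sum((Yhat_i - Ybar)^2) for the fitted line
    r2 = ssr / ssto
    r = math.copysign(math.sqrt(r2), b1)
    return r2, r

# Toy data (an assumption made for this example).
r2, r = r_squared_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(r, 3))
```

For these points R² = 0.6 and r ≈ 0.775, so 60% of the variation in Y is accounted for by the fitted line.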

  27. Reading • Chapter 2 up to page 78
