Chapter 2 Simple Linear Regression

Chapter 2 Simple Linear Regression. Ray-Bing Chen, Institute of Statistics, National University of Kaohsiung. 2.1 Simple Linear Regression Model. $y = \beta_0 + \beta_1 x + \varepsilon$; x: regressor variable; y: response variable; $\beta_0$: the intercept, unknown; $\beta_1$: the slope, unknown

Presentation Transcript

  1. Chapter 2 Simple Linear Regression Ray-Bing Chen Institute of Statistics National University of Kaohsiung

  2. 2.1 Simple Linear Regression Model • $y = \beta_0 + \beta_1 x + \varepsilon$ • x: regressor variable • y: response variable • $\beta_0$: the intercept, unknown • $\beta_1$: the slope, unknown • $\varepsilon$: error with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$ (unknown) • The errors are uncorrelated.

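  3. Given x, $E(y|x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(y|x) = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$ • Responses are also uncorrelated. • Regression coefficients: $\beta_0$, $\beta_1$ • $\beta_1$: the change in $E(y|x)$ per unit change in x • $\beta_0$: $E(y|x=0)$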

  4. 2.2 Least-squares Estimation of the Parameters 2.2.1 Estimation of $\beta_0$ and $\beta_1$ • n pairs: $(y_i, x_i)$, i = 1, …, n • Method of least squares: Minimize $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

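  5. Least-squares normal equations: $n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$ and $\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$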

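  6. The least-squares estimators: $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and $\hat\beta_1 = S_{xy}/S_{xx}$, where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2$ and $S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar x)$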

  7. The fitted simple regression model: $\hat y = \hat\beta_0 + \hat\beta_1 x$ • A point estimate of the mean of y for a particular x • Residual: $e_i = y_i - \hat y_i$ • Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions.
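
The least-squares estimators and residuals can be sketched in a few lines of plain Python. This is a minimal illustration, not the presentation's own computation: the data are hypothetical toy values (not the rocket propellant data), and the function name `least_squares_fit` is an assumption made for this example.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares_fit(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected cross-product sum.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]                  # y-hat_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = y_i - y-hat_i

print("intercept:", round(b0, 4), "slope:", round(b1, 4))
print("residuals:", [round(e, 4) for e in residuals])
```

For these toy values $S_{xx} = 10$ and $S_{xy} = 6$, so the slope is 0.6 and the intercept is 2.2.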

  8. Example 2.1: The Rocket Propellant Data • Shear strength is related to the age in weeks of the batch of sustainer propellant. • 20 observations • From the scatter diagram, there is a strong relationship between shear strength (y) and propellant age (x). • Assumption: $y = \beta_0 + \beta_1 x + \varepsilon$

  9. The least-squares fit:

  10. How well does this equation fit the data? • Is the model likely to be useful as a predictor? • Are any of the basic assumptions violated, and if so, how serious is this?

  11. 2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model • $\hat\beta_0$ and $\hat\beta_1$ are linear combinations of the $y_i$ • $\hat\beta_0$ and $\hat\beta_1$ are unbiased estimators of $\beta_0$ and $\beta_1$.

  12. The Gauss-Markov Theorem: $\hat\beta_0$ and $\hat\beta_1$ are the best linear unbiased estimators (BLUE).

  13. Some useful properties: • The sum of the residuals in any regression model that contains an intercept $\beta_0$ is always 0, i.e. $\sum_{i=1}^{n} (y_i - \hat y_i) = \sum_{i=1}^{n} e_i = 0$ • The regression line always passes through the centroid of the data, $(\bar x, \bar y)$.
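
Both properties are easy to check numerically. The sketch below uses hypothetical toy data (not the rocket propellant data); the variable names `residual_sum` and `centroid_gap` are assumptions made for this example. Because of floating-point rounding, the quantities are compared with a small tolerance rather than with exact zero.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx           # least-squares slope
b0 = y_bar - b1 * x_bar    # least-squares intercept

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Property 1: the residuals sum to zero when the model has an intercept.
residual_sum = sum(residuals)

# Property 2: the fitted line evaluated at x_bar returns y_bar,
# i.e. the line passes through the centroid (x_bar, y_bar).
centroid_gap = (b0 + b1 * x_bar) - y_bar

print("sum of residuals is ~0:", abs(residual_sum) < 1e-9)
print("line passes through centroid:", abs(centroid_gap) < 1e-9)
```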

  14. 2.2.3 Estimator of $\sigma^2$ • Residual sum of squares: $SS_{Res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2$

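  15. Since $E(SS_{Res}) = (n-2)\sigma^2$, the unbiased estimator of $\sigma^2$ is $\hat\sigma^2 = SS_{Res}/(n-2) = MS_E$ • $MS_E$ is called the residual mean square (written $MS_{Res}$ on later slides). • This estimate is model-dependent. • Example 2.2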
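
As a minimal sketch of this estimator, the function below computes the residual sum of squares and divides it by $n-2$. The data are hypothetical toy values (not the rocket propellant data) and the name `residual_mean_square` is an assumption made for this example.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def residual_mean_square(x, y):
    """Return (ss_res, ms_res) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares: sum of squared differences y_i - y-hat_i.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two parameters were estimated, so n - 2 degrees of freedom remain.
    ms_res = ss_res / (n - 2)
    return ss_res, ms_res


ss_res, ms_res = residual_mean_square(x, y)
print("SS_Res:", round(ss_res, 4), "MS_Res:", round(ms_res, 4))
```

With these toy values the residual sum of squares is 2.4 on 3 degrees of freedom, giving an estimate of 0.8 for $\sigma^2$.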

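  16. 2.2.4 An Alternate Form of the Model • The new regression model: $y_i = \beta_0' + \beta_1 (x_i - \bar x) + \varepsilon_i$, where $\beta_0' = \beta_0 + \beta_1 \bar x$ • Normal equations: $n\hat\beta_0' = \sum_{i=1}^{n} y_i$ and $\hat\beta_1 \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} y_i (x_i - \bar x)$ • The least-squares estimators: $\hat\beta_0' = \bar y$ and $\hat\beta_1 = S_{xy}/S_{xx}$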

  17. Some advantages: • The normal equations are easier to solve. • $\hat\beta_0'$ and $\hat\beta_1$ are uncorrelated.
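
The uncorrelatedness claim follows in one step from the model assumptions. The derivation below is a sketch that writes the centered-model intercept estimator as $\hat\beta_0' = \bar y$ and the slope estimator as a linear combination $\hat\beta_1 = \sum_i c_i y_i$; the weights $c_i$ are notation introduced here, not taken from the slides.

```latex
\[
c_i = \frac{x_i - \bar x}{S_{xx}}, \qquad \sum_{i=1}^{n} c_i = 0,
\]
\[
\operatorname{Cov}\bigl(\hat\beta_0', \hat\beta_1\bigr)
  = \operatorname{Cov}\Bigl(\frac{1}{n}\sum_{i=1}^{n} y_i,\ \sum_{j=1}^{n} c_j y_j\Bigr)
  = \frac{1}{n}\sum_{i=1}^{n} c_i \operatorname{Var}(y_i)
  = \frac{\sigma^2}{n}\sum_{i=1}^{n} c_i
  = 0 .
\]
```

The second equality uses the fact that the responses are uncorrelated, so only the terms with $i = j$ survive.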

  18. 2.3 Hypothesis Testing on the Slope and Intercept • Assume the $\varepsilon_i$ are normally distributed • $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ 2.3.1 Use of t-Tests • Test on slope: • $H_0: \beta_1 = \beta_{10}$ vs. $H_1: \beta_1 \neq \beta_{10}$

  19. If 2 is known, under null hypothesis, • (n-2) MSE/2 follows a 2n-2 • If 2 is unknown, • Reject H0 if |t0| > t/2, n-2

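  20. Test on intercept: • $H_0: \beta_0 = \beta_{00}$ vs. $H_1: \beta_0 \neq \beta_{00}$ • If $\sigma^2$ is unknown, $t_0 = (\hat\beta_0 - \beta_{00})/\sqrt{MS_E (1/n + \bar x^2/S_{xx})}$ • Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$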

  21. 2.3.2 Testing Significance of Regression • $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ • Fail to reject $H_0$: there is no evidence of a linear relationship between x and y.

  22. Reject $H_0$: x is of value in explaining the variability in y. • Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$
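
The t-test for significance of regression can be sketched as follows. The data are hypothetical toy values (not the rocket propellant data), the function name `slope_t_statistic` is an assumption made for this example, and the critical value $t_{0.025,3} = 3.182$ is taken from a standard t table rather than computed.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Assumed table value: t_{alpha/2, n-2} with alpha = 0.05 and n - 2 = 3.
T_CRIT = 3.182


def slope_t_statistic(x, y, beta_10=0.0):
    """Return t0 for testing H0: beta_1 = beta_10 with sigma^2 unknown."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)
    se_b1 = math.sqrt(ms_res / s_xx)   # standard error of the slope
    return (b1 - beta_10) / se_b1


t0 = slope_t_statistic(x, y)
reject_h0 = abs(t0) > T_CRIT
print("t0:", round(t0, 4), "reject H0:", reject_h0)
```

For this small toy sample the statistic does not exceed the critical value, so $H_0: \beta_1 = 0$ is not rejected at the 5% level.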

  23. Example 2.3: The Rocket Propellant Data • Test significance of regression • $MS_E = 9244.59$ • the test statistic is $t_0 = \hat\beta_1/\sqrt{MS_E/S_{xx}}$ • $t_{0.025,18} = 2.101$ • Reject $H_0$

  24. 2.3.3 The Analysis of Variance (ANOVA) • Use an analysis of variance approach to test significance of regression

  25. SST: the corrected sum of squares of the observations. It measures the total variability in the observations. • SSRes: the residual or error sum of squares • The residual variation left unexplained by the regression line. • SSR: the regression or model sum of squares • The amount of variability in the observations accounted for by the regression line • SST = SSR + SSRes

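  26. The degrees of freedom: • $df_T = n-1$ • $df_R = 1$ • $df_{Res} = n-2$ • $df_T = df_R + df_{Res}$ • Test significance of regression by ANOVA • $SS_{Res}/\sigma^2 = (n-2) MS_{Res}/\sigma^2 \sim \chi^2_{n-2}$ • Under $H_0$, $SS_R/\sigma^2 = MS_R/\sigma^2 \sim \chi^2_1$ • $SS_R$ and $SS_{Res}$ are independent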

  27. E(MSRes) = 2 • E(MSR) = 2 + 12 Sxx • Reject H0 if F0 > F/2,1, n-2 • If 1 0, F0 follows a noncentral F with 1 and n-2 degree of freedom and a noncentrality parameter

  28. Example 2.4: The Rocket Propellant Data

  29. More About the t Test • The square of a t random variable with f degrees of freedom is an F random variable with 1 and f degrees of freedom.
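
For the test of $H_0: \beta_1 = 0$ this equivalence can be shown directly. The short derivation below is a sketch that relies on the identity $SS_R = \hat\beta_1 S_{xy}$ and on $\hat\beta_1 = S_{xy}/S_{xx}$; it uses the $MS_{Res}$ notation of the ANOVA slides.

```latex
\[
t_0^2
  = \frac{\hat\beta_1^2}{MS_{Res}/S_{xx}}
  = \frac{\hat\beta_1^2 S_{xx}}{MS_{Res}}
  = \frac{\hat\beta_1 S_{xy}}{MS_{Res}}
  = \frac{SS_R / 1}{MS_{Res}}
  = \frac{MS_R}{MS_{Res}}
  = F_0 .
\]
```

The two-sided t-test and the ANOVA F-test of significance of regression therefore always reach the same decision in simple linear regression.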

  30. 2.4 Interval Estimation in Simple Linear Regression 2.4.1 Confidence Intervals on $\beta_0$, $\beta_1$ and $\sigma^2$ • Assume that the $\varepsilon_i$ are normally and independently distributed

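  31. $100(1-\alpha)\%$ confidence intervals on $\beta_0$ and $\beta_1$ are given by $\hat\beta_0 \pm t_{\alpha/2,\, n-2}\, se(\hat\beta_0)$ and $\hat\beta_1 \pm t_{\alpha/2,\, n-2}\, se(\hat\beta_1)$, where $se(\hat\beta_0) = \sqrt{MS_{Res}(1/n + \bar x^2/S_{xx})}$ and $se(\hat\beta_1) = \sqrt{MS_{Res}/S_{xx}}$ • Interpretation of C.I. • Confidence interval for $\sigma^2$: $(n-2)MS_{Res}/\chi^2_{\alpha/2,\, n-2} \le \sigma^2 \le (n-2)MS_{Res}/\chi^2_{1-\alpha/2,\, n-2}$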
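
A minimal sketch of the coefficient intervals follows. The data are hypothetical toy values (not the rocket propellant data), the function name `coefficient_intervals` is an assumption made for this example, and the critical value $t_{0.025,3} = 3.182$ is supplied from a t table rather than computed.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def coefficient_intervals(x, y, t_crit):
    """Return (ci_b0, ci_b1) as (lower, upper) tuples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)
    # Standard errors of the intercept and the slope.
    se_b0 = math.sqrt(ms_res * (1.0 / n + x_bar ** 2 / s_xx))
    se_b1 = math.sqrt(ms_res / s_xx)
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return ci_b0, ci_b1


# Assumed table value t_{0.025, 3} = 3.182 for a 95% interval with n = 5.
ci_b0, ci_b1 = coefficient_intervals(x, y, t_crit=3.182)
print("95% CI for beta_0:", tuple(round(v, 3) for v in ci_b0))
print("95% CI for beta_1:", tuple(round(v, 3) for v in ci_b1))
```

The slope interval for this toy sample contains zero, which agrees with a two-sided t-test at the same level failing to reject $H_0: \beta_1 = 0$.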

  32. Example 2.5: The Rocket Propellant Data

  33. 2.4.2 Interval Estimation of the Mean Response • Let $x_0$ be the level of the regressor variable for which we wish to estimate the mean response. • $x_0$ is in the range of the original data on x. • An unbiased estimator of $E(y|x_0)$ is $\hat\mu_{y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$

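  34. The estimator $\hat\mu_{y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$ follows a normal distribution with mean $E(y|x_0)$ and variance $\sigma^2 [1/n + (x_0 - \bar x)^2/S_{xx}]$.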

  35. A 100(1-)% confidence interval on the mean response at x0:

  36. Example 2.6 The Rocket Propellant Data

  37. The interval width is a minimum for $x_0 = \bar x$ and widens as $|x_0 - \bar x|$ increases. • Extrapolation
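
The behavior of the interval width can be seen by evaluating the confidence interval on the mean response at several values of $x_0$. The sketch below uses hypothetical toy data (not the rocket propellant data); the function name `mean_response_interval` is an assumption made for this example, and the critical value $t_{0.025,3} = 3.182$ is taken from a t table.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean_response_interval(x, y, x0, t_crit):
    """Return (lower, upper) for the confidence interval on E(y | x0)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)
    mu_hat = b0 + b1 * x0   # point estimate of the mean response at x0
    half = t_crit * math.sqrt(ms_res * (1.0 / n + (x0 - x_bar) ** 2 / s_xx))
    return mu_hat - half, mu_hat + half


# Assumed table value t_{0.025, 3} = 3.182 for a 95% interval with n = 5.
widths = {}
for x0 in (1.0, 2.0, 3.0, 4.0, 5.0):
    lower, upper = mean_response_interval(x, y, x0, t_crit=3.182)
    widths[x0] = upper - lower
    print("x0 =", x0, "width =", round(widths[x0], 3))
```

The width is smallest at $x_0 = \bar x = 3$ and grows symmetrically on either side, which is why estimates far from the center of the data, and especially extrapolations, are less precise.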

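  38. 2.5 Prediction of New Observations • $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$ is the point estimate of the new value of the response $y_0$ • $\psi = y_0 - \hat y_0$ follows a normal distribution with mean 0 and variance $\sigma^2 [1 + 1/n + (x_0 - \bar x)^2/S_{xx}]$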

  39. The 100(1-)% confidence interval on a future observation at x0 (a prediction interval for the future observation y0)

  40. Example 2.7:

  41. The 100(1-)% confidence interval on

  42. 2.6 Coefficient of Determination • The coefficient of determination: $R^2 = SS_R/SS_T = 1 - SS_{Res}/SS_T$ • The proportion of variation explained by the regressor x • $0 \le R^2 \le 1$

  43. In Example 2.1, $R^2 = 0.9018$. It means that 90.18% of the variability in strength is accounted for by the regression model. • $R^2$ can be increased by adding terms to the model. • For a simple regression model, $E(R^2)$ increases (decreases) as $S_{xx}$ increases (decreases).

  44. $R^2$ does not measure the magnitude of the slope of the regression line. A large value of $R^2$ does not imply a steep slope. • $R^2$ does not measure the appropriateness of the linear model.
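
That $R^2$ carries no information about the size of the slope can be illustrated by rescaling the response: the slope changes by the scale factor while $R^2$ stays the same. The sketch below uses hypothetical toy data (not the rocket propellant data), and the function name `slope_and_r2` is an assumption made for this example.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def slope_and_r2(x, y):
    """Return (b1, r2) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    ss_t = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    ss_r = b1 * s_xy                            # regression sum of squares
    return b1, ss_r / ss_t


b1_original, r2_original = slope_and_r2(x, y)

# Multiply every response by 10: the slope becomes 10 times steeper.
y_scaled = [10.0 * yi for yi in y]
b1_scaled, r2_scaled = slope_and_r2(x, y_scaled)

print("original: slope", round(b1_original, 4), "R^2", round(r2_original, 4))
print("scaled:   slope", round(b1_scaled, 4), "R^2", round(r2_scaled, 4))
```

Both fits give the same $R^2$ (0.6 for these toy values) even though the second slope is ten times larger.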
