
Statistical Inference in Regression Analysis: Point & Interval Estimation, Hypothesis Testing

Learn about point and interval estimation in simple and multiple linear regression, hypothesis testing, and model fitting techniques. Understand classical assumptions, maximum likelihood estimators, and covariance matrices in regression analysis.

emmiea

Presentation Transcript


  1. Lecture 26: Statistical Inference in Simple Linear Regression; Multiple Linear Regression 1

  2. Point Estimation 1 Suppose Armand's managers want a point estimate of the mean quarterly sales for all restaurants located near college campuses with 10,000 students ($x_p = 10$), i.e., a point estimate for $E(Y_p) = \beta_0 + \beta_1 x_p$. A point estimate is $\hat{y}_p = \hat\beta_0 + \hat\beta_1 x_p = 110$, i.e., 110,000 Yuan.

  3. Point Estimation 2 Suppose Armand's managers want a point estimate of the quarterly sales for an individual restaurant located near a college with 10,000 students ($x_p = 10$), i.e., a point estimate for $Y_p = \beta_0 + \beta_1 x_p + \varepsilon_p$. A point estimate is $\hat{y}_p = \hat\beta_0 + \hat\beta_1 x_p = 110$, i.e., 110,000 Yuan.
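Both slides use the same number: the fitted line evaluated at $x_p = 10$. A minimal R sketch follows. It assumes that the data are the ten-restaurant sample from the textbook Armand's Pizza Parlors example, with student population and quarterly sales both in thousands. That data set is a reconstruction not given on these slides, although it reproduces the Minitab figures shown later.

```r
# ASSUMPTION: reconstructed Armand's Pizza Parlors sample (not given on the slides);
# population = thousands of students, sales = thousands of Yuan per quarter
armand <- data.frame(
  population = c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26),
  sales      = c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
)

fit <- lm(sales ~ population, data = armand)
coef(fit)                      # intercept 60, slope 5

# The same value estimates the mean E(Y_p) and predicts an individual Y_p at x_p = 10
y_hat <- predict(fit, newdata = data.frame(population = 10))
y_hat                          # 110, i.e. 110,000 Yuan
```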

  4. Interval Estimation 1 Suppose Armand's managers want an interval estimate of the mean quarterly sales for all restaurants located near college campuses with 10,000 students ($x_p = 10$), i.e., an interval estimate for $E(Y_p)$. This interval estimate is called a confidence interval.

  5. Interval Estimation 2 Suppose Armand's managers want an interval estimate of the quarterly sales for an individual restaurant located near a college with 10,000 students ($x_p = 10$), i.e., an interval estimate for $Y_p$. This interval estimate is called a prediction interval.

  6. Question • Which interval is wider: the confidence interval or the prediction interval?

  7. Confidence Interval
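The slide's formula did not survive extraction. As a hedged reference, here is the standard textbook confidence interval for $E(Y_p)$ in simple linear regression. In it, $s$ is the residual standard error on $n-2$ degrees of freedom and $t_{\alpha/2,\,n-2}$ is the upper $\alpha/2$ quantile of the t distribution.

```latex
\hat{y}_p \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}}
```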

  8. Prediction Interval • Suppose that we want to predict a new value $Y$, independent of the observed data $(x_1,Y_1),\dots,(x_n,Y_n)$, when we know the corresponding value $x$ of the predictor. Point prediction: $\hat{Y} = \hat\beta_0 + \hat\beta_1 x$.

  9. Prediction Interval
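This slide's formula was also lost. The standard form is sketched below under the same notation as the usual confidence interval. Because the new $Y_p$ is independent of the data used to compute $\hat{Y}_p$, the two variances add. The result is the extra "1 +" under the square root, which is why the prediction interval is always wider than the confidence interval at the same $x_p$.

```latex
\operatorname{Var}(Y_p - \hat{Y}_p)
  = \operatorname{Var}(Y_p) + \operatorname{Var}(\hat{Y}_p)
  = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]

\hat{y}_p \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```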

  10. Prediction Results from Minitab
      Predicted Values for New Observations
      New Obs   Fit      SE Fit   95% CI             95% PI
      1         110.00   4.95     (98.58, 121.42)    (76.13, 143.87)
      Values of Predictors for New Observations
      New Obs   Student Population
      1         10.0
      (95% CI = confidence interval; 95% PI = prediction interval)
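The Minitab table can be reproduced by hand and with R's `predict()`. The sketch below again assumes the reconstructed ten-restaurant Armand's sample, which is not listed on the slides. It computes both standard errors from the interval formulas and then checks them against the built-in intervals.

```r
# ASSUMPTION: reconstructed Armand's Pizza Parlors sample (not given on the slides)
armand <- data.frame(
  population = c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26),
  sales      = c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
)
fit <- lm(sales ~ population, data = armand)

x   <- armand$population
n   <- length(x)
x_p <- 10
s   <- summary(fit)$sigma                   # sqrt(SSE / (n - 2))
sxx <- sum((x - mean(x))^2)
t_q <- qt(0.975, df = n - 2)                # t quantile for a 95% interval
y_p <- unname(coef(fit)[1] + coef(fit)[2] * x_p)

se_fit  <- s * sqrt(1 / n + (x_p - mean(x))^2 / sxx)       # SE of the estimated mean
se_pred <- s * sqrt(1 + 1 / n + (x_p - mean(x))^2 / sxx)   # SE for a new individual value
ci_manual <- y_p + c(-1, 1) * t_q * se_fit
pi_manual <- y_p + c(-1, 1) * t_q * se_pred

# Built-in equivalents (matrices with columns fit, lwr, upr)
new_obs <- data.frame(population = x_p)
ci_lm <- predict(fit, new_obs, interval = "confidence", level = 0.95)
pi_lm <- predict(fit, new_obs, interval = "prediction", level = 0.95)

round(c(fit = y_p, se_fit = se_fit, ci_manual, pi_manual), 2)
# 110.00 4.95 98.58 121.42 76.13 143.87, matching the Minitab output
```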

  11. Confidence Interval and Prediction Interval
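Slide 6's question can also be checked numerically. The sketch below computes both 95% bands over a grid of population values, again assuming the reconstructed Armand's sample. The prediction band is wider at every grid point, and both bands are narrowest at the sample mean $\bar{x} = 14$, where the $(x_p - \bar{x})^2$ term vanishes.

```r
# ASSUMPTION: reconstructed Armand's Pizza Parlors sample (not given on the slides)
armand <- data.frame(
  population = c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26),
  sales      = c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
)
fit <- lm(sales ~ population, data = armand)

grid    <- data.frame(population = seq(0, 30, by = 2))
band_ci <- predict(fit, grid, interval = "confidence")
band_pi <- predict(fit, grid, interval = "prediction")

width_ci <- band_ci[, "upr"] - band_ci[, "lwr"]
width_pi <- band_pi[, "upr"] - band_pi[, "lwr"]

all(width_pi > width_ci)                 # TRUE: the prediction band is wider everywhere
grid$population[which.min(width_ci)]     # 14, the sample mean of population
```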

  12. Relationship between married women's wages and some socioeconomic variables. Source: T.A. Mroz (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica 55, 765-799. • lwage: log of wage • exper: actual labor market experience • expersq: exper² • educ: years of schooling • age: woman's age in years • kidslt6: number of kids under 6 years old • kidsge6: number of kids aged 6 to 18

  13. Multiple Linear Regression

  14. Multiple Linear Regression • E.g. • E.g. • E.g. • The term "linear" refers to the fact that the expectation of $Y$ is a linear function of the unknown parameters $\beta_0, \dots, \beta_k$.
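The general model can be written as follows. The second line is a hedged illustration built from the Mroz variables on slide 12 and the `lm3` model fitted on slide 25. That model is quadratic in exper, yet it is still linear in the parameters. With $z_{i2} = \text{exper}_i^2$ (the expersq column), it is a multiple linear regression with $k = 6$.

```latex
E(Y_i) = \beta_0 + \beta_1 z_{i1} + \cdots + \beta_k z_{ik}, \qquad i = 1, \dots, n

E(\text{lwage}) = \beta_0 + \beta_1\,\text{exper} + \beta_2\,\text{exper}^2 + \beta_3\,\text{educ}
  + \beta_4\,\text{age} + \beta_5\,\text{kidslt6} + \beta_6\,\text{kidsge6}
```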

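  15. Classical Assumptions • Linearity: $E(Y_i) = \beta_0 + \beta_1 z_{i1} + \cdots + \beta_k z_{ik}$ for $i = 1, \dots, n$. • Normality: each variable $Y_i$ has a normal distribution. • Independence: the variables $Y_1, \dots, Y_n$ are independent. • Homoscedasticity: the variables $Y_1, \dots, Y_n$ have the same variance $\sigma^2$.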

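  16. Maximum Likelihood Estimators • The likelihood function of $\boldsymbol{\beta} = (\beta_0, \dots, \beta_k)'$ and $\sigma^2$ is $f_n(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 z_{i1} - \cdots - \beta_k z_{ik})^2\right]$. • The values of $\beta_0, \dots, \beta_k$ that maximize the likelihood function are the values that minimize $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 z_{i1} - \cdots - \beta_k z_{ik})^2$, so the M.L.E.s of $\beta_0, \dots, \beta_k$ are the least squares estimators. • Define $S^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 z_{i1} - \cdots - \hat\beta_k z_{ik})^2$. The M.L.E. of $\sigma^2$ is $\hat\sigma^2 = S^2/n$.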
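A quick numerical check of this slide is to maximize the normal likelihood directly with `optim()` and compare the result with least squares. The data below are hypothetical, simulated with $k = 2$ under the classical assumptions. The sketch also shows that the M.L.E. of $\sigma^2$ divides the residual sum of squares by $n$. By contrast, `summary(lm)` reports the unbiased version that divides by $n - k - 1$.

```r
# Hypothetical simulated data satisfying the classical assumptions (k = 2)
set.seed(1)
n  <- 50
z1 <- runif(n)
z2 <- rnorm(n)
y  <- 1 + 2 * z1 - 0.5 * z2 + rnorm(n, sd = 0.3)

# Negative log-likelihood in (beta0, beta1, beta2, log sigma)
negloglik <- function(par) {
  mu    <- par[1] + par[2] * z1 + par[3] * z2
  sigma <- exp(par[4])                 # log scale keeps sigma positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}
mle <- optim(c(0, 0, 0, 0), negloglik, method = "BFGS",
             control = list(reltol = 1e-12))

fit <- lm(y ~ z1 + z2)                 # least squares fit
sse <- sum(residuals(fit)^2)           # residual sum of squares S^2

rbind(mle = mle$par[1:3], least_squares = coef(fit))   # same beta estimates
c(mle_sigma2 = exp(2 * mle$par[4]),    # numerical M.L.E. of sigma^2
  S2_over_n  = sse / n,                # closed-form M.L.E.
  unbiased   = sse / (n - 3))          # equals summary(fit)$sigma^2
```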

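  17. Explicit Form of the Estimators • The design matrix is the $n \times (k+1)$ matrix $\mathbf{Z}$ whose $i$th row is $(1, z_{i1}, \dots, z_{ik})$. • Also define $\mathbf{Y} = (Y_1, \dots, Y_n)'$ and $\boldsymbol{\beta} = (\beta_0, \dots, \beta_k)'$.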

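  18. The set of $k+1$ normal equations, obtained by setting the partial derivative of the sum of squares with respect to $\beta_j$ equal to zero for $j = 0, \dots, k$, can be written as $\mathbf{Z}'\mathbf{Z}\boldsymbol{\beta} = \mathbf{Z}'\mathbf{Y}$. So the least squares estimators (the M.L.E.s) are $\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$, provided $\mathbf{Z}'\mathbf{Z}$ is invertible. • Each estimator is a linear combination of $Y_1, \dots, Y_n$, so the estimators jointly follow a multivariate normal distribution.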
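The matrix form translates directly into R. The minimal sketch below uses hypothetical simulated data. It builds $\mathbf{Z}$ with a leading column of ones, solves $\mathbf{Z}'\mathbf{Z}\boldsymbol{\beta} = \mathbf{Z}'\mathbf{Y}$ and compares the result with `lm()`. Solving the system is numerically preferable to forming $(\mathbf{Z}'\mathbf{Z})^{-1}$ explicitly, and `lm()` itself uses a QR decomposition instead of the normal equations.

```r
# Hypothetical simulated data with k = 2 explanatory variables
set.seed(2)
n  <- 30
z1 <- rnorm(n)
z2 <- rnorm(n)
y  <- 3 + 1.5 * z1 - 2 * z2 + rnorm(n)

Z <- cbind(1, z1, z2)                              # design matrix: ones, then predictors
beta_hat <- solve(crossprod(Z), crossprod(Z, y))   # solves Z'Z b = Z'y

cbind(normal_equations = drop(beta_hat),
      lm               = coef(lm(y ~ z1 + z2)))   # identical up to rounding
```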

  19. Mean Vector and Covariance Matrix

  20. Theorem. Suppose that Y is an n-dimensional random vector for which the mean vector E(Y) and the covariance matrix Cov(Y) exist. Suppose also that A is a p × n matrix whose elements are constants, and that W is the p-dimensional random vector defined by W = AY. Then E(W) = A E(Y) and Cov(W) = A Cov(Y) A′.

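  21. Since $\hat{\boldsymbol{\beta}} = \mathbf{A}\mathbf{Y}$ with $\mathbf{A} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$, and the classical assumptions give $E(\mathbf{Y}) = \mathbf{Z}\boldsymbol{\beta}$ and $\mathrm{Cov}(\mathbf{Y}) = \sigma^2\mathbf{I}$, the theorem yields $E(\hat{\boldsymbol{\beta}}) = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Z}\boldsymbol{\beta} = \boldsymbol{\beta}$ and $\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1} = \sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}$.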

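  22. The vector $\hat{\boldsymbol{\beta}}$ has a multivariate normal distribution with mean vector $\boldsymbol{\beta}$ and covariance matrix $\sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}$. • Write $\zeta_{ij}$ for the entry of $(\mathbf{Z}'\mathbf{Z})^{-1}$ in row $i$ and column $j$, with rows and columns indexed $0, \dots, k$. • For $j = 0, \dots, k$, marginally, $\hat\beta_j$ has a normal distribution with mean $\beta_j$ and variance $\zeta_{jj}\sigma^2$. • For $i \neq j$, $\mathrm{Cov}(\hat\beta_i, \hat\beta_j) = \zeta_{ij}\sigma^2$.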
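A Monte Carlo sketch makes the result concrete. The design $\mathbf{Z}$ is held fixed, $\mathbf{Y}$ is redrawn many times, and $\boldsymbol{\beta}$ is re-estimated each time. The sample mean and covariance of the estimates then approach $\boldsymbol{\beta}$ and $\sigma^2(\mathbf{Z}'\mathbf{Z})^{-1}$. The true $\boldsymbol{\beta}$, $\sigma$ and design here are hypothetical choices for illustration only.

```r
set.seed(3)
n    <- 40
sig  <- 2                                   # hypothetical true sigma
beta <- c(1, -1, 0.5)                       # hypothetical true beta
Z    <- cbind(1, rnorm(n), runif(n))        # fixed design matrix
theory <- sig^2 * solve(crossprod(Z))       # sigma^2 (Z'Z)^{-1}

# Redraw Y with Z held fixed and re-estimate beta each time
sims <- replicate(20000, {
  y <- drop(Z %*% beta) + rnorm(n, sd = sig)
  drop(solve(crossprod(Z), crossprod(Z, y)))
})                                          # 3 x 20000 matrix of estimates

rowMeans(sims) - beta                       # near 0: beta-hat is unbiased
round(cov(t(sims)), 3)                      # simulated covariance matrix
round(theory, 3)                            # theoretical covariance matrix
```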

  23. The Joint Distribution of the Estimators

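  24. Testing Hypotheses • To test $H_0: \beta_j = \beta_j^*$ against $H_1: \beta_j \neq \beta_j^*$, use $U_j = \dfrac{\hat\beta_j - \beta_j^*}{\zeta_{jj}^{1/2}\,\sigma'}$, where $\zeta_{jj}$ is the diagonal entry of $(\mathbf{Z}'\mathbf{Z})^{-1}$ corresponding to $\beta_j$ and $\sigma'^2 = \frac{1}{n-k-1}\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 z_{i1} - \cdots - \hat\beta_k z_{ik})^2$ is an unbiased estimator of $\sigma^2$. Under $H_0$, $U_j$ has a t distribution with $n-k-1$ degrees of freedom. The level $\alpha_0$ test rejects $H_0$ if $|U_j| \geq T^{-1}_{n-k-1}(1 - \alpha_0/2)$, the $1 - \alpha_0/2$ quantile of that t distribution.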
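The test statistic can be computed directly from $(\mathbf{Z}'\mathbf{Z})^{-1}$ and $\sigma'$, and then compared with the "t value" and "Pr(>|t|)" columns of `summary(lm)`. These columns test $\beta_j^* = 0$. The data below are hypothetical and simulated so that `z2` has no real effect, with $\alpha_0 = 0.05$.

```r
set.seed(4)
n  <- 25
z1 <- rnorm(n)
z2 <- rnorm(n)
y  <- 0.5 + 1 * z1 + rnorm(n)              # hypothetical data; z2 has no effect
fit <- lm(y ~ z1 + z2)

Z      <- model.matrix(fit)                 # n x (k + 1) design matrix
p      <- ncol(Z)                           # k + 1
s_pr   <- sqrt(sum(residuals(fit)^2) / (n - p))   # sigma'
zeta   <- diag(solve(crossprod(Z)))         # zeta_jj for j = 0, ..., k
U      <- coef(fit) / (sqrt(zeta) * s_pr)   # U_j under H0: beta_j = 0
alpha0 <- 0.05
crit   <- qt(1 - alpha0 / 2, df = n - p)    # T^{-1}_{n-k-1}(1 - alpha0 / 2)
p_val  <- 2 * pt(-abs(U), df = n - p)       # two-sided p-value

cbind(U, p_val, reject = abs(U) >= crit)
summary(fit)$coefficients                   # "t value" and "Pr(>|t|)" match U and p_val
```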

  25. R analysis
      # Put mroz.csv in the current working directory, then read the data
      mroz <- read.csv("mroz.csv")
      # An overview of the first rows
      head(mroz)
      # Some exploration: simple regressions
      lm1 <- lm(lwage ~ educ, data = mroz)
      summary(lm1)
      lm2 <- lm(lwage ~ exper, data = mroz)
      summary(lm2)
      # Multiple regression
      lm3 <- lm(lwage ~ exper + expersq + educ + age + kidslt6 + kidsge6, data = mroz)
      summary(lm3)
      # Prediction for the first four observations
      newdata1 <- mroz[1:4, ]
      predict(object = lm3, newdata = newdata1, type = "response")

  26. Question • How well does the estimated regression equation fit the data?

  27. Total Sum of Squares • The sum of squared deviations obtained by using the sample mean $\bar{y}$ to estimate the quarterly sales of each restaurant in the sample. • Total sum of squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$.

  28. Sum of Squares Due to Error • Let $\hat{y}_i$ denote the estimated value of the dependent variable for observation $i$ from the linear regression model; $y_i - \hat{y}_i$ is called the $i$th residual. • Sum of squares due to error: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.

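  29. Sum of Squares Due to Regression • Measures how much the values $\hat{y}_i$ on the estimated regression line deviate from $\bar{y}$. Note that, when the model includes an intercept, the mean of the $\hat{y}_i$ is also $\bar{y}$. • Sum of squares due to regression: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$.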

  30. (Figure: plot of Y against X)

  31. R Square • Define the coefficient of determination (R Square) as $R^2 = SSR/SST$. It is the proportion of the variation in Y explained by the regression.
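The three sums of squares and $R^2$ can be computed in a few lines. The sketch below again assumes the reconstructed Armand's sample. For a model with an intercept, the decomposition $SST = SSR + SSE$ holds exactly, so $R^2 = SSR/SST = 1 - SSE/SST$, and it should agree with `summary(fit)$r.squared`.

```r
# ASSUMPTION: reconstructed Armand's Pizza Parlors sample (not given on the slides)
armand <- data.frame(
  population = c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26),
  sales      = c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
)
fit   <- lm(sales ~ population, data = armand)
y     <- armand$sales
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)        # total sum of squares
sse <- sum((y - y_hat)^2)          # sum of squares due to error
ssr <- sum((y_hat - mean(y))^2)    # sum of squares due to regression
r2  <- ssr / sst                   # coefficient of determination

c(SST = sst, SSE = sse, SSR = ssr, R2 = r2)   # 15730, 1530, 14200, about 0.9027
summary(fit)$r.squared                         # same R^2
```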

  32. F Test • The F test can be used to test for an overall significant relationship between the response variable and all of the explanatory variables. H0: β1 = … = βk = 0. H1: at least one βj (j = 1,…,k) is not equal to zero, i.e., at least one of the independent variables z1,…,zk is linearly related to Y.
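The slide does not show the statistic. The sketch below uses the standard form $F = \dfrac{SSR/k}{SSE/(n-k-1)}$, which under $H_0$ follows an F distribution with $k$ and $n-k-1$ degrees of freedom. It is applied to the assumed Armand's sample ($k = 1$) and checked against `summary(fit)$fstatistic`. In simple regression, $F$ also equals the square of the slope's t statistic.

```r
# ASSUMPTION: reconstructed Armand's Pizza Parlors sample (not given on the slides)
armand <- data.frame(
  population = c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26),
  sales      = c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
)
fit <- lm(sales ~ population, data = armand)

n   <- nrow(armand)
k   <- 1                                        # number of explanatory variables
ssr <- sum((fitted(fit) - mean(armand$sales))^2)
sse <- sum(residuals(fit)^2)

F_stat <- (ssr / k) / (sse / (n - k - 1))       # MSR / MSE
p_val  <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

c(F = F_stat, p = p_val)                        # F is about 74.25
summary(fit)$fstatistic                         # value, numdf, dendf
```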
