Learn about point and interval estimation in simple and multiple linear regression, hypothesis testing, and model fitting techniques. Understand classical assumptions, maximum likelihood estimators, and covariance matrices in regression analysis.
Lecture 26: Statistical Inference in Simple Linear Regression; Multiple Linear Regression
Point Estimation 1 Suppose Armand's managers want a point estimate of the mean quarterly sales for all restaurants located near college campuses with 10,000 students (x = 10), i.e., a point estimate for E(Y | x = 10). A point estimate is ŷ = b0 + b1 · 10 = 110, or 110,000 Yuan.
Point Estimation 2 Suppose Armand's managers want a point estimate of the quarterly sales for an individual restaurant located near a college with 10,000 students (x = 10), i.e., a point estimate for Y itself. A point estimate is again ŷ = 110, or 110,000 Yuan: the point estimate for an individual value is the same as for the mean.
Interval Estimation 1 Suppose Armand's managers want an interval estimate of the mean quarterly sales for all restaurants located near college campuses with 10,000 students (x = 10), i.e., an interval estimate for E(Y | x = 10). The interval estimate is called a confidence interval.
Interval Estimation 2 Suppose Armand's managers want an interval estimate of the quarterly sales for an individual restaurant located near a college with 10,000 students (x = 10), i.e., an interval estimate for Y itself. The interval estimate is called a prediction interval.
Question • Which interval is wider? Confidence interval or prediction interval?
Prediction Interval • Suppose that we want to predict a new value Y, independent of the observed data (x1, Y1), …, (xn, Yn), when we know the corresponding value x of the predictor. Point prediction: Ŷ = b0 + b1 x, where b0 and b1 are the least squares estimates.
Prediction Results from Minitab

Predicted Values for New Observations
New Obs   Fit      SE Fit   95% CI             95% PI
1         110.00   4.95     (98.58, 121.42)    (76.13, 143.87)

Values of Predictors for New Observations
New Obs   Student Population
1         10.0

The 95% CI column is the confidence interval for the mean response; the 95% PI column is the prediction interval for an individual response.
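The two intervals in the output come from the standard simple-regression formulas. A minimal pure-Python sketch, using made-up data (not the Armand's sample) and the hardcoded quantile t(0.975; 8) = 2.306 for n = 10:

```python
import math

# Hypothetical data (NOT the Armand's sample): x = student population
# in thousands, y = quarterly sales in thousands of Yuan.
x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [70, 74, 85, 92, 101, 110, 122, 135, 140, 155]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residual standard error s, with n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

xstar = 10.0            # new value of the predictor
yhat = b0 + b1 * xstar  # point estimate of the mean / point prediction
t = 2.306               # 0.975 quantile of t with n - 2 = 8 df (hardcoded)

# Confidence interval for the mean response at xstar.
se_mean = s * math.sqrt(1 / n + (xstar - xbar) ** 2 / sxx)
ci = (yhat - t * se_mean, yhat + t * se_mean)

# Prediction interval: a new observation adds its own variance ("+ 1" term).
se_pred = s * math.sqrt(1 + 1 / n + (xstar - xbar) ** 2 / sxx)
pi = (yhat - t * se_pred, yhat + t * se_pred)

print(ci, pi)
```

Because se_pred > se_mean for every x, the prediction interval is always wider than the confidence interval at the same point, which answers the question posed above.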
Women's Wages and Socioeconomic Variables Source: T.A. Mroz (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica 55, 765–799. • lwage: log of wage • exper: actual labor market experience • expersq: exper^2 • educ: years of schooling • age: woman's age in years • kidslt6: number of kids < 6 years old • kidsge6: number of kids 6–18 years old
Multiple Linear Regression • E.g. Y = β0 + β1x1 + ε • E.g. Y = β0 + β1x + β2x² + ε • E.g. Y = β0 + β1x1 + β2x2 + … + βkxk + ε • The term linear refers to the fact that the expectation of Y is a linear function of the unknown parameters β0, …, βk; the regressors themselves may be nonlinear transformations, such as x² (e.g., expersq above).
Classical Assumptions • Linearity: E(Yi) = β0 + β1xi1 + … + βkxik for i = 1, …, n. • Normality: each variable Yi has a normal distribution. • Independence: the variables Y1, …, Yn are independent. • Homoscedasticity: the variables Y1, …, Yn have the same variance σ².
Maximum Likelihood Estimators • The likelihood function of β = (β0, …, βk) and σ²: L(β, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σi (yi − β0 − β1xi1 − … − βkxik)² ). • The values of β0, …, βk that maximize the likelihood function are the values that minimize Q(β) = Σi (yi − β0 − β1xi1 − … − βkxik)². So the M.L.E.s of β0, …, βk are the least squares estimators. • Define SSE = Q(β̂), the minimized sum of squares. The M.L.E. for σ² is σ̂² = SSE / n.
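The claim that the likelihood maximizers are the least squares estimators can be checked numerically: perturbing the closed-form solution in any direction can only increase Q. A small sketch with made-up data (k = 1 for simplicity):

```python
def q(b0, b1, xs, ys):
    """Sum of squared errors Q(b0, b1) for a simple linear model."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(xs, ys))

# Hypothetical data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)

# Closed-form least squares estimates.
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) \
     / sum((xi - xbar) ** 2 for xi in xs)
b0 = ybar - b1 * xbar

q_min = q(b0, b1, xs, ys)

# Any perturbation of (b0, b1) gives Q at least as large as the minimum.
for d0 in (-0.1, 0.0, 0.1):
    for d1 in (-0.1, 0.0, 0.1):
        assert q(b0 + d0, b1 + d1, xs, ys) >= q_min

# The M.L.E. of sigma^2 divides SSE by n (not by n - 2).
sigma2_mle = q_min / n
print(b0, b1, sigma2_mle)
```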
Explicit Form of the Estimators • The design matrix is the n × (k+1) matrix X whose i-th row is (1, xi1, …, xik). • Also define the vectors Y = (Y1, …, Yn)' and β = (β0, …, βk)'.
The set of k+1 normal equations ∂Q/∂βj = 0 (obtained by setting j = 0, …, k) can be written as X'Xβ = X'Y. So the least squares estimators (the M.L.E.s) are β̂ = (X'X)^(−1) X'Y. • We can see that the estimators are linear combinations of Y1, …, Yn, so they follow a multivariate normal distribution.
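The closed form β̂ = (X'X)^(−1) X'Y can be sketched directly in pure Python. The data below are hypothetical and chosen so that y is exactly 1 + 2·x1 + 3·x2, which the estimator should recover:

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c]
                              for c in range(r + 1, n))) / m[r][r]
    return x

# Design matrix: a column of 1s, then the two regressors.
X = [[1, x1, x2] for x1, x2 in [(0, 1), (1, 0), (1, 1), (2, 1), (2, 3), (3, 2)]]
# Exact linear response (no noise), so the fit should recover beta = (1, 2, 3).
y = [1 + 2 * row[1] + 3 * row[2] for row in X]

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(Xt[i][t] * y[t] for t in range(len(y))) for i in range(len(Xt))]
beta_hat = solve(XtX, Xty)  # beta_hat = (X'X)^{-1} X'y
print(beta_hat)
```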
Theorem. Suppose that Y is an n-dimensional random vector for which the mean vector E(Y) and the covariance matrix Cov(Y) exist. Suppose also that A is a p × n matrix whose elements are constants, and that W is the p-dimensional random vector W = AY. Then E(W) = A E(Y) and Cov(W) = A Cov(Y) A'.
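The theorem can be checked on a concrete 2 × 2 case. A quick numeric sketch (the mean vector, covariance matrix, and transformation below are chosen arbitrarily for illustration):

```python
# Y has mean mu and covariance C; W = A Y.
mu = [1.0, 2.0]
C = [[2.0, 1.0],
     [1.0, 3.0]]   # Cov(Y): Var(Y1) = 2, Var(Y2) = 3, Cov(Y1, Y2) = 1
A = [[1.0, 1.0],
     [1.0, -1.0]]  # W1 = Y1 + Y2, W2 = Y1 - Y2

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

ew = [sum(A[i][j] * mu[j] for j in range(2)) for i in range(2)]  # E(W) = A E(Y)
cov_w = matmul(matmul(A, C), transpose(A))                       # A Cov(Y) A'
print(ew, cov_w)
```

By hand, Var(Y1 + Y2) = 2 + 3 + 2·1 = 7 and Var(Y1 − Y2) = 2 + 3 − 2·1 = 3, matching the diagonal of Cov(W).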
The vector β̂ has a multivariate normal distribution with mean vector β and covariance matrix σ²(X'X)^(−1). • For j = 0, …, k, marginally, β̂j has a normal distribution with mean βj and variance σ²[(X'X)^(−1)]jj. • For i ≠ j, Cov(β̂i, β̂j) = σ²[(X'X)^(−1)]ij.
Testing Hypotheses H0: βj = c vs. H1: βj ≠ c. Let σ̂² = SSE / (n − k − 1), where σ̂² is an unbiased estimator of σ². The test statistic is T = (β̂j − c) / sqrt( σ̂² [(X'X)^(−1)]jj ), which under H0 has a t distribution with n − k − 1 degrees of freedom. The level α0 test rejects H0 if |T| ≥ t(1 − α0/2; n − k − 1).
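A sketch of this t test in the simple-regression case (k = 1), with made-up data and the hardcoded quantile t(0.975; 8) = 2.306, so n − k − 1 = 8:

```python
import math

# Hypothetical data, roughly y = 2x, so the slope is clearly nonzero.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2, 17.9, 20.1]
n = len(x)  # n = 10, so n - k - 1 = 8 degrees of freedom

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sse / (n - 2)  # unbiased estimator of sigma^2

# Test H0: beta1 = 0 against H1: beta1 != 0 at level alpha0 = 0.05.
se_b1 = math.sqrt(sigma2_hat / sxx)  # sqrt of sigma2_hat * [(X'X)^{-1}]_{11}
t_stat = (b1 - 0) / se_b1
t_crit = 2.306  # 0.975 quantile of t with 8 df (hardcoded)
reject = abs(t_stat) >= t_crit
print(t_stat, reject)
```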
R analysis
# put the data file in the current working directory, then read it in
mroz <- read.csv('mroz.csv')
# an overview
head(mroz)
# some exploration
lm1 <- lm(lwage ~ educ, data = mroz)
summary(lm1)
lm2 <- lm(lwage ~ exper, data = mroz)
summary(lm2)
# multiple variable regression
lm3 <- lm(lwage ~ exper + expersq + educ + age + kidslt6 + kidsge6, data = mroz)
summary(lm3)
# prediction
newdata1 <- mroz[1:4, ]
predict(object = lm3, newdata = newdata1, type = "response")
Question • How well does the estimated regression equation fit the data?
Total Sum of Squares • The sum of squared deviations obtained by using the sample mean ȳ to estimate the value of quarterly sales for each restaurant in the sample. • Total sum of squares: SST = Σi (yi − ȳ)².
Sum of Squares Due to Error • Let ŷi denote the estimated value of the dependent variable using the linear regression model. yi − ŷi is called the i-th residual. • Sum of squares due to error: SSE = Σi (yi − ŷi)².
Sum of Squares Due to Regression • Measures how much the values ŷi on the estimated regression line deviate from ȳ. Note that the mean of the ŷi is also ȳ. • Sum of squares due to regression: SSR = Σi (ŷi − ȳ)².
[Figure: plot of Y versus X]
R Square • Define the coefficient of determination (R square): R² = SSR / SST. It is the proportion of variation in Y explained by the regression.
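The three sums of squares and R² follow directly from their definitions; for any least squares fit with an intercept, the identity SST = SSR + SSE holds. A sketch with made-up data:

```python
# Hypothetical data.
x = [1, 2, 3, 4, 5, 6]
y = [3.0, 4.5, 7.1, 8.0, 10.9, 12.2]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # regression

r2 = ssr / sst  # proportion of variation explained
print(sst, sse, ssr, r2)
```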
F Test • The F test can be used to test for an overall significant relationship between the response variable and all of the explanatory variables. H0: β1 = … = βk = 0; H1: at least one βj (j = 1, …, k) is not equal to zero, i.e., at least one of the independent variables x1, …, xk is linearly related to Y. The test statistic is F = (SSR/k) / (SSE/(n − k − 1)), which under H0 has an F distribution with k and n − k − 1 degrees of freedom.
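The F statistic can equivalently be written in terms of R² as F = (R²/k) / ((1 − R²)/(n − k − 1)). A sketch checking that the two forms agree, with made-up simple-regression data (k = 1):

```python
# Hypothetical data, roughly y = 1.3x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.9, 4.1, 5.2, 6.8, 8.1, 9.0, 10.5]
n, k = len(x), 1

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
ssr = sst - sse  # SST = SSR + SSE for a least squares fit with intercept
r2 = ssr / sst

f_stat = (ssr / k) / (sse / (n - k - 1))
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))  # equivalent form
print(f_stat, f_from_r2)
```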