Part 1. Regression Analysis on Cross-Sectional Data
Chap 2. The Simple Regression Model—Practice for Learning Multiple Regression
• Bivariate linear regression model: y = β₀ + β₁x + u
• β₁: the slope parameter in the relationship between y and x, holding the other factors in u fixed.
• β₀: the intercept parameter.
• u: the error term or disturbance; factors other than x that affect y, unobserved.
Example: data-1
Cross-section data, 1832 rural households, in 2000
Y: consumption   X: income

Stata commands:
    scatter consum income
    scatter consum income || lfit consum income
    graph twoway (scatter consum income) (lfit consum income)
    reg consum income
More Discussion
• Δy = β₁Δx: a one-unit change in x has the same effect on y, regardless of the initial value of x.
• Can we draw ceteris paribus conclusions about how x affects y from a random sample of data, when we are ignoring all the other factors? Generally speaking, we cannot!
Classical Regression Assumptions
• E(u) = 0: a feasible assumption as long as the intercept term is included.
• Zero conditional mean: E(u|x) = E(u) = 0, so u is linearly uncorrelated with x (in fact, mean-independent of x).
• Then the population regression function is E(y|x) = β₀ + β₁x.
• Sample versus population: the assumptions concern the population; OLS uses a sample to estimate the population parameters.
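One line of algebra shows why the zero conditional mean assumption pins down the population regression function:

    E(y|x) = E(β₀ + β₁x + u | x) = β₀ + β₁x + E(u|x) = β₀ + β₁x

So under E(u|x) = 0, the mean of y given x is linear in x, and β₁ measures the ceteris paribus effect of x on y.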
Attention:
• u: the error term (unobserved)
• ŷ: the fitted value
• û = y − ŷ: the residual
Example: data-1
Cross-section data, 1832 households
Y: consumption   X: income

    reg consum income
    predict consum1
    generate v = consum - consum1
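As a cross-check (a minimal sketch, assuming data-1 is loaded with the variables consum and income), Stata can also compute the residuals directly, and the OLS first-order conditions imply they average to zero and are uncorrelated with income:

    reg consum income
    predict uhat, residuals      // same as v above: uhat = consum - fitted value
    summarize uhat               // mean should be (numerically) zero
    correlate uhat income        // correlation should be (numerically) zero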
OLS
• Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares: SST = SSE + SSR
• Coefficient of determination: R² = SSE/SST = 1 − SSR/SST
• R² is the fraction of the sample variation in y that is explained by x.
• Is R-squared important?
Example: data-1
    reg consum income
SSE = 2.3557e+11
SSR = 9.4745e+11
SST = 1.1830e+12
R-sq = 0.1991
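The reported R-sq can be reproduced from these sums of squares; a quick check in Stata (note that Stata stores the explained sum of squares as e(mss) and the residual sum of squares as e(rss)):

    * run immediately after: reg consum income
    display e(mss) / (e(mss) + e(rss))      // R-sq = SSE/SST
    display 1 - e(rss) / (e(mss) + e(rss))  // equivalently, 1 - SSR/SST
    display 2.3557e+11 / 1.1830e+12         // from the slide's numbers: 0.1991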
Units of Measurement
• If the dependent variable is multiplied by a constant c, which means each value in the sample is multiplied by c, then the OLS intercept and slope estimates are also multiplied by c.
• If the independent variable is divided or multiplied by some nonzero constant c, then its OLS slope coefficient is multiplied or divided by c, respectively.
• How should we judge the size of a coefficient?
• The goodness of fit of the model, R-sq, should not depend on the units of measurement of our variables. (See the sketch below.)
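A quick illustration of the rescaling rules, assuming data-1 is loaded (the rescaled variable names here are made up for the demo):

    generate consum_th = consum / 1000   // consumption in thousands of $
    reg consum_th income                 // intercept and slope are 1/1000 of the original
    generate income_th = income / 1000   // income in thousands of $
    reg consum income_th                 // slope is 1000 times the original; R-sq unchanged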
Functional Form
• Linear vs. nonlinear
• When and why take the logarithm?
• When? For dollar amounts: GDP, output, salary, …
• Why? To make the variable look more normally distributed (e.g., income).
• Why? To reduce heteroskedasticity.
• Why? To obtain elasticities.
• Bi-logarithmic (log-log) form: constant elasticity. (See the sketch below.)
• U or inverse-U shapes: motivated by theory, or by experience (e.g., age and experience profiles).
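A sketch of the log-log form with data-1, assuming consum and income are strictly positive (the new variable names are illustrative):

    generate lconsum = ln(consum)
    generate lincome = ln(income)
    reg lconsum lincome    // the slope is the elasticity of consumption w.r.t. income

    * quadratic terms give U / inverse-U shapes, e.g. in a wage equation
    * (wage and exper are hypothetical variables, not in data-1):
    * generate exper2 = exper^2
    * reg wage exper exper2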
Interpreting the four functional forms:
• Level-level: consumption will increase by $0.18 if income increases by $1.
• Level-log: consumption will increase by $106.03 if income increases by 1%.
• Log-level: wage increases by 8.3% for every additional year of education.
• Log-log: a 1% increase in firm sales increases CEO salary by about 0.257%.
Unbiasedness of OLS Estimators
• Statistical properties of OLS: the distribution of the OLS estimates across different random samples drawn from the population.
• Assumptions 1–4:
• 1. Linear in parameters (functional form)
• 2. Random sampling (vs. nonrandom sampling)
• 3. Zero conditional mean (vs. endogeneity; spurious correlation)
• 4. Sample variation in the independent variable (vs. collinearity)
• Theorem 2.1 (Unbiasedness): under Assumptions 1–4, E(β̂₀) = β₀ and E(β̂₁) = β₁. A simulation sketch follows.
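Unbiasedness can be illustrated by simulation; a minimal sketch with a made-up true model y = 1 + 0.5x + u (program and variable names are illustrative):

    program define olssim, rclass
        drop _all
        set obs 200
        generate x = rnormal()
        generate y = 1 + 0.5*x + rnormal()   // true slope is 0.5
        reg y x
        return scalar b1 = _b[x]
    end
    set seed 12345
    simulate b1 = r(b1), reps(1000) nodots: olssim
    summarize b1    // the mean of the 1,000 slope estimates should be close to 0.5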
Variance of OLS Estimators
• How far is β̂₁ away from β₁?
• Assumption 5, Homoskedasticity: Var(u|x) = σ².
• Error variance σ²: a larger σ² means that the distribution of the unobservables affecting y is more spread out.
• Theorem 2.2 (Sampling variance of the OLS estimators). Under the five assumptions above:
    Var(β̂₁) = σ²/SSTₓ, where SSTₓ = Σᵢ(xᵢ − x̄)²
    Var(β̂₀) = σ² (n⁻¹Σᵢxᵢ²)/SSTₓ
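The slope formula follows from the standard decomposition of β̂₁, reproduced here for completeness:

    β̂₁ = β₁ + Σᵢ(xᵢ − x̄)uᵢ / SSTₓ

so, conditional on the xᵢ and under homoskedasticity,

    Var(β̂₁) = Σᵢ(xᵢ − x̄)² σ² / SSTₓ² = σ²/SSTₓ.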
Variance of y given x
• Conditional mean and variance of y: E(y|x) = β₀ + β₁x and Var(y|x) = Var(u|x) = σ².
• Heteroskedasticity: Var(u|x) depends on x, i.e., the error variance differs across values of x.
What does Var(β̂₁) depend on?
• More variation in the unobservables affecting y (a larger σ²) makes it more difficult to estimate β₁ precisely.
• The more spread out the sample of the xᵢ, the easier it is to find the relationship between E(y|x) and x.
• As the sample size increases, so does the total variation in the xᵢ. Therefore, a larger sample size results in a smaller variance of the estimator, and hence higher significance. (See the sketch below.)
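The sample-size point can be seen with data-1 itself (a sketch, assuming the dataset is loaded): fit the model on the full sample and on a random 10% subsample, and compare the standard errors.

    reg consum income     // full sample, n = 1832
    preserve
    set seed 1
    sample 10             // keep a random 10% of observations
    reg consum income     // expect a noticeably larger standard error on income
    restore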
Estimating the Error Variance
• Errors (disturbances) and residuals:
• Errors: uᵢ = yᵢ − β₀ − β₁xᵢ (population quantities)
• Residuals: ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ (from the estimated equation)
• Theorem 2.3 (The unbiased estimator of σ²). Under the five assumptions above: σ̂² = SSR/(n − 2) = (1/(n − 2)) Σᵢ ûᵢ², with E(σ̂²) = σ².
• Standard error of the regression (SER): σ̂ = √σ̂², estimating the standard deviation of u, i.e., the standard deviation in y after the effect of x has been taken out.
• Standard error of β̂₁: se(β̂₁) = σ̂/√SSTₓ.
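These quantities can all be recovered after reg consum income (a sketch; uh is an illustrative variable name):

    display e(rmse)       // the SER, sigma-hat (Stata's "Root MSE")
    display _se[income]   // se(beta1-hat)
    * by hand: sigma-hat = sqrt(SSR/(n-2))
    predict uh, residuals
    generate uh2 = uh^2
    quietly summarize uh2
    display sqrt(r(sum) / (e(N) - 2))   // should match e(rmse)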
Why do we need Assumption 5?
1. It simplifies the variance calculations for the estimators.
2. It is needed for further statistical tests of the significance of the estimated parameters; under it, the OLS estimators have the smallest variance (Ch. 3).
Regression through the Origin
Example: tax and consumption
Stata command: reg y x, noconstant
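Without the intercept, minimizing Σᵢ(yᵢ − β̃₁xᵢ)² gives the slope β̃₁ = Σᵢxᵢyᵢ / Σᵢxᵢ². A comparison sketch, with y and x as named on the slide:

    reg y x, noconstant   // regression through the origin
    reg y x               // with the intercept, for comparison
    * caution: the R-sq Stata reports under noconstant is computed about zero,
    * so it is not comparable to the usual R-sq.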