
Part 1. Regression Analysis on Cross Sectional Data




Presentation Transcript


  1. Part 1. Regression Analysis on Cross Sectional Data

  2. Chap 2. The Simple Regression Model: Practice for Learning Multiple Regression • Bivariate linear regression model: y = \beta_0 + \beta_1 x + u • \beta_1: the slope parameter in the relationship between y and x, holding the other factors in u fixed. • \beta_0: the intercept parameter. • u: the error term or disturbance; it captures factors other than x that affect y and is unobserved.

  3. Example: data-1 • Cross-section data, 1832 rural households, surveyed in 2000 • Y: consumption; X: income • Stata commands:
    scatter consum income
    scatter consum income || lfit consum income
    graph twoway (scatter consum income) (lfit consum income)
    reg consum income
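For readers following along without Stata, here is a minimal Python sketch of the same scatter-plus-fitted-line workflow. The data-1 file is not distributed with the slides, so a simulated sample stands in; the variable names and parameter values (including the 0.18 slope echoing slide 13) are illustrative assumptions, not the actual household data.

    # Python sketch of the Stata workflow above; simulated data stand in for data-1.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n = 1832                                              # sample size from the slide
    income = rng.lognormal(mean=9.0, sigma=0.7, size=n)   # hypothetical incomes
    consum = 500 + 0.18 * income + rng.normal(0, 800, n)  # hypothetical consumption

    b1, b0 = np.polyfit(income, consum, 1)                # OLS slope and intercept
    xs = np.sort(income)
    plt.scatter(income, consum, s=5)                      # like: scatter consum income
    plt.plot(xs, b0 + b1 * xs, color="red")               # like: lfit consum income
    plt.xlabel("income"); plt.ylabel("consumption")
    plt.show()
    print(f"intercept = {b0:.2f}, slope = {b1:.3f}")      # like: reg consum income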

  4. More Discussion • \Delta y = \beta_1 \Delta x: a one-unit change in x has the same effect on y, regardless of the initial value of x. • Can we draw ceteris paribus conclusions about how x affects y from a random sample of data when we ignore all the other factors? Generally speaking, we cannot!

  5. Classical Regression Assumptions • E(u) = 0: a feasible assumption as long as an intercept term is included in the model • Zero conditional mean: E(u | x) = E(u) = 0, which implies that u and x are linearly uncorrelated • It then follows that E(y | x) = \beta_0 + \beta_1 x • Estimation works by carrying these population moment conditions over to the sample.
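As a brief aside (standard textbook material, not on the original slide), the zero conditional mean assumption delivers the two population moment conditions that identify the parameters,

    E(u) = 0, \qquad E(xu) = 0
    \Longrightarrow\; E(y - \beta_0 - \beta_1 x) = 0, \qquad E[x(y - \beta_0 - \beta_1 x)] = 0,

and replacing these population moments with their sample analogues gives the first-order conditions that the OLS estimates solve:

    \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \qquad
    \frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0.

This is exactly the sample-to-population correspondence the slide refers to.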

  6. Attention • u: the error term, which is unobservable • \hat{y}: the fitted value • \hat{u} = y - \hat{y}: the residual. Do not confuse the error u with the computed residual \hat{u}.

  7. Example: data-1 • Cross-section data, 1832 households • Y: consumption; X: income • Stata commands:
    reg consum income
    predict consum1               // fitted values
    generate v = consum - consum1 // residuals

  8. What is OLS? • Ordinary Least Squares chooses \hat\beta_0 and \hat\beta_1 to minimize the sum of squared residuals, \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 • Fitted values and residuals
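A minimal Python sketch of the closed-form solution to this minimization (the function name ols_fit is illustrative; later sketches reuse it):

    import numpy as np

    def ols_fit(x, y):
        """Closed-form OLS for y = b0 + b1*x + u:
        slope = sample cov(x, y) / sample var(x); intercept from the means."""
        xbar, ybar = x.mean(), y.mean()
        b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
        b0 = ybar - b1 * xbar
        return b0, b1

These are exactly the solutions to the two first-order conditions shown after slide 5.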

  9. OLS • Total sum of squares = explained sum of squares + residual sum of squares: SST = SSE + SSR, where SST = \sum_i (y_i - \bar{y})^2, SSE = \sum_i (\hat{y}_i - \bar{y})^2, and SSR = \sum_i \hat{u}_i^2 • Coefficient of determination: R^2 = SSE/SST = 1 - SSR/SST, the fraction of the sample variation in y that is explained by x • Is R-squared important?
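Continuing the ols_fit sketch from slide 8 (same imports), the decomposition and R-squared can be checked numerically:

    def r_squared(x, y):
        """R-squared via the SST = SSE + SSR decomposition."""
        b0, b1 = ols_fit(x, y)
        yhat = b0 + b1 * x                     # fitted values
        sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
        sse = np.sum((yhat - y.mean()) ** 2)   # explained sum of squares
        ssr = np.sum((y - yhat) ** 2)          # residual sum of squares
        assert np.isclose(sst, sse + ssr)      # the decomposition holds exactly
        return sse / sst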

  10. Example: data-1
    reg consum income
    SSE = 2.3557e+11
    SSR = 9.4745e+11
    SST = 1.1830e+12
    R-sq = 0.1991
Note that the pieces are consistent: SSE + SSR = SST, and R-sq = SSE/SST = 2.3557e+11 / 1.1830e+12 ≈ 0.1991.

  11. Units of Measurement • If the dependent variable is multiplied by a constant c, meaning each value in the sample is multiplied by c, then the OLS intercept and slope estimates are also multiplied by c. • If the independent variable is divided or multiplied by some nonzero constant c, then its OLS slope coefficient is multiplied or divided by c, respectively. • How should we judge the size of a coefficient? • The goodness-of-fit of the model, R-sq, does not depend on the units of measurement of our variables.
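A quick numerical check of these rescaling rules, reusing ols_fit and r_squared from the sketches above (the data and the constant c = 100 are illustrative):

    rng = np.random.default_rng(1)
    x = rng.normal(10, 2, 500)
    y = 1.0 + 0.5 * x + rng.normal(0, 1, 500)
    c = 100.0

    b0, b1 = ols_fit(x, y)
    b0_cy, b1_cy = ols_fit(x, c * y)   # scale y by c: both estimates scale by c
    b0_cx, b1_cx = ols_fit(c * x, y)   # scale x by c: slope scales by 1/c
    print(np.isclose(b1_cy, c * b1), np.isclose(b0_cy, c * b0))   # True True
    print(np.isclose(b1_cx, b1 / c))                              # True
    print(np.isclose(r_squared(x, y), r_squared(c * x, c * y)))   # True: R-sq unchanged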

  12. Functional Form • Linear vs. nonlinear • When and why take logarithms? (The table after this slide summarizes the interpretations.) • When: for positive, typically right-skewed variables (dollar amounts, GDP, output, salary, ...) • Why: makes the variable look more normally distributed (e.g., income) • Why: can mitigate heteroskedasticity • Why: coefficients become elasticities or semi-elasticities • Bi-logarithmic (log-log) model: constant elasticity • U or inverse-U shapes: motivated by theory or by experience and age profiles
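For reference, the standard interpretation of each functional form (textbook material, e.g. the summary table in Wooldridge's Chapter 2; it explains the numbers on the next slide):

    Model        Equation                      Interpretation of b1
    level-level  y = b0 + b1*x + u             change in y = b1 * (change in x)
    level-log    y = b0 + b1*log(x) + u        change in y ≈ (b1/100) * (% change in x)
    log-level    log(y) = b0 + b1*x + u        % change in y ≈ (100*b1) * (change in x)
    log-log      log(y) = b0 + b1*log(x) + u   % change in y = b1 * (% change in x), an elasticity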

  13. Interpreting the estimates • Level-level: consumption increases by $0.18 if income increases by $1 • Level-log: consumption increases by $106.03 if income increases by 1% • Log-level: wage increases by 8.3% for every additional year of education • Log-log: a 1% increase in firm sales increases CEO salary by about 0.257%

  14. Unbiasedness of OLS Estimators • Statistical properties of OLS: the distribution of the OLS estimates across different random samples drawn from the population • Assumptions 1-4: • 1. Linear in parameters (functional form) • 2. Random sampling (as opposed to nonrandom sampling) • 3. Zero conditional mean (violated by endogeneity and spurious correlation) • 4. Sample variation in the independent variable (the analogue of no perfect collinearity in multiple regression) • Theorem 2.1 (Unbiasedness): under Assumptions 1-4, E(\hat\beta_0) = \beta_0 and E(\hat\beta_1) = \beta_1

  15. Variance of OLS Estimators • How far is \hat\beta_1 from \beta_1, on average? • Assumption 5, homoskedasticity: Var(u | x) = \sigma^2 • \sigma^2 is the error variance • A larger \sigma^2 means that the distribution of the unobservables affecting y is more spread out • Theorem 2.2 (Sampling variance of OLS estimators): under the five assumptions above, Var(\hat\beta_1) = \sigma^2 / \sum_i (x_i - \bar{x})^2 and Var(\hat\beta_0) = \sigma^2 \, n^{-1} \sum_i x_i^2 / \sum_i (x_i - \bar{x})^2 (see the simulation sketch after this slide)
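A minimal Monte Carlo sketch of both theorems: drawing many samples from a known population, the slope estimates should average to \beta_1 (Theorem 2.1) with variance close to \sigma^2 / \sum_i (x_i - \bar{x})^2 (Theorem 2.2). All parameter values are illustrative; ols_fit is from the slide-8 sketch:

    rng = np.random.default_rng(2)
    beta0, beta1, sigma, n = 1.0, 0.5, 2.0, 200
    x = rng.normal(10, 3, n)                   # regressor held fixed across replications
    sst_x = np.sum((x - x.mean()) ** 2)

    slopes = [ols_fit(x, beta0 + beta1 * x + rng.normal(0, sigma, n))[1]
              for _ in range(10_000)]

    print(np.mean(slopes))                     # close to 0.5: unbiasedness
    print(np.var(slopes), sigma**2 / sst_x)    # close to each other: Theorem 2.2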

  16. Variance of y given x • Conditional mean and variance of y: E(y | x) = \beta_0 + \beta_1 x and, under homoskedasticity, Var(y | x) = Var(u | x) = \sigma^2 • Heteroskedasticity: Var(u | x) is a function of x rather than a constant

  17. What does Var(\hat\beta_1) depend on? • More variation in the unobservables affecting y (a larger \sigma^2) makes it more difficult to estimate \beta_1 precisely • The more spread out the sample of the x_i is, the easier it is to trace out the relationship between E(y | x) and x • As the sample size increases, so does the total variation in the x_i; therefore, a larger sample size yields a smaller variance of the estimator and hence higher significance

  18. Estimating the Error Variance • Errors (disturbances) versus residuals • Errors: u_i = y_i - \beta_0 - \beta_1 x_i, defined in the population • Residuals: \hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i, from the estimated function • Theorem 2.3 (Unbiased estimation of \sigma^2): under the five assumptions above, \hat\sigma^2 = SSR/(n-2) satisfies E(\hat\sigma^2) = \sigma^2 • Standard error of the regression (SER): \hat\sigma = \sqrt{\hat\sigma^2}, which estimates the standard deviation in y after the effect of x has been taken out • Standard error of \hat\beta_1: se(\hat\beta_1) = \hat\sigma / \sqrt{\sum_i (x_i - \bar{x})^2}
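Extending the slide-8 sketch, the error-variance estimate, the SER, and the standard error of the slope follow directly from the residuals (names illustrative):

    def ols_inference(x, y):
        """sigma^2-hat = SSR/(n-2) (Theorem 2.3), the SER, and se(b1)."""
        b0, b1 = ols_fit(x, y)
        uhat = y - (b0 + b1 * x)                   # residuals
        n = len(y)
        sigma2_hat = np.sum(uhat ** 2) / (n - 2)   # unbiased estimator of sigma^2
        ser = np.sqrt(sigma2_hat)                  # standard error of the regression
        se_b1 = ser / np.sqrt(np.sum((x - x.mean()) ** 2))
        return sigma2_hat, ser, se_b1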

  19. Why do we need Assumption 5? • 1. It simplifies the variance calculations for the estimators • 2. It is needed for the later statistical tests on the significance of the estimated parameters; under it, the OLS estimators have the smallest variance among linear unbiased estimators (the Gauss-Markov theorem, Ch. 3)

  20. Regression Through the Origin • Model without an intercept: y = \beta_1 x + u • Example: tax and consumption • Stata command:
    reg y x, nocons
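A one-line sketch of the through-the-origin estimator, which replaces the demeaned sums in ols_fit with raw sums (name illustrative):

    def ols_through_origin(x, y):
        """Slope for y = b1*x + u with no intercept: sum(x*y) / sum(x^2)."""
        return np.sum(x * y) / np.sum(x ** 2)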
