Part 1. Regression Analysis on Cross-Sectional Data
Chap 2. The Simple Regression Model—Practice for Learning Multiple Regression
• Bivariate linear regression model: y = β₀ + β₁x + u
• β₁: the slope parameter in the relationship between y and x, holding the other factors in u fixed.
• β₀: the intercept parameter.
• u: the error term or disturbance; factors other than x that affect y, unobserved.
Example: data-1
Cross-section data, 1832 rural households, in 2000
Y: consumption   X: income

Stata commands:
    scatter consum income
    scatter consum income || lfit consum income
    graph twoway (scatter consum income) (lfit consum income)
    reg consum income
More Discussion
• Δy = β₁Δx: a one-unit change in x has the same effect on y, regardless of the initial value of x.
• Can we draw ceteris paribus conclusions about how x affects y from a random sample of data, when we are ignoring all the other factors? Generally speaking, we cannot!
Classical Regression Assumptions
• E(u) = 0: a feasible assumption as long as the intercept term is included.
• Zero conditional mean: E(u|x) = E(u) = 0, so u is linearly uncorrelated with x (in fact, mean-independent of x).
• Then the population regression function is E(y|x) = β₀ + β₁x.
• Sample versus population: the assumptions concern the population; OLS uses a sample to estimate the population parameters.
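One line of algebra shows why the zero conditional mean assumption pins down the population regression function:

    E(y|x) = E(β₀ + β₁x + u | x) = β₀ + β₁x + E(u|x) = β₀ + β₁x

So under E(u|x) = 0, the mean of y given x is linear in x, and β₁ measures the ceteris paribus effect of x on y.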
Attention:
• u: the error term (unobserved)
• ŷ: the fitted value
• û = y − ŷ: the residual
Example: data-1
Cross-section data, 1832 households
Y: consumption   X: income

    reg consum income
    predict consum1
    generate v = consum - consum1
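As a cross-check (a minimal sketch, assuming data-1 is loaded with the variables consum and income), Stata can also compute the residuals directly, and the OLS first-order conditions imply they average to zero and are uncorrelated with income:

    reg consum income
    predict uhat, residuals      // same as v above: uhat = consum - fitted value
    summarize uhat               // mean should be (numerically) zero
    correlate uhat income        // correlation should be (numerically) zero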
OLS
• Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares: SST = SSE + SSR
• Coefficient of determination: R² = SSE/SST = 1 − SSR/SST
• R² is the fraction of the sample variation in y that is explained by x.
• Is R-squared important?
Example: data-1
    reg consum income
SSE = 2.3557e+11
SSR = 9.4745e+11
SST = 1.1830e+12
R-sq = 0.1991
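The reported R-sq can be reproduced from these sums of squares; a quick check in Stata (note that Stata stores the explained sum of squares as e(mss) and the residual sum of squares as e(rss)):

    * run immediately after: reg consum income
    display e(mss) / (e(mss) + e(rss))      // R-sq = SSE/SST
    display 1 - e(rss) / (e(mss) + e(rss))  // equivalently, 1 - SSR/SST
    display 2.3557e+11 / 1.1830e+12         // from the slide's numbers: 0.1991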
Units of Measurement
• If the dependent variable is multiplied by a constant c, which means each value in the sample is multiplied by c, then the OLS intercept and slope estimates are also multiplied by c.
• If the independent variable is divided or multiplied by some nonzero constant c, then its OLS slope coefficient is multiplied or divided by c, respectively.
• How should we judge the size of a coefficient?
• The goodness of fit of the model, R-sq, should not depend on the units of measurement of our variables. (See the sketch below.)
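A quick illustration of the rescaling rules, assuming data-1 is loaded (the rescaled variable names here are made up for the demo):

    generate consum_th = consum / 1000   // consumption in thousands of $
    reg consum_th income                 // intercept and slope are 1/1000 of the original
    generate income_th = income / 1000   // income in thousands of $
    reg consum income_th                 // slope is 1000 times the original; R-sq unchanged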
Functional Form
• Linear vs. nonlinear
• When and why take the logarithm?
• When? For dollar amounts: GDP, output, salary, …
• Why? To make the variable look more normally distributed (e.g., income).
• Why? To reduce heteroskedasticity.
• Why? To obtain elasticities.
• Bi-logarithmic (log-log) form: constant elasticity. (See the sketch below.)
• U or inverse-U shapes: motivated by theory, or by experience (e.g., age and experience profiles).
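A sketch of the log-log form with data-1, assuming consum and income are strictly positive (the new variable names are illustrative):

    generate lconsum = ln(consum)
    generate lincome = ln(income)
    reg lconsum lincome    // the slope is the elasticity of consumption w.r.t. income

    * quadratic terms give U / inverse-U shapes, e.g. in a wage equation
    * (wage and exper are hypothetical variables, not in data-1):
    * generate exper2 = exper^2
    * reg wage exper exper2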
Interpreting the four functional forms:
• Level-level: consumption will increase by $0.18 if income increases by $1.
• Level-log: consumption will increase by $106.03 if income increases by 1%.
• Log-level: wage increases by 8.3% for every additional year of education.
• Log-log: a 1% increase in firm sales increases CEO salary by about 0.257%.
Unbiasedness of OLS Estimators
• Statistical properties of OLS: the distribution of the OLS estimates across different random samples drawn from the population.
• Assumptions 1–4:
• 1. Linear in parameters (functional form)
• 2. Random sampling (vs. nonrandom sampling)
• 3. Zero conditional mean (vs. endogeneity; spurious correlation)
• 4. Sample variation in the independent variable (vs. collinearity)
• Theorem 2.1 (Unbiasedness): under Assumptions 1–4, E(β̂₀) = β₀ and E(β̂₁) = β₁. A simulation sketch follows.
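Unbiasedness can be illustrated by simulation; a minimal sketch with a made-up true model y = 1 + 0.5x + u (program and variable names are illustrative):

    program define olssim, rclass
        drop _all
        set obs 200
        generate x = rnormal()
        generate y = 1 + 0.5*x + rnormal()   // true slope is 0.5
        reg y x
        return scalar b1 = _b[x]
    end
    set seed 12345
    simulate b1 = r(b1), reps(1000) nodots: olssim
    summarize b1    // the mean of the 1,000 slope estimates should be close to 0.5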
Variance of OLS Estimators
• How far is β̂₁ away from β₁?
• Assumption 5, Homoskedasticity: Var(u|x) = σ².
• Error variance σ²: a larger σ² means that the distribution of the unobservables affecting y is more spread out.
• Theorem 2.2 (Sampling variance of the OLS estimators). Under the five assumptions above:
    Var(β̂₁) = σ²/SSTₓ, where SSTₓ = Σᵢ(xᵢ − x̄)²
    Var(β̂₀) = σ² (n⁻¹Σᵢxᵢ²)/SSTₓ
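The slope formula follows from the standard decomposition of β̂₁, reproduced here for completeness:

    β̂₁ = β₁ + Σᵢ(xᵢ − x̄)uᵢ / SSTₓ

so, conditional on the xᵢ and under homoskedasticity,

    Var(β̂₁) = Σᵢ(xᵢ − x̄)² σ² / SSTₓ² = σ²/SSTₓ.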
Variance of y given x
• Conditional mean and variance of y: E(y|x) = β₀ + β₁x and Var(y|x) = Var(u|x) = σ².
• Heteroskedasticity: Var(u|x) depends on x, i.e., the error variance differs across values of x.
What does Var(β̂₁) depend on?
• More variation in the unobservables affecting y (a larger σ²) makes it more difficult to estimate β₁ precisely.
• The more spread out the sample of the xᵢ, the easier it is to find the relationship between E(y|x) and x.
• As the sample size increases, so does the total variation in the xᵢ. Therefore, a larger sample size results in a smaller variance of the estimator, and hence higher significance. (See the sketch below.)
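The sample-size point can be seen with data-1 itself (a sketch, assuming the dataset is loaded): fit the model on the full sample and on a random 10% subsample, and compare the standard errors.

    reg consum income     // full sample, n = 1832
    preserve
    set seed 1
    sample 10             // keep a random 10% of observations
    reg consum income     // expect a noticeably larger standard error on income
    restore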
Estimating the Error Variance
• Errors (disturbances) and residuals:
• Errors: uᵢ = yᵢ − β₀ − β₁xᵢ (population quantities)
• Residuals: ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ (from the estimated equation)
• Theorem 2.3 (The unbiased estimator of σ²). Under the five assumptions above: σ̂² = SSR/(n − 2) = (1/(n − 2)) Σᵢ ûᵢ², with E(σ̂²) = σ².
• Standard error of the regression (SER): σ̂ = √σ̂², estimating the standard deviation of u, i.e., the standard deviation in y after the effect of x has been taken out.
• Standard error of β̂₁: se(β̂₁) = σ̂/√SSTₓ.
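These quantities can all be recovered after reg consum income (a sketch; uh is an illustrative variable name):

    display e(rmse)       // the SER, sigma-hat (Stata's "Root MSE")
    display _se[income]   // se(beta1-hat)
    * by hand: sigma-hat = sqrt(SSR/(n-2))
    predict uh, residuals
    generate uh2 = uh^2
    quietly summarize uh2
    display sqrt(r(sum) / (e(N) - 2))   // should match e(rmse)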
Why do we need Assumption 5?
1. It simplifies the variance calculations for the estimators.
2. It is needed for further statistical tests of the significance of the estimated parameters; under it, the OLS estimators have the smallest variance (Ch. 3).
Regression through the Origin
Example: tax and consumption
Stata command: reg y x, noconstant
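Without the intercept, minimizing Σᵢ(yᵢ − β̃₁xᵢ)² gives the slope β̃₁ = Σᵢxᵢyᵢ / Σᵢxᵢ². A comparison sketch, with y and x as named on the slide:

    reg y x, noconstant   // regression through the origin
    reg y x               // with the intercept, for comparison
    * caution: the R-sq Stata reports under noconstant is computed about zero,
    * so it is not comparable to the usual R-sq.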