Application 3: Estimating the Effect of Education on Earnings

Methods of Economic Investigation, Lecture 9

Quick Asymptotics reminder… • In class: not really about “proving” consistency or asymptotic bias in estimates • When appropriate, will mention bias terms that are asymptotically zero but nonzero in finite samples

What should you know? • What happens to something in its probability limit • That our estimates converge, in the limit as N goes to infinity, under regularity conditions

What you do not need to know • Behind these results are various theorems • Laws of Large Numbers for plims • Central Limit Theorems for asymptotic normality • Various mathematical conditions • e.g., the continuous mapping theorem • You do not have to know which theorems you are using • You do not have to be able to prove these results with the theorems

Bottom Line… • Understand the role of N→∞ • The mean of the sample mean is μ • The variance of the sample mean goes to zero • If something is appropriately scaled (e.g., $\sqrt{N}(\bar{x} - \mu)$), it can converge in distribution • So far, we have typically relied on the concept of “bias”, but in large samples consistency is the more useful term • If bias is decreasing as the sample size increases, then worry less about it • If even in large samples our estimate is not close to the true value, worry more about it
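The role of N→∞ can be seen in a small simulation. This is only an illustrative sketch: the normal distribution, the values μ = 2 and σ = 3, and the helper name `sample_means` are assumptions made here, not part of the lecture. It shows the sample means centering on μ, their variance shrinking with N, and the variance after scaling by √N staying near σ².

```python
import random
import statistics

def sample_means(n, reps=500, mu=2.0, sigma=3.0, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

results = {}
for n in (10, 100, 1000):
    means = sample_means(n)
    var = statistics.pvariance(means)
    # (mean of the sample means, their variance, variance after scaling by sqrt(N))
    results[n] = (statistics.fmean(means), var, n * var)
    print(n, results[n])
```

The second entry should fall roughly tenfold each time N grows tenfold, while the third should stay close to σ² = 9.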

Today’s Lecture • Review error component models • Fixed Effects • Random Effects • Application: Estimating the Effects of Education on Earnings • Difficulty in Causal Estimation • Within-family estimator • Some limitations of fixed effects

Error has different components • Suppose we had to estimate a model whose error term has a group-level component and an individual-level component • If unobserved factors are uncorrelated with the X’s: can do OLS with robust standard errors, or FGLS • If unobserved factors are correlated with the X’s: can include a group-specific fixed effect
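One common way to write an error component model is sketched below. The notation is an assumption: the symbols β, ε, f_j and u_ij are illustrative, chosen to match the family effect f used later in the lecture, and are not taken from the original slide.

```latex
Y_{ij} = X_{ij}'\beta + \varepsilon_{ij}, \qquad \varepsilon_{ij} = f_j + u_{ij}
```

Here f_j is the component shared by everyone in group j and u_ij is the individual-specific component. The two cases on the slide correspond to whether f_j is uncorrelated or correlated with X_ij.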

Fixed effects versus “Dummy Variables” • These are not mutually exclusive categories • A dummy variable is just a categorical variable that takes the value zero or one • “Control” variables, which have a direct meaning, may sometimes be dummy variables • Fixed effects, which tell us something about the structure of our error term, are also dummy variables

Motivation for today’s example… • Want to know why people earn different amounts • Specifically, what are the returns, in terms of increased wages, to the various investments people make • Most common labor-improving investment: education

Motivation-2 • Simple linear regression first introduced by Mincer: $Y_{ij} = a + bS_{ij} + cX_{ij} + dX_{ij}^2 + e_{ij}$ • Index this by individual i in group j • Experience ($X_{ij}$): we’re going to include a quadratic specification, which is most commonly used • Measure of schooling ($S_{ij}$): we’re going to use years of education

Basic Problem with estimating this • Lots of reasons why different people may invest in different levels of education • Some of those reasons are probably correlated with how much money a person would earn as well as how much they will invest in education • Unobserved “ability” • Family factors, such as income, parental involvement, genetics, etc.

How might these bias our estimates? • Let’s say what we want to estimate is: $Y_{ij} = a + bS_{ij} + cX_{ij} + dX_{ij}^2 + f_j + u_{ij}$ • Interpret higher f as something like family income or family investment • Recall the OVB formula: we care about two things • Correlation between f and Y: probably positive • Correlation between f and S: positive • With both positive, omitting f biases the OLS estimate of b upward
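The direction of the bias can be checked with a short simulation. This is a sketch under assumed values: the true return b = 0.10, the effect of f on earnings g = 0.20, the noise levels, and the helper `ols_slope` are all illustrative, and experience is left out to keep the regression to one variable. It compares the OLS slope with f omitted to the value implied by the OVB formula, b + g·Cov(S, f)/Var(S).

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
b, g = 0.10, 0.20   # assumed true return to schooling and effect of f
n = 20000

f = [rng.gauss(0, 1) for _ in range(n)]        # unobserved family factor
S = [12 + fi + rng.gauss(0, 1) for fi in f]    # schooling rises with f
Y = [b * si + g * fi + rng.gauss(0, 0.5) for si, fi in zip(S, f)]

b_ols = ols_slope(S, Y)              # regression that omits f
b_ovb = b + g * ols_slope(S, f)      # OVB formula: b + g * Cov(S, f) / Var(S)
print(round(b_ols, 3), round(b_ovb, 3))
```

Because f raises both schooling and earnings in this setup, the OLS slope lands well above the assumed b.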

Why is OLS biased? [Figure with axes Y and S]

How could we fix this? • Some of the unobserved differences that bias a cross-sectional comparison of education and earnings are based on family characteristics • Key assumption: within families, these differences are fixed • If we observe multiple individuals with exactly the same family effect, we can difference out the group effect

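Estimating Family Averages • Can look at differences within families • Think of this as a different CEF for each family: $E[Y_{ij} - \bar{Y}_j \mid S, X, f] = b(S_{ij} - \bar{S}_j) + c(X_{ij} - \bar{X}_j) + d(X_{ij}^2 - \overline{X^2}_j)$, where bars denote family averages (the intercept and the family effect f difference out) • The way we estimate this: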
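A minimal sketch of one way to compute a within-family estimate: subtract each family's mean from its members' schooling and earnings, then run OLS on the demeaned data. Everything specific here is an assumption for illustration (two members per family, b = 0.10, the family effect entering with coefficient 0.20, the helper names), and experience is omitted to keep a single regressor.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sum((xi - mx) ** 2 for xi in x)

def slope_through_origin(x, y):
    """OLS slope with no intercept; appropriate for demeaned data."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

rng = random.Random(2)
b, g = 0.10, 0.20            # assumed return to schooling and effect of f
families, size = 10000, 2

S, Y, S_dm, Y_dm = [], [], [], []
for _ in range(families):
    f = rng.gauss(0, 1)                                   # shared family effect
    s = [12 + f + rng.gauss(0, 1) for _ in range(size)]
    y = [b * si + g * f + rng.gauss(0, 0.5) for si in s]
    s_bar, y_bar = statistics.fmean(s), statistics.fmean(y)
    S += s
    Y += y
    S_dm += [si - s_bar for si in s]                      # deviations from family mean
    Y_dm += [yi - y_bar for yi in y]

b_pooled = ols_slope(S, Y)                    # cross-section, ignores f
b_within = slope_through_origin(S_dm, Y_dm)   # within-family estimate
print(round(b_pooled, 3), round(b_within, 3))
```

In this setup the pooled estimate is pulled upward by the family effect, while the within-family estimate stays near the assumed b.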

What makes this believable • No within-family differences in unobserved factors • Might be a problem with siblings generally • Parents invest differently • Cohort-related differences influence siblings differently • Different “inherited” endowment • More believable with identical twins

A twins sample • Collect data at the Twins festival in Twinsburg, Ohio • Survey twins: • Are you identical? If both say yes, then included • Ever worked in the past two years • Earnings, education, and other characteristics • Useful because we also get two measures of shared characteristics, so can control for measurement error

Twins sample issues… • Sample at Twinsburg is NOT a random sample of twins • Benefit: more likely to be similar because attendees are into their “twinness” • Cost: not necessarily generalizable, even to other twins • Attendees are a select segment of the population • Generally richer, whiter, more educated, etc. • Worry about heterogeneity of effects across some of these categories

External validity • Twins may not be very comparable to other families—face different costs and benefits to schooling • Twinsburg sample not representative of twins • Maybe not even externally valid for twins • Worry that selection into sample will give us an estimate that is not consistent with the population average

[Results table not recovered] Specifications compared • Fixed effects (same as first difference with only two observations per family) • Control for average family schooling, an “ability” measure • No family effect: cross-section regression
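The equivalence between fixed effects and first differences with two observations per family can be verified numerically. The sketch below uses simulated data with assumed parameters (b = 0.10, a family effect with coefficient 0.20) and a hypothetical helper `slope_through_origin`. Both regressions are run without an intercept.

```python
import random

def slope_through_origin(x, y):
    """OLS slope with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

rng = random.Random(3)
b = 0.10
pairs = []   # (schooling twin 1, schooling twin 2, earnings twin 1, earnings twin 2)
for _ in range(2000):
    f = rng.gauss(0, 1)
    s1, s2 = (12 + f + rng.gauss(0, 1) for _ in range(2))
    y1, y2 = (b * s + 0.2 * f + rng.gauss(0, 0.5) for s in (s1, s2))
    pairs.append((s1, s2, y1, y2))

# Fixed effects: subtract the family mean from each twin's values.
S_dm, Y_dm = [], []
for s1, s2, y1, y2 in pairs:
    s_bar, y_bar = (s1 + s2) / 2, (y1 + y2) / 2
    S_dm += [s1 - s_bar, s2 - s_bar]
    Y_dm += [y1 - y_bar, y2 - y_bar]
b_fe = slope_through_origin(S_dm, Y_dm)

# First difference: one observation per family, twin 1 minus twin 2.
dS = [s1 - s2 for s1, s2, _, _ in pairs]
dY = [y1 - y2 for _, _, y1, y2 in pairs]
b_fd = slope_through_origin(dS, dY)

print(b_fe, b_fd)
```

With two twins, each deviation from the family mean is plus or minus half the twin difference, so the two slopes agree up to floating-point rounding.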

Where’s the variation? • Recall our estimating equation • If $S_{ij}$ is the same for both twins, the pair makes no contribution to the estimate of b • b is estimated only off of twins who differ from each other in schooling investments
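A quick way to see this is to estimate b from twin differences twice: once on all pairs, and once after dropping the pairs with identical schooling. In the sketch below the data are simulated, and the 75% share of identical pairs, the schooling gaps, and b = 0.10 are assumptions chosen only for illustration.

```python
import random

def slope_through_origin(x, y):
    """OLS slope with no intercept, as used for twin differences."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

rng = random.Random(4)
b = 0.10
dS, dY = [], []
for _ in range(4000):
    # Assumed for illustration: 75% of pairs have identical schooling.
    gap = 0 if rng.random() < 0.75 else rng.choice([-4, -2, 2, 4])
    dS.append(gap)
    dY.append(b * gap + rng.gauss(0, 0.5))   # family effect already differenced out

b_all = slope_through_origin(dS, dY)

differing = [(s, y) for s, y in zip(dS, dY) if s != 0]
b_diff = slope_through_origin([s for s, _ in differing], [y for _, y in differing])
share = len(differing) / len(dS)

print(b_all, b_diff, share)
```

Pairs with a zero schooling gap add zero to both the numerator and the denominator of the slope, so dropping them leaves the estimate unchanged.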

Correlation Matrix for Twins [Table not recovered; labels include “Education of twin 1, reported by twin 1” and “Education of twin 1, reported by twin 2”] • ALL of the identification for b comes from the 25% of twins who don’t have the same schooling

Measurement Error • Twins are not perfect at reporting each other’s schooling: 5-10% measurement error • This may generate a different bias • Can use instrumental variables to try to address this (more on this after we cover instrumental variables methods) • Need to worry about data quality too, not just OVB
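As a preview of why two reports of the same schooling are useful, the sketch below simulates classical measurement error. It is not the lecture's or the study's implementation. It assumes the reporting errors are independent of true schooling, of earnings, and of each other, and the values b = 0.10 and a 10% error share are illustrative. OLS on a noisy report is attenuated toward zero by the factor Var(S)/(Var(S) + Var(error)); using the second report as an instrument for the first recovers b in this setup.

```python
import random
import statistics

def cov(x, y):
    """Sample covariance (population normalization)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

rng = random.Random(5)
b, n = 0.10, 20000

S_true = [rng.gauss(0, 3) for _ in range(n)]           # true schooling (as deviations)
report_own = [s + rng.gauss(0, 1) for s in S_true]     # own report, with error
report_twin = [s + rng.gauss(0, 1) for s in S_true]    # co-twin's report, independent error
Y = [b * s + rng.gauss(0, 0.5) for s in S_true]

b_ols = cov(report_own, Y) / cov(report_own, report_own)    # attenuated toward zero
b_iv = cov(report_twin, Y) / cov(report_twin, report_own)   # second report as instrument
reliability = 9 / (9 + 1)                                    # Var(S) / (Var(S) + Var(error))

print(round(b_ols, 3), round(b * reliability, 3), round(b_iv, 3))
```

The OLS slope should sit near b times the reliability ratio, with the instrumented slope closer to b.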

Limitations of Fixed Effects • Relies on within-group variation • Not transparent what is generating that variation • The variation that’s left may be ‘random’ but may be limited in its external validity • Must be the case that there is NO within-group variation in unobserved factors AND that effects are homogeneous across groups (i.e., b is the same across groups) • May be less believable if family inputs have non-linear effects on income or education

What did we learn today • When there are unobserved group effects, there can be two issues: • Uncorrelated with the X’s: OLS is not efficient, can fix this with GLS • Correlated with the X’s: OVB, can include “fixed effects” • Fixed effects, within-group differences, and deviations from group means can all remove bias from an unobserved group effect

Next Class • Application: The effect of schooling on wages • Ability bias • Fixing this with “twins” and “siblings” models
