Methods of Economic Investigation: Lent Term: First Six Weeks Alan Manning Office Hour: Tuesday 11.30-12.30 Office: R451
Administrative Details • 3 lectures per week for first 6 weeks all at 10am: • Monday, 10-11, 1-2 (U8) • Thursday, 10-11 • Odd Arrangement but combines previous theory and applied segments
What is Econometrics For? • To make life miserable for MSc students? • To impress your mother with the magic of idempotent matrices? • To provide credible answers to interesting questions?
Econometrics is a means to an end not an end in itself. • Two different types of ends (may be others) • Causal Effects • Forecasting • Causal effects are answers to ‘what if’ questions: • What would happen to smoking if cigarette taxes were raised? • Forecasting – just want best currently available predictors – don’t worry about causality
Emphasis on means to an end… • Recommended text – Stock and Watson – not very technical • Class exercises will contain practical work with real data • Number of purposes: • Makes concepts less abstract, easier to understand • Gives real-world skills • Gives insight into frustrations of empirical work: • Cute theory • Fantastic econometric methodology • Take it to the data and….
How to Estimate Causal Effects? • Want effect of X on distribution of y, other relevant things being held constant • Most common to be interested in effect on mean of y, i.e. the conditional expectation E(y|X)
Estimation of linear regression offers promising approach • Can interpret regression function (Xβ) as estimate of E(y|X) • If conditional expectation linear in X then exact • If conditional expectation non-linear then Xβ linear approximation to true function • This is same as:
Proposition 1.1: If E(y|X)=Xβ, the OLS estimator is an unbiased estimator of β • Proof: Can write OLS estimator as: β̂=(X′X)⁻¹X′y • If X is fixed we have that: E(β̂)=(X′X)⁻¹X′E(y|X)=(X′X)⁻¹X′Xβ=β
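A quick numerical check of Proposition 1.1 can make the result less abstract. The sketch below is illustrative only: the coefficient values, sample size and normal errors are assumptions chosen for the example, not taken from the course. It holds X fixed, redraws the error many times, and averages the OLS estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

n, reps = 200, 2000
beta = np.array([1.0, 2.0])  # hypothetical true coefficients (intercept, slope)

# Fixed design: X is drawn once and held constant across replications
X = np.column_stack([np.ones(n), rng.normal(size=n)])

estimates = np.empty((reps, 2))
for r in range(reps):
    eps = rng.normal(size=n)  # errors drawn independently of X, so E(y|X) = X beta
    y = X @ beta + eps
    # OLS estimator (X'X)^{-1} X'y, computed by solving the normal equations
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

# Average of the estimates across replications; should be close to beta
mean_estimate = estimates.mean(axis=0)
print(mean_estimate)
```

Any single estimate differs from β, but the average across replications should sit very close to it, which is what unbiasedness means.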
Problems with Inferring Causal Effects from Regressions • Regressions tell us about correlations but ‘correlation is not causation’ • Example: Regression of whether currently have health problem on whether have been in hospital in past year:
HEALTHPROB  |     Coef.   Std. Err.        t
------------+---------------------------------
PATIENT     |   .262982    .0095126    27.65
_cons       |   .153447     .003092    49.63
• Do hospitals make you sick? – a causal effect
General Problems in Estimating Causal Effects • Omitted Variables • Reverse Causality • Measurement Error • Sample selection
Omitted Variables (should be familiar) • Suppose we want to estimate E(y|X,W), assumed to be linear in (X,W), so that E(y|X,W)=Xβ+Wγ or: y=Xβ+Wγ+ε • But you estimate y=Xβ+u, i.e. E(y|X) • Will have (treating X and W as fixed): E(β̂)=β+(X′X)⁻¹X′Wγ
Form of Omitted Variables Bias • Where there is only one variable: plim β̂=β+γ·Cov(X,W)/Var(X) • Extent of omitted variables bias related to: • size of correlation between X and W • strength of relationship between y and W (the size of γ)
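The one-variable case can be checked by simulation. This is a minimal sketch under assumed values (β=1, γ=2, and X built as 0.5·W plus noise are all hypothetical); it compares the slope from the short regression of y on X with the standard textbook value β+γ·Cov(X,W)/Var(X).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta, gamma = 1.0, 2.0  # hypothetical true effects of X and W on y

W = rng.normal(size=n)
X = 0.5 * W + rng.normal(size=n)  # X is correlated with the omitted W
y = beta * X + gamma * W + rng.normal(size=n)

def slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

# Short regression that omits W
short_slope = slope(X, y)

# Omitted-variables formula evaluated with sample moments
predicted = beta + gamma * np.cov(X, W)[0, 1] / np.var(X, ddof=1)

# Under these assumptions the population value is 1 + 2 * 0.5 / 1.25 = 1.8
print(short_slope, predicted)
```

Setting the 0.5 to zero (no correlation between X and W) or γ to zero (W irrelevant for y) removes the bias, matching the two bullet points above.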
In hospital example… • Prior health status an obvious omitted variable:
HEALTHPROB  |     Coef.   Std. Err.        t
------------+--------------------------------
PATIENT     |  .1250091    .0078147    16.00
HEALTHPROB1 |  .6282796    .0061896   101.51
_cons       |  .0554544    .0026937    20.59
Reverse Causality/ Endogeneity • Idea is that correlation between y and X may be because it is y that causes X not the other way round • Interested in causal model: y=Xβ+ε • But also causal relationship in other direction: X=αy+u
Reduced form is: X=(u+αε)/(1-αβ) • X correlated with ε – know this leads to bias in OLS estimates • In hospital example being sick causes you to go to hospital – not clear what good solution is.
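The bias from reverse causality can be seen by generating data from the reduced form. The sketch below assumes hypothetical values β=0.5 and α=0.5 and independent standard normal ε and u; none of these numbers come from the hospital data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta, alpha = 0.5, 0.5  # hypothetical causal effects in each direction

eps = rng.normal(size=n)
u = rng.normal(size=n)

# Reduced form for X, then y from the causal model y = X*beta + eps
X = (u + alpha * eps) / (1 - alpha * beta)
y = beta * X + eps

def slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

ols_slope = slope(X, y)

# X is correlated with eps because eps feeds back into X through alpha
corr_x_eps = np.corrcoef(X, eps)[0, 1]

# With unit variances the probability limit works out to
# beta + alpha*(1 - alpha*beta)/(1 + alpha**2) = 0.5 + 0.3 = 0.8
print(ols_slope, corr_x_eps)
```

OLS overstates β here because part of the observed correlation reflects y causing X.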
Measurement Error • Most (all?) of our data are measured with error. • Suppose causal model is: y=X*β+ε • But only observe X which is X* plus some error: X=X*+u • Classical measurement error: E(u│X*)=0
Can write causal relationship as: y=Xβ-uβ+ε • Note that X correlated with composite error (ε-uβ), because X contains u • Should know this leads to bias/inconsistency in OLS estimator • Can make some useful predictions about nature of bias – later on in course • Want E(y|X*) but can only estimate E(y|X)
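As a preview of what the bias looks like, the sketch below simulates classical measurement error. It relies on a standard result not derived on these slides: with one regressor and u independent of X* and ε, the OLS slope converges to β·Var(X*)/(Var(X*)+Var(u)), i.e. it is shrunk towards zero. The values β=2 and unit variances are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta = 2.0  # hypothetical true effect

x_star = rng.normal(size=n)      # true regressor X*
u = rng.normal(size=n)           # classical measurement error, independent of X*
x_obs = x_star + u               # what we actually observe
y = beta * x_star + rng.normal(size=n)

def slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

slope_true = slope(x_star, y)    # infeasible regression on X*
slope_noisy = slope(x_obs, y)    # feasible regression on mismeasured X

# With Var(X*) = Var(u) = 1 the attenuation factor is 1/2,
# so the feasible slope should be near beta/2 = 1
print(slope_true, slope_noisy)
```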
Selection Effects • Following regression seems to show that women with young children earn more than those with older children:
LOGWAGE |     Coef.   Std. Err.        t
--------+-------------------------------
AGEKID0 |  .0942016    .0083255    11.31
AGEKID1 |  .1333421     .008284    16.10
AGEKID2 |  .0833223    .0084401     9.87
AGEKID3 |  .0526896    .0087102     6.05
AGEKID4 |   .019879    .0087995     2.26
_cons   |  1.808458    .0061696   293.12
• Is this sensible? – probably not
One explanation is sample selection • Only have earnings data on women who work • Women with small children who work tend to have high earnings (e.g. to pay for childcare) • Employment rate of mothers with babies is 28%, of those with 5-year-olds is 50%
Why is this – a brief exposition? • Causal model for everyone: y=Xβ +ε • But only observe if work, W=1, so estimate E(y|X,W=1) not E(y|X) • Sample selection bias if W correlated with ε – this is likely • Heckman got Nobel prize for working out how to deal with this – but not part of this course
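The mechanism can be illustrated with a small simulation. Everything in it is an assumption made for the example: X is a dummy for having a young child, its true effect on wages is set to zero, and a woman works if ε+v exceeds a threshold that is higher for mothers of young children. The threshold 0.8 is picked only so that employment rates land loosely near the 28% and 50% quoted above; it is not calibrated to the data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
beta = 0.0  # assumed true effect of a young child on log wages: none

X = rng.integers(0, 2, size=n).astype(float)  # 1 = young child (hypothetical dummy)
eps = rng.normal(size=n)                      # unobserved earnings potential
v = rng.normal(size=n)                        # other influences on working
y = 1.8 + beta * X + eps                      # log wage for everyone

# Work decision depends on eps, with a higher bar for mothers of young children
works = (eps + v - 0.8 * X) > 0

def slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

slope_everyone = slope(X, y)                  # infeasible: wages for all women
slope_workers = slope(X[works], y[works])     # feasible: workers only

emp_rate_young = works[X == 1].mean()
emp_rate_other = works[X == 0].mean()
print(slope_everyone, slope_workers, emp_rate_young, emp_rate_other)
```

Among workers the coefficient on X is clearly positive even though the true effect is zero, because the mothers of young children who do work are disproportionately those with high ε.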
Common Features of Problems • All problems have an expression in everyday language – omitted variables, reverse causality etc • All have an econometric form – the same one • A correlation of X with the ‘error’
How To Surmount the Problems? • More sophisticated econometric methods than OLS e.g. IV • Better data – Griliches: • “since it is the ‘badness’ of the data that provides us with our living, perhaps it is not at all surprising that we have shown little interest in improving it”
But Recent Trends • Much more emphasis on good quality data and research design than ‘statistical fixes’ – the ‘credibility revolution’ • Probably started in labour economics but now arriving in most fields • Will illustrate this in course through wide-ranging examples
Internal and External Validity • Estimates have internal validity if conclusions valid for population being studied • Estimates have external validity if conclusions valid for other populations, e.g. can we generalise the impact of class size reduction in Tennessee in the late 1980s to class size reduction in the UK in 2005? – nothing in data will help with this
Choosing your data.. • Suppose interested in causal effect of X on y. • Can choose the way in which X is determined in your sample • may seem fanciful but field experiments becoming more common in economics • Good reason to choose to do randomized controlled experiment
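To see why randomization is attractive, the sketch below returns to the hospital example with invented numbers. It assumes hospital treatment truly lowers the health-problem index by 0.2, and compares a world where sicker people choose to go to hospital with one where hospital stays are assigned by coin flip. The variable names and magnitudes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta = -0.2  # assumed true causal effect of hospital on health problems

h = rng.normal(size=n)  # unobserved prior health problems

def diff_in_means(x, y):
    """Mean of y for x == 1 minus mean of y for x == 0 (OLS slope on a dummy)."""
    return y[x == 1].mean() - y[x == 0].mean()

# Observational data: people with worse prior health are more likely to go
x_obs = (h + rng.normal(size=n) > 0).astype(float)
y_obs = beta * x_obs + h + rng.normal(size=n)
effect_obs = diff_in_means(x_obs, y_obs)

# Randomized experiment: assignment is independent of prior health
x_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = beta * x_rct + h + rng.normal(size=n)
effect_rct = diff_in_means(x_rct, y_rct)

print(effect_obs, effect_rct)
```

The observational comparison suggests hospitals make people sick, while the randomized comparison recovers a value near the assumed -0.2, since random assignment makes X uncorrelated with the ‘error’.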