1 / 49

Econometrics with Observational Data

Econometrics with Observational Data. Will begin at 2PM ET For conference audio, dial 800.767.1750 and use access code 45043 After entry please dial *6 to mute your line. Econometrics with Observational Data. Introduction and Identification Todd Wagner. Goals for Course.

rigg
Download Presentation

Econometrics with Observational Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Econometrics with Observational Data Will begin at 2PM ET For conference audio, dial 800.767.1750 and use access code 45043 After entry please dial *6 to mute your line.

  2. Econometrics with Observational Data Introduction and Identification Todd Wagner

  3. Goals for Course • To enable researchers to conduct careful analyses with existing VA (and non-VA) datasets. • We will • Describe econometric tools and their strengths and limitations • Use examples to reinforce learning

  4. Requirements • Familiarity with multivariate analysis • We expect you to • do the readings • ask questions if you don’t understand something • do your best

  5. Course faculty • Todd Wagner, PhD • Paul Barnett, PhD • Ciaran Phibbs, PhD • Mark Smith, PhD

  6. Course Dates • Introduction and identification: (March 12th) • Cost as the dependent variable (I): (March 26th) • Cost as the dependent variable (II): (April 9th) • Non-linear dependent variables: (April 30th) • Right-hand-side variables: (May 14) • Research design (May 28th) • Endogeneity and simultaneity: (June 11th)

  7. LiveMeeting Rules • Mute your phone • Don’t use the hold button If you have to make or answer another call, please hang up and then dial back in

  8. Features of web system • Main page • Slides, white board, applications demo • Side page • Chat • Questions and answers • Informal poll • Polls

  9. Virtual Interaction • Polls • Email me (or the presenter) with questions twagner@stanford.edu or todd.wagner@va.gov • If you are in a group setting, please email me the next two items.

  10. Randomized Clinical Trial • RCTs are the gold-standard research design for assessing causality • What is unique about a randomized trial? The treatment / exposure is randomly assigned • Benefits of randomization: Causal inferences

  11. Randomization • Random assignment distinguishes experimental and non-experimental design • Random assignment should not be confused with random selection • Selection can be important for generalizability (e.g., randomly-selected survey participants) • Random assignment is required for understanding causation

  12. Limitations of RCTs • Generalizability to real life may be low • Hawthorne effect (both arms) • RCTs are expensive and slow • Can be unethical to randomize people to certain treatments or conditions • Quasi-experimental design can fill an important role

  13. Elements of an Equation Maciejewski ML, Diehr P, Smith MA, Hebert P. Common methodological terms in health services research and their synonyms. Med Care. Jun 2002;40(6):477-484.

  14. Covariate, RHS variable, Predictor, independent variable Intercept Dependent variable Outcome measure Error Term

  15. “i” is an index.If we are analyzing people, then this typically refers to the person There may be other indexes

  16. Two covariates Intercept DV Error Term

  17. Different notation j covariates Intercept DV Error Term

  18. Error term • Error exists because • Other important variables might be omitted • Measurement error • Human indeterminancy • Understand and minimize error • Error can be additive or multiplicative See Kennedy, P. A Guide to Econometrics

  19. Example: is height associated with income?

  20. Y=income; X=height • Hypothesis: Height is not related to income (B1=0) • If B1=0, then what is B0? • To test Hypothesis, we need to estimate B1 with a sample of data

  21. Estimators

  22. OLS

  23. Other estimators • Least absolute deviations • Maximum likelihood

  24. Choosing an Estimator • Least squares • Unbiasedness • Efficiency (minimum variance) • Asymptotic properties • Maximum likelihood • We’ll talk more about estimators in courses 2-4.

  25. What is driving the error?

  26. What about gender • How could gender affect the relationship between height and income? • Gender-specific intercept • Interaction

  27. Gender Indicator Variable height Gender Intercept

  28. Gender-specific Indicator B2 B0

  29. Interaction gender height Interaction Term, Effect modification, Modifier Note: the gender “main effect” variable is still in the model

  30. Gender Interaction

  31. Classic Linear Regression (CLR) Assumptions

  32. Classic Linear Regression • No “superestimator” • CLR models are often used as the starting point for analyses • 5 assumptions for the CLR • Variations in these assumption will guide your choice of estimator (and happiness of your reviewers)

  33. Assumption 1 • The dependent variable can be calculated as a linear function of a specific set of independent variables, plus an error term • For example,

  34. Violations to Assumption 1 • Omitted variables • Non-linearities • Note: by transforming independent variables, a nonlinear function can be made from a linear function

  35. Testing Assumption 1 • Theory-based transformations • Empirically-based transformations • Common sense • Ramsey RESET test • Pregibon Link test Ramsey J. Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society. 1969;Series B(31):350-371. Pregibon D. Logistic regression diagnostics. Annals of Statistics. 1981;9(4):705-724.

  36. Assumption 1 and Stepwise • Statistical software allows for creating models in a “stepwise” fashion • Be careful when using it. • Why? • Little penalty for adding a nuisance variable • BIG penalty for missing an important covariate

  37. Assumption 2 • Expected value of the error term is 0 E(ui)=0 • Violations lead to biased intercept • A concern when analyzing cost data (Wei will talk about the smearing estimator)

  38. Assumption 3 • IID– Independent and identically distributed error terms • Autocorrelation: Errors are uncorrelated with each other • Homoskedasticity: Errors are identically distributed

  39. Heteroskedasticity

  40. Violating Assumption 3 • Effects • OLS coefficients are unbiased • OLS is inefficient • Standard errors are biased • Plotting is often very helpful • Different statistical tests for heteroskedasticity • GWHet--but statistical tests have limited power

  41. Fixes for Assumption 3 • Transforming dependent variable may eliminate it • Robust standard errors (Huber White or sandwich estimators) • Wei and Ciaran will address this issue in more detail in later courses

  42. Assumption 4 • Observations on independent variables are considered fixed in repeated samples • E(xiui)=0 • Violations • Errors in variables • Autoregression • Simultaneity

  43. Assumption 4: Errors in Variables • Measurement error of dependent variable (DV) is maintained in error term. • Error in measuring covariates can be problematic • Is error correlated with error from DV?

  44. Common Violations • Including a lagged dependent variable(s) as a covariate • Contemporaneous correlation—often called endogeneity • Hausman test (but very weak in small samples) • Mark Smith will talk more about this.

  45. Assumption 5 • Observations > covariates • No multicollinearity • Solutions • Remove perfectly collinear variables • Increase sample size

  46. Any Questions?

  47. Statistical Software • SAS is for data management • R and Stata are for analyses • http://www.r-project.org/ • Stattransfer (Always transfer SCRSSN as double precision)

  48. Reading for Next Week • Maciejewski ML et al. Common methodological terms in health services research and their synonyms. Med Care. Jun 2002;40(6):477-484. • Manning WG, Mullahy J. Estimating log models: to transform or not to transform? J Health Econ. Jul 2001;20(4):461-494.

  49. Regression References • Kennedy A Guide to Econometrics • Greene. Econometric Analysis. • Wooldridge. Econometric Analysis of Cross Section and Panel Data.

More Related