Statistics Workshop Specialized Models Spring 2009 Bert Kritzer

Presentation Transcript


  1. Statistics Workshop Specialized Models, Spring 2009, Bert Kritzer

  2. Inferring Causation Regression suggests but does not prove causation! • Must check for spurious relationships • Must have correct time ordering • Must eliminate alternative explanations • Must recognize the possibility of multiple causal processes functioning independently • “multi-conjunctional causation” • Must confront possible mutual causation • simultaneous equations • identification

  3. Spurious Correlation [Diagram: top panel links Quality of Law School Attended to Career Success by a correlation r; bottom panel shows the same two variables together with IQ and Ambition.]

  4. Regression and One Way Causation • Crucial element is eliminating alternative explanations • Need to include predictors representing alternatives in regression equation • hope that they are not statistically significant • hope that variable of interest remains significant after alternative explanations are included • Must still deal with • Proper time ordering • Form of the relationship (linear vs. nonlinear, interactions/conditional relationships)
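To make the point about including alternative explanations concrete, here is a minimal Python sketch built on the law school example. Everything in it is an assumption made for illustration: the simulated data, the coefficients, and the premise that IQ drives both school quality and career success while school quality has no effect of its own. None of the numbers come from the workshop.

```python
import random

def simple_slope(y, x):
    """Slope from regressing y on x alone."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_predictor_slopes(y, x1, x2):
    """Slopes from regressing y on x1 and x2 (centered normal equations)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)
n = 5000
# Hypothetical data-generating process: IQ causes both variables.
iq = [rng.gauss(0, 1) for _ in range(n)]
school = [z + rng.gauss(0, 1) for z in iq]
success = [z + rng.gauss(0, 1) for z in iq]  # no direct effect of school

naive_slope = simple_slope(success, school)
controlled_slope, iq_slope = two_predictor_slopes(success, school, iq)

print("school slope, IQ omitted: ", round(naive_slope, 2))
print("school slope, IQ included:", round(controlled_slope, 2))
print("IQ slope:                 ", round(iq_slope, 2))
```

In this simulated setting the slope on school quality is clearly positive when IQ is left out and falls to roughly zero once IQ enters the equation, which is the pattern a spurious relationship produces.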

  5. Simultaneous Equations: Classic Supply/Demand Problem [Figure: Supply and Demand curves; axes: Quantity, Price.]

  6. Simultaneous Equations: The Supply/Demand Blob Problem [Figure: Demand and Supply curves with an Observed Line; axes: Quantity, Price.]

  7. Simultaneous Equations: The “Identification” Problem • Supply Equation Is Identified: [equations not preserved in transcript] • Both Equations Are Identified: [equations not preserved in transcript]

  8. Simultaneous Equations: Identify Supply Equation by Adding Advertising to Demand Equation [Figure: one Supply curve and three Demand curves for A=1, A=2, A=3; axes: Quantity, Price.]
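The idea of identifying the supply equation through advertising can be sketched with a small simulation. The Python below is an illustration under stated assumptions, not the workshop's own model: the linear supply and demand equations, their coefficients, and the use of a simple instrumental-variable ratio (covariance of advertising with quantity over covariance of advertising with price) are all choices made here. Advertising takes the values 1, 2, 3 to echo the three demand curves on the slide.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

# Hypothetical structural parameters (assumptions for illustration only).
TRUE_SUPPLY_SLOPE = 1.0
DEMAND_SLOPE = 1.0
AD_EFFECT = 2.0

rng = random.Random(1)
price, quantity, advertising = [], [], []
for _ in range(20000):
    a = rng.choice([1, 2, 3])
    u_supply = rng.gauss(0, 1)
    u_demand = rng.gauss(0, 1)
    # Supply: q = TRUE_SUPPLY_SLOPE * p + u_supply
    # Demand: q = 10 - DEMAND_SLOPE * p + AD_EFFECT * a + u_demand
    # Market clearing sets the two equal and solves for price.
    p = (10 + AD_EFFECT * a + u_demand - u_supply) / (TRUE_SUPPLY_SLOPE + DEMAND_SLOPE)
    q = TRUE_SUPPLY_SLOPE * p + u_supply
    price.append(p)
    quantity.append(q)
    advertising.append(a)

# Regressing quantity on price mixes supply and demand (the "blob" line).
ols_slope = cov(price, quantity) / cov(price, price)
# Advertising shifts demand only, so it traces out the supply curve.
iv_slope = cov(advertising, quantity) / cov(advertising, price)

print("OLS slope of quantity on price:", round(ols_slope, 2))
print("IV estimate of supply slope:   ", round(iv_slope, 2))
```

Because advertising appears in the demand equation but not in the supply equation, shifts in advertising move the market along the supply curve, and the ratio estimate lands near the supply slope used in the simulation while the ordinary regression slope does not.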

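  9. Simultaneous Equations: Nonlegal Example: Political Socialization [Path diagram with nodes FF, F, MF, M, and C and path coefficients .34, .29, .60, .38, .26, .40; the assignment of coefficients to paths is not recoverable from the transcript.]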

  10. What Regression Can’t Do Regarding Causation • Sorting out “necessary” vs. “sufficient” conditions • Sorting out multiple causal processes leading to same result

  11. Time Series Data • Change over time tends to be incremental • Observation at time t is usually not independent of the observation at time t-1 • The revenue a company receives from a product at time t is correlated with the revenue at time t-1 • Although there can be “interruptions” to the basic pattern due to market changes (e.g., entrance of a competitor) • The issue presented is labeled “serial correlation” or “autocorrelation”

  12. Time Series • Observations are not independent over time • Biased estimates of standard errors • Tend to be too low • Diagnostics • Plots • Residuals • Specialized statistics (Durbin-Watson) • Solutions • Remove the correlation by transforming data (usually involves focusing on differences) • Incorporating time dependence into statistical model (MLE) • Specialized methods (“Box-Jenkins”)

  13. The Random Walk: Eight Random Walks

  14. Time Series: A Random Walk? [Figure: Number of Cases Decided by the Supreme Court with Signed Opinions]

  15. Time Series: First Differences [Figure: Number of Cases Decided by the Supreme Court with Signed Opinions]

  16. Time Series: A Random Walk, 2? [Figure: Number of IFP Petitions for Certiorari Filed with the Supreme Court]

  17. Time Series: First Differences, 2 [Figure: Number of IFP Petitions for Certiorari Filed with the Supreme Court]
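The contrast between a random walk and its first differences can be reproduced with simulated data. This Python sketch uses made-up Gaussian steps rather than the Supreme Court series shown in the slides, and the lag-1 autocorrelation helper is a plain textbook formula chosen for illustration.

```python
import random

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in series)
    return num / den

rng = random.Random(42)
steps = [rng.gauss(0, 1) for _ in range(2000)]

# A random walk is the running sum of independent steps.
walk = []
level = 0.0
for s in steps:
    level += s
    walk.append(level)

# First differences recover the independent steps.
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

walk_autocorr = lag1_autocorr(walk)
diff_autocorr = lag1_autocorr(diffs)
print("lag-1 autocorrelation, levels:     ", round(walk_autocorr, 3))
print("lag-1 autocorrelation, differences:", round(diff_autocorr, 3))
```

The levels are almost perfectly correlated with their own previous value, while the differenced series shows essentially no serial correlation.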

  18. The Interrupted Time Series 1962-1976

  19. Interrupted Time Series: Types of Interventions

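  20. Serially Correlated Errors: The Usual Conceptualization • (1) [equation not preserved in transcript] • “AR1”: (2) [equation not preserved in transcript], where [definition not preserved in transcript] • “AR2”: (3) [equation not preserved in transcript]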
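The slide's own equations did not survive in this transcript. For orientation only, the block below gives the standard textbook form of a regression with first-order and second-order autoregressive errors. The symbols (alpha, beta, rho, and the white-noise term u) are assumptions and may differ from the notation the slide used.

```latex
% Regression with serially correlated errors (standard textbook form)
\begin{align}
Y_t &= \alpha + \beta X_t + \varepsilon_t \\
% AR(1): each error carries over a fraction rho of the previous error
\varepsilon_t &= \rho\,\varepsilon_{t-1} + u_t,
  \qquad |\rho| < 1,\ u_t \text{ independent with mean } 0 \\
% AR(2): the error depends on the two previous errors
\varepsilon_t &= \rho_1\,\varepsilon_{t-1} + \rho_2\,\varepsilon_{t-2} + u_t
\end{align}
```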

  21. Autoregression (“AR”): The Echo Effect

  22. Moving Average (MA)

  23. Diagnosing Serial Correlation Issues • A wide variety of statistics are used to diagnose the presence and structure of serial correlation • Estimate ρ as shown in the previous slide • Statistic called “Durbin-Watson” • Lagrange Multiplier Test • Dickey-Fuller Test • Fitting models that take into account this correlation • “filter” the data to remove the correlation • fit models that explicitly include the lack of independence
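Two of the diagnostics listed above are simple enough to compute by hand. The Python sketch below applies the usual Durbin-Watson formula and a lag-1 regression estimate of ρ to simulated error series that stand in for regression residuals. The series, the value ρ = 0.8, and the helper names are assumptions for illustration, and the ρ estimator is a common one that may differ from the one on the earlier slide.

```python
import random

def durbin_watson(resid):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def estimate_rho(resid):
    """Slope from regressing each residual on the previous one (no intercept)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den

def ar1_series(rho, n, rng):
    """Simulate e_t = rho * e_{t-1} + u_t with standard normal u_t."""
    out, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0, 1)
        out.append(prev)
    return out

rng = random.Random(7)
independent_errors = ar1_series(0.0, 5000, rng)
correlated_errors = ar1_series(0.8, 5000, rng)

dw_independent = durbin_watson(independent_errors)
dw_correlated = durbin_watson(correlated_errors)
rho_hat = estimate_rho(correlated_errors)

print("Durbin-Watson, independent errors:", round(dw_independent, 2))
print("Durbin-Watson, AR(1) errors:      ", round(dw_correlated, 2))
print("estimated rho, AR(1) errors:      ", round(rho_hat, 2))
```

The Durbin-Watson statistic is approximately 2(1 − ρ), so values near 2 suggest no first-order serial correlation and values well below 2 suggest positive serial correlation.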

  24. Example: First Differences To Remove Serial Correlation

  25. Generalized First Differences
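The slide's content is not preserved here, so the following Python sketch shows only the usual textbook version of generalized first differences: estimate ρ from the residuals of an ordinary regression, replace each variable with its value minus ρ times its previous value, and refit. The simulated data, the true values (intercept 1, slope 2, ρ = 0.7), and the single-pass procedure are assumptions, not the workshop's example.

```python
import random

def fit_line(x, y):
    """Simple least squares; returns intercept, slope, and residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, resid

def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

rng = random.Random(3)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
errors, prev = [], 0.0
for _ in range(n):
    prev = 0.7 * prev + rng.gauss(0, 1)  # AR(1) errors, rho = 0.7
    errors.append(prev)
y = [1.0 + 2.0 * a + e for a, e in zip(x, errors)]

# Step 1: ordinary regression and an estimate of rho from its residuals.
_, slope_raw, resid_raw = fit_line(x, y)
rho_hat = (sum(resid_raw[t] * resid_raw[t - 1] for t in range(1, n))
           / sum(r * r for r in resid_raw[:-1]))

# Step 2: generalized first differences, then refit.
x_star = [x[t] - rho_hat * x[t - 1] for t in range(1, n)]
y_star = [y[t] - rho_hat * y[t - 1] for t in range(1, n)]
_, slope_gfd, resid_gfd = fit_line(x_star, y_star)

dw_raw = durbin_watson(resid_raw)
dw_gfd = durbin_watson(resid_gfd)
print("rho estimate:", round(rho_hat, 2))
print("Durbin-Watson before:", round(dw_raw, 2), "after:", round(dw_gfd, 2))
print("slope after transformation:", round(slope_gfd, 2))
```

Ordinary first differences are the special case in which ρ is set to 1.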

  26. Specialized Time Series Models • “ARIMA” (Box-Jenkins) Models • deal jointly with AR and MA effects • Models that do “seasonal adjustment” • Panel models • Cross-section time-series models

  27. Time Series Example: Price and Anti-Trust Jonathan B. Baker and Daniel L. Rubinfeld, Empirical Methods in Antitrust: Review and Critique, 1 Am. L. & Econ. Rev. 386, 393 (1999).

  28. The Selection Problem • grades/LSAT example • “in-out” variable is itself a random variable • “selection” models • “censored” (“truncation”): Y=Y* if Y* > some constant C; otherwise Y=C (C usually is 0) • “tobit” • “self-selection” (“observed” vs. “not observed”): observe Y only if some other unobserved variable (the selection variable) exceeds some threshold (you only observe a 1 or 0, as in probit or logit) • “Heckman” models • “switching” models • compare to interaction models where the key variable is not stochastic, such as gender or race • switching model involves a variable which itself is stochastic and is determined by a causal mechanism • something like Heckman
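A short simulation shows why censoring calls for a specialized model. The Python sketch below applies the censoring rule from the slide (Y = Y* if Y* > C, otherwise Y = C, with C = 0) to made-up data and compares ordinary regression slopes on the latent and the censored outcome. It does not fit a tobit model; the data-generating values (true slope 2, standard normal predictor and error) are assumptions for illustration.

```python
import random

def simple_slope(y, x):
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

rng = random.Random(13)
n = 5000
C = 0.0  # censoring point

x = [rng.gauss(0, 1) for _ in range(n)]
latent = [2.0 * a + rng.gauss(0, 1) for a in x]      # Y*, never fully observed
observed = [v if v > C else C for v in latent]       # Y, censored at C

share_censored = sum(1 for v in latent if v <= C) / n
latent_slope = simple_slope(latent, x)
censored_slope = simple_slope(observed, x)

print("share of cases censored:   ", round(share_censored, 2))
print("slope using latent Y*:     ", round(latent_slope, 2))
print("slope using censored Y:    ", round(censored_slope, 2))
```

In this setup about half the cases are censored and the ordinary regression slope on the censored outcome is pulled well toward zero, which is the bias that tobit-type models are designed to address.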

  29. Heckman Selection Model Applied to Sentencing • How to measure sentence? • How is probation or suspended sentence treated? • Do same factors affect in-out decision as affect length of incarceration? • Could race affect one, but not the other? • Could race have an opposite effect on length compared to in-out?

  30. Pennsylvania Study

  31. The Heckman Model Results

  32. Race Differences Darrell Steffensmeier & Stephen Demuth, Ethnicity and Judges' Sentencing Decisions: Hispanic-Black-White Comparisons, 39 Criminology 145-178 (2001)

  33. Multilevel Models • Data at several levels • Census • individual • household • residential building • Education • District • School • Classroom/teacher • Student

  34. Hierarchical Linear Model (HLM) • HLM is a method that is specifically designed to model data measured at multiple levels • It produces estimates of the effects of variables measured at each level in a way that takes into account how many actual measurements you have at each level • Running standard regression models fails to account for the different frequency of measurement in multi-level data • Might have 10,000 individuals but only 20 distinct measures of school characteristics
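The warning about standard regression on multi-level data can be illustrated with a simulation. The Python sketch below is not an HLM fit. It compares a student-level regression with a regression on school means as a rough stand-in, using 20 hypothetical schools with 200 students each, a school-level predictor whose true effect is zero, and an unmeasured school effect shared by all students in a school. All of these settings are assumptions.

```python
import math
import random

def slope_and_se(x, y):
    """Simple least-squares slope and its conventional standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

rng = random.Random(11)
N_SCHOOLS, N_STUDENTS = 20, 200

student_x, student_y, school_x, school_mean_y = [], [], [], []
for _ in range(N_SCHOOLS):
    x_j = rng.gauss(0, 1)  # school characteristic, one value per school
    u_j = rng.gauss(0, 1)  # unmeasured effect shared within the school
    scores = [u_j + rng.gauss(0, 1) for _ in range(N_STUDENTS)]
    student_x.extend([x_j] * N_STUDENTS)
    student_y.extend(scores)
    school_x.append(x_j)
    school_mean_y.append(sum(scores) / N_STUDENTS)

# Treats 4,000 students as 4,000 independent measurements of x_j.
naive_slope, naive_se = slope_and_se(student_x, student_y)
# Treats the 20 schools as the units for a school-level variable.
school_slope, school_se = slope_and_se(school_x, school_mean_y)

print("student-level slope and SE:", round(naive_slope, 3), round(naive_se, 3))
print("school-level slope and SE: ", round(school_slope, 3), round(school_se, 3))
```

With equal school sizes the two slopes coincide, but the student-level standard error is several times smaller because it counts each student as a separate measurement of the school characteristic.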

  35. HLM Model for School Achievement SOURCE: Sarah Theule Lubienski and Christopher Lubienski, School Sector and Academic Achievement: A Multilevel Analysis of NAEP Mathematics Data, 43 Am. Educ. Res. J. 651, 672-73 (2006)

  36. Conclusions I • There is usually no one correct way to do statistics • There are wrong ways • With large data sets, specific choices as to how to test hypotheses will not affect conclusions except at the margins • The “model” used can have substantial effects on the estimates of the values one obtains

  37. Conclusions II • Descriptive statistics can be used with samples and with populations • Inferential statistics can be used with samples and populations • With samples, one typically wants to draw inferences to the full population • With populations, one is asking whether the pattern observed could be generated by a purely random process

  38. Conclusions III: Description • Univariate • central tendency, dispersion, shape • Bivariate/Multivariate • nature of relationship (e.g., slope in regression) • strength of relationship (“correlation”) • “proportional reduction in error” (PRE) • Model of relationship • form (linear, nonlinear)

  39. Conclusions IV: Testing Hypotheses • Power • How wrong is the null hypothesis (i.e., how strong is the effect we are looking for)? • Possibility of error • Type I error: incorrectly rejecting null (seeing something that really isn’t there) • Type II error: incorrectly failing to reject a null (failing to see something that is there) • Must avoid “fallacy of affirming the consequent” (failing to reject the null does not mean the null is correct)
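Type I error, Type II error, and power can all be estimated by simulation. The Python sketch below uses a two-sided z-test of a zero mean with known standard deviation 1, a 5% significance level, samples of 25, and a hypothetical true effect of 0.5. The test, the sample size, and the effect size are assumptions chosen to keep the example short.

```python
import math
import random

def rejection_rate(true_mean, n, reps, rng, z_crit=1.96):
    """Share of simulated samples in which H0: mean = 0 is rejected."""
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # standard deviation is known to be 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

rng = random.Random(5)

# Null is true: every rejection is a Type I error.
type1_rate = rejection_rate(0.0, 25, 4000, rng)
# Null is false: rejections are correct, failures to reject are Type II errors.
power = rejection_rate(0.5, 25, 4000, rng)
type2_rate = 1 - power

print("Type I error rate: ", round(type1_rate, 3))
print("power:             ", round(power, 3))
print("Type II error rate:", round(type2_rate, 3))
```

Raising the effect size or the sample size in `rejection_rate` increases power, which is the sense in which power depends on how wrong the null hypothesis is.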

  40. Conclusions V: Issues in Estimation • Estimation can be in the form of either • a single value (a point estimate), or • a range of values (interval estimate, confidence interval, margin of error) • In thinking about estimation, we need to consider • statistical bias • “efficiency” or variation

  41. Conclusions VI: Methods of Estimation • Point estimation • Population equivalent (“method of moments”) • Minimize error (e.g., least squares) • Maximum likelihood • Interval estimation • “pivot method”: use probability theory to pivot around a point estimate • “bootstrap”: simulate repeated samples
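The bootstrap approach to interval estimation can be sketched in a few lines. The Python below resamples a made-up data set with replacement and reads a 95% interval for the mean off the percentiles of the resampled means. The data, the number of resamples, and the choice of the percentile variant are assumptions for illustration.

```python
import random

rng = random.Random(9)

# Hypothetical sample of 200 observations.
data = [rng.gauss(10.0, 2.0) for _ in range(200)]
sample_mean = sum(data) / len(data)

# Simulate repeated samples by drawing from the data with replacement.
boot_means = []
for _ in range(2000):
    resample = rng.choices(data, k=len(data))
    boot_means.append(sum(resample) / len(resample))

# Percentile interval: middle 95% of the resampled means.
boot_means.sort()
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means)) - 1]

print("point estimate:", round(sample_mean, 2))
print("95% bootstrap interval:", round(lower, 2), "to", round(upper, 2))
```

The point estimate here is the sample mean, and the interval width reflects the sampling variation of that estimate.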

  42. Conclusions VII: THE BOTTOM LINE • Statistics should be understood as Random Variables

  43. THE END
