Beyond Statistical Significance: Using Stata Post-Estimation Procedures to Examine Substantive Effects Garry Young George Washington Institute of Public Policy December 2, 2009
What’s So Great About Statistical Significance? • Statistical significance is crucially important in quantitative research • Tells you whether a relationship as strong as the one observed would be unlikely to arise by chance alone (if the analysis is done correctly) • Tells you the direction (or sign) of the relationship • Can’t tell you the substantive significance or size of the relationship
What’s So Great About Statistical Significance? • Can’t tell you the substantive significance or size of the relationship • Did the statistically significant covariate increase Pr(Y=1) by a lot or a little? • With large Ns, statistical significance is easier to achieve, so trivial relationships are more likely to be labeled “significant”
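A minimal Python sketch, with invented numbers, makes the large-N point concrete. It applies a standard pooled two-proportion z-test to the same one-percentage-point gap (.50 versus .51) at two hypothetical sample sizes. The helper name `two_proportion_p_value` is illustrative only.

```python
import math

def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value for a difference in proportions with n cases per group
    (pooled normal-approximation z-test)."""
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the simple average
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(z / math.sqrt(2))

# The same one-percentage-point gap (.50 vs. .51) at two sample sizes
p_small_n = two_proportion_p_value(0.50, 0.51, 1_000)
p_large_n = two_proportion_p_value(0.50, 0.51, 100_000)
print(f"n = 1,000 per group:   p = {p_small_n:.3f}")
print(f"n = 100,000 per group: p = {p_large_n:.6f}")
```

With 1,000 cases per group the gap is far from significant. With 100,000 per group it is highly significant, yet its substantive size has not changed at all.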
Substantive Effects in Research • Today journal reviewers, editors, and readers expect a consideration of substantive effects • Authors often give it cursory treatment • Perhaps to hide trivial results • Perhaps because it can be computationally complex • Perhaps because we have no clear way to evaluate substantive significance • If a covariate increases Pr(Y=1) by 100%, is that significant? • What if Pr(Y=1) is .05 without the covariate and the covariate doubles it to .10? Is that important?
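The .05-to-.10 question turns on the difference between relative and absolute change. This small sketch uses a hypothetical helper, `describe_change`. It shows that a doubling is always a 100% relative increase, while the absolute change can be 5 or 40 percentage points depending on the baseline.

```python
def describe_change(p_without, p_with):
    """Return (relative change, absolute change in percentage points)."""
    relative = (p_with - p_without) / p_without
    absolute_points = (p_with - p_without) * 100
    return relative, absolute_points

# Two doublings with very different baselines
for base, new in [(0.05, 0.10), (0.40, 0.80)]:
    rel, pts = describe_change(base, new)
    print(f"{base:.2f} -> {new:.2f}: {rel:.0%} relative increase, {pts:.0f} percentage points")
```

Reporting both the baseline probability and the absolute change gives readers what they need to judge whether the effect matters.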
Computational Complexity • In OLS, determining the substantive effect is easy: a one-unit change in X produces a b-unit change in Y, holding other variables constant. • Non-linear estimators (Poisson, logit, ordered probit, etc.) pose far more difficulty, because the effect of a one-unit change in X depends on the values of all the covariates. • Today’s statistical packages, especially Stata, make it easy
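A short Python sketch shows why non-linear models are harder, using hypothetical logit coefficients (b0 = -2, b1 = 0.5). In the linear model a one-unit change in x always moves Y by b1. In the logit model, the same one-unit change moves Pr(Y=1) by different amounts depending on where x starts.

```python
import math

def inv_logit(xb):
    """Logistic CDF: converts a linear predictor into Pr(Y=1)."""
    return 1 / (1 + math.exp(-xb))

# Hypothetical coefficients, for illustration only
b0, b1 = -2.0, 0.5

def ols_effect(x):
    # Linear model: E[Y] = b0 + b1*x, so a one-unit change in x always moves Y by b1
    return (b0 + b1 * (x + 1)) - (b0 + b1 * x)

def logit_effect(x):
    # Logit: the change in Pr(Y=1) from the same one-unit change depends on x
    return inv_logit(b0 + b1 * (x + 1)) - inv_logit(b0 + b1 * x)

for x in (0, 4):
    print(f"x = {x} -> {x + 1}: OLS change = {ols_effect(x):.3f}, "
          f"logit change in Pr(Y=1) = {logit_effect(x):.3f}")
```

Here the logit change is roughly .06 going from x = 0 to 1, but roughly .12 going from x = 4 to 5. No single number summarizes the effect.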
Stata Options • Large number of post-estimation procedures in Stata for virtually all estimators • In Stata • . help postestimation commands • Extensive help • . search postestimation • Long list of available add-ons => Stata’s true strength as a program • S-Post • Clarify
S-Post • Suite of post-estimation commands • Substantive effects • Diagnostics (e.g., fit statistics) • Developed by Scott Long & Jeremy Freese • J. Scott Long and Jeremy Freese, 2005, Regression Models for Categorical Outcomes Using Stata. Second Edition. College Station, TX: Stata Press. • For more on S-Post: http://www.indiana.edu/~jslsoc/web_spost/sp_install.htm
Clarify • Suite of post-estimation software developed by Michael Tomz, Jason Wittenberg, and Gary King • There are several ways to install Clarify; here’s one:
Installing Clarify: Step 1 On an internet-connected machine, type: findit clarify
Installing Clarify: Step 2 Then click
Installing Clarify: Step 3 Then click
What is Clarify? • Software that works within Stata • Uses Monte Carlo simulation to produce quantities of interest, such as predicted probabilities, expected values, and first differences
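The simulation idea can be sketched in a few lines of Python, following the approach described by King, Tomz, and Wittenberg (2000). The steps are: draw many coefficient vectors from a multivariate normal centered on the estimates, using the estimated variance-covariance matrix; compute the quantity of interest for each draw; then summarize the draws. The coefficients and covariance matrix below are hypothetical. This illustrates the technique and is not Clarify's own code.

```python
import math
import random
import statistics

# Hypothetical logit results: point estimates and variance-covariance matrix
beta_hat = [-2.0, 0.5]                 # intercept, slope on x
vcov = [[0.04, -0.005],
        [-0.005, 0.01]]

def cholesky_2x2(m):
    """Lower-triangular L such that L times its transpose equals m (m positive definite)."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    l22 = math.sqrt(m[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def simulate_betas(n_sims, rng):
    """Draw coefficient vectors from the multivariate normal N(beta_hat, vcov)."""
    L = cholesky_2x2(vcov)
    draws = []
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        draws.append([beta_hat[0] + L[0][0] * z1,
                      beta_hat[1] + L[1][0] * z1 + L[1][1] * z2])
    return draws

def inv_logit(xb):
    return 1 / (1 + math.exp(-xb))

rng = random.Random(2009)
betas = simulate_betas(1000, rng)

# Quantity of interest: Pr(Y=1) at x = 2, computed once per simulated coefficient vector
x = 2.0
probs = [inv_logit(b0 + b1 * x) for b0, b1 in betas]
q = statistics.quantiles(probs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
print(f"Pr(Y=1 | x=2): mean {statistics.mean(probs):.3f}, "
      f"s.e. {statistics.stdev(probs):.3f}, 95% interval [{q[0]:.3f}, {q[-1]:.3f}]")
```

The mean, standard deviation, and percentile interval of the simulated probabilities correspond to the "Mean", "Std. Err.", and "95% Conf. Interval" columns in the simqi output shown later.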
Available Estimators • OLS (reg) • Logit (logit) • Probit (probit) • Ordered logit (ologit) • Ordered probit (oprobit) • Multinomial logit (mlogit) • Poisson regression (poisson) • Negative binomial regression (nbreg) • Seemingly unrelated regression (sureg)
Some Limitations • Hard to use with time-series estimators • Can’t handle TSCS estimators • E.g., xtreg, xtlogit, etc. • Can’t handle most types of survival analysis • E.g., stcox, streg • Weibull models can be estimated in Stata 7 and earlier • Some diagnostics aren’t available • E.g., fitstat
Workaround for Some Diagnostics • In many cases • run the regular model outside of Clarify • do the diagnostic • then run the model in Clarify to get your substantive effects.
Take Fitstat as an example • After running logit in Clarify, fitstat returns an error
Fitstat Example Part 2 • Run the regular logit • Then run fitstat • If you still want to run this model, now run it in Clarify.
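fitstat reports several fit measures. McFadden's pseudo-R², for example, needs only the log-likelihoods of the fitted model and of an intercept-only model, and both are available after running the regular model. Below is a minimal sketch of that measure and the related likelihood-ratio chi-square, with hypothetical log-likelihoods.

```python
def mcfadden_r2(ll_full, ll_null):
    """McFadden's pseudo-R2: 1 - lnL(full model) / lnL(intercept-only model)."""
    return 1 - ll_full / ll_null

def lr_chi2(ll_full, ll_null):
    """Likelihood-ratio chi-square comparing the full and intercept-only models."""
    return 2 * (ll_full - ll_null)

# Hypothetical log-likelihoods, as reported after fitting a model the regular way
ll_null, ll_full = -180.0, -150.0
print(f"McFadden R2 = {mcfadden_r2(ll_full, ll_null):.3f}")
print(f"LR chi2     = {lr_chi2(ll_full, ll_null):.1f}")
```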
The 3 Core Commands • estsimp • setx • simqi
estsimp • estsimp prefaces your model • Instead of: logit Y X1 X2… • It’s: estsimp logit Y X1 X2… • This tells Stata to use Clarify to estimate a logit model and simulate its parameters • Most options normally available with the estimator are available within Clarify • E.g., estsimp logit Y X1 X2 X3 if year == t • There are a few estsimp-specific options, e.g., the number of simulations to run, or running the model on multiply imputed datasets (more later)
setx • Use setx to set the values of your explanatory variables. You have many options:
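The setx step amounts to choosing one covariate profile at which to evaluate the model. Here is a minimal Python sketch of that idea, using a hypothetical helper `profile` and invented data. Every variable is set to its mean, and one variable can then be overridden with its minimum or maximum. This mirrors what `setx mean` followed by `setx margin min` or `setx margin max` does in the ordered probit example.

```python
import statistics

# Hypothetical covariate data (one list per variable), for illustration only
data = {
    "margin": [2.1, 15.4, 8.7, 30.2, 5.5],
    "years":  [3, 12, 7, 20, 1],
}

def profile(data, overrides=None):
    """Mimic 'setx mean': every variable at its mean, then apply any overrides."""
    x = {name: statistics.mean(values) for name, values in data.items()}
    for name, value in (overrides or {}).items():
        x[name] = value
    return x

at_means = profile(data)
marginal = profile(data, {"margin": min(data["margin"])})  # like 'setx margin min'
safe = profile(data, {"margin": max(data["margin"])})      # like 'setx margin max'
print(at_means, marginal, safe, sep="\n")
```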
Simqi • simqi returns predicted probabilities, Pr(Y=j), or the expected value of Y (depending on the estimator) • Here, too, there are many options for adjusting how simqi runs and the type of output produced
A Warning • Clarify derives its estimates from Monte Carlo simulations. • This means the simulated results will vary slightly from run to run, usually very slightly. • Generally, increasing the number of simulations shrinks these differences. • If you need exact replication, you can set the random-number seed to a given number using the “set seed” command.
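The warning above can be demonstrated with a toy simulation in Python. The function `simulated_mean` is a hypothetical stand-in for one simulated quantity of interest, not Clarify. Seeding the random-number generator makes a run exactly reproducible, and the run-to-run spread of the result shrinks as the number of simulations grows.

```python
import random
import statistics

def simulated_mean(n_sims, seed):
    """Mean of n_sims draws from N(0.5, 0.1): a stand-in for one simulated quantity of interest."""
    rng = random.Random(seed)
    return statistics.mean(rng.gauss(0.5, 0.1) for _ in range(n_sims))

# Same seed -> identical answer; different seeds -> slightly different answers
print(simulated_mean(1000, seed=42), simulated_mean(1000, seed=42))
print(simulated_mean(1000, seed=1), simulated_mean(1000, seed=2))

# More simulations -> smaller spread of the answer across 20 different seeds
for n in (100, 10_000):
    means = [simulated_mean(n, seed) for seed in range(20)]
    print(n, round(statistics.stdev(means), 5))
```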
An Ordered Probit Example • Constituency orientation of 173 MPs in single-member district seats in Australia, Canada, New Zealand, and the UK (Heitshusen, Young, and Wood 2005). • D.V.: Constituency orientation: High (3), Medium (2), and Low (1) • RHS variables: electoral safety, portfolio, years in office, travel time to parliament, country dummies.
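The probabilities reported in the tables that follow come from the ordered probit model. With linear predictor xβ and increasing cutpoints τ, Pr(Y = j) = Φ(τ_j − xβ) − Φ(τ_{j−1} − xβ), where τ_0 = −∞ and τ_J = +∞. Below is a minimal Python sketch of that formula with hypothetical cutpoints and linear predictors, not the Heitshusen, Young, and Wood estimates.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ordered_probit_probs(xb, cutpoints):
    """Pr(Y = j) for j = 1..J given the linear predictor xb and J-1 increasing cutpoints:
    Pr(Y = j) = Phi(tau_j - xb) - Phi(tau_{j-1} - xb), with tau_0 = -inf and tau_J = +inf."""
    cdf = [0.0] + [normal_cdf(t - xb) for t in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cutpoints) + 1)]

# Hypothetical cutpoints and linear predictors, for illustration only
cuts = [-0.9, -0.1]
for xb in (0.0, -1.0):
    probs = ordered_probit_probs(xb, cuts)
    print(xb, [round(p, 3) for p in probs])
```

Lowering xβ shifts probability from the high category (3) toward the low category (1), which is the pattern the safe-seat results show.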
Substantive Effects • What if all Xs are at mean values?
. setx mean
. simqi
      Quantity of Interest |      Mean   Std. Err.     [95% Conf. Interval]
  -------------------------+------------------------------------------------
            Pr(conprior=1) |  .1782677    .0309813      .1224842    .2405272
            Pr(conprior=2) |  .2802904    .0381232      .2080904    .3550502
            Pr(conprior=3) |  .5414419    .0418051      .4566825    .6183931
Marginal MPs • An MP at mean values on everything except electoral safety (margin). The earlier setx mean is still in memory, so:
. setx margin min
. simqi
      Quantity of Interest |      Mean   Std. Err.     [95% Conf. Interval]
  -------------------------+------------------------------------------------
            Pr(conprior=1) |  .0881035    .0303765      .0387724    .1638489
            Pr(conprior=2) |  .2047718    .0385767      .1329525    .2831817
            Pr(conprior=3) |  .7071246     .058064      .5838024    .8131391
Safe MPs
. setx margin max
. simqi
      Quantity of Interest |      Mean   Std. Err.     [95% Conf. Interval]
  -------------------------+------------------------------------------------
            Pr(conprior=1) |   .475743    .1037975      .2744931    .6825318
            Pr(conprior=2) |  .2925022    .0459677      .1975205    .3819004
            Pr(conprior=3) |  .2317548    .0821582      .0960326    .4181555
How about the same thing as a first difference?
. setx mean
. simqi, fd(pr) changex(margin min max)
First Difference: margin min max
      Quantity of Interest |      Mean   Std. Err.     [95% Conf. Interval]
  -------------------------+------------------------------------------------
         dPr(conprior = 1) |  .3876395     .119811      .1415692    .6151865
         dPr(conprior = 2) |  .0877304    .0322312      .0266251    .1543707
         dPr(conprior = 3) | -.4753699    .1214275     -.6935345   -.2141026
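The mean first difference is simply the difference between the two means reported earlier (for category 1, .475743 − .0881035 = .3876395). Its confidence interval, however, has to be computed from the first differences themselves, draw by draw. Below is a minimal Python sketch of that per-draw calculation. The cutpoints, the fixed contribution of the other covariates, the range of margin, and the distribution of the simulated margin coefficient are all hypothetical. Unlike Clarify, the sketch simulates only one coefficient and holds everything else fixed.

```python
import math
import random
import statistics

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ordered_probit_probs(xb, cutpoints):
    """Pr(Y = j) = Phi(tau_j - xb) - Phi(tau_{j-1} - xb), with tau_0 = -inf and tau_J = +inf."""
    cdf = [0.0] + [normal_cdf(t - xb) for t in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cutpoints) + 1)]

# Hypothetical setup: only the coefficient on margin is simulated; the cutpoints and the
# contribution of the other covariates (held at their means) are treated as fixed.
cuts = [-0.9, -0.1]
other_xb = 0.6
margin_min, margin_max = 0.5, 40.0
rng = random.Random(2009)
b_margin_draws = [rng.gauss(-0.05, 0.01) for _ in range(1000)]

# First difference for each draw: Pr(Y=j | margin at max) - Pr(Y=j | margin at min)
fd_draws = []
for b in b_margin_draws:
    p_min = ordered_probit_probs(other_xb + b * margin_min, cuts)
    p_max = ordered_probit_probs(other_xb + b * margin_max, cuts)
    fd_draws.append([hi - lo for hi, lo in zip(p_max, p_min)])

for j in range(3):
    column = [d[j] for d in fd_draws]
    q = statistics.quantiles(column, n=40)  # cut points at 2.5%, ..., 97.5%
    print(f"dPr(Y={j + 1}): mean {statistics.mean(column):+.3f}, "
          f"95% interval [{q[0]:+.3f}, {q[-1]:+.3f}]")
```

Because the category probabilities always sum to one, the first differences for each draw sum to zero. The same is true of the three means in the slide's table, up to rounding.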
Extensions • Lots you can do with simqi • Save predicted values and graph them, do first differences, etc. See Tomz, Wittenberg, and King (2001) or Clarify help in Stata for details.
Substantive Significance? • What’s up with those confidence intervals?
. setx margin max
. simqi
      Quantity of Interest |      Mean   Std. Err.     [95% Conf. Interval]
  -------------------------+------------------------------------------------
            Pr(conprior=1) |   .475743    .1037975      .2744931    .6825318
            Pr(conprior=2) |  .2925022    .0459677      .1975205    .3819004
            Pr(conprior=3) |  .2317548    .0821582      .0960326    .4181555
They tell you about range and certainty • Take the statement from King, Tomz, and Wittenberg (2000): “Other things being equal, an additional year of education would increase your annual income by $1,500 on average, plus or minus $500.” • Contrast with: “Other things being equal, an additional year of education would increase your annual income by $1,500.” • Or: “There is a statistically significant relationship between education and income.”
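Statements like King, Tomz, and Wittenberg's can be produced directly from simulations. The point estimate is the mean of the simulated effects, and the “plus or minus” comes from the 2.5th and 97.5th percentiles. Below is a minimal Python sketch with invented draws, normally distributed around $1,500 purely for illustration. In practice the draws would come from simulated coefficients.

```python
import random
import statistics

# Hypothetical simulated effects of one more year of education on annual income (dollars)
rng = random.Random(2000)
effects = [rng.gauss(1500, 255) for _ in range(5000)]

point = statistics.mean(effects)
q = statistics.quantiles(effects, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
lower, upper = q[0], q[-1]
half_width = (upper - lower) / 2
print(f"An additional year of education would increase annual income by about "
      f"${point:,.0f} on average, plus or minus ${half_width:,.0f} "
      f"(95% interval ${lower:,.0f} to ${upper:,.0f}).")
```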
Multiple Imputation • Clarify can work with Amelia (King et al. 2001). • Amelia is another program that works with Stata; it is a multiple imputation program for addressing missing data. • I believe Clarify will also work with Stata’s new multiple imputation procedure (mi), but I haven’t tried it.
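Clarify's estsimp has an option for multiply imputed datasets (mentioned above) and handles the combining itself. To show what combining involves, here is a minimal Python sketch of Rubin's rules, a standard way to pool one coefficient across m imputed datasets. It is shown for illustration and is not a claim about Clarify's internal procedure. The estimates and standard errors are hypothetical.

```python
import statistics

def rubin_combine(estimates, variances):
    """Combine one coefficient across m imputed datasets using Rubin's rules.
    estimates: point estimate from each dataset; variances: squared s.e. from each."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # combined point estimate
    w = statistics.mean(variances)            # average within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b               # total variance
    return q_bar, total ** 0.5

# Hypothetical results from five imputed datasets
est = [0.42, 0.47, 0.39, 0.45, 0.44]
se = [0.10, 0.11, 0.10, 0.12, 0.10]
coef, combined_se = rubin_combine(est, [s ** 2 for s in se])
print(f"combined estimate {coef:.3f}, s.e. {combined_se:.3f}")
```

The combined standard error is larger than the average single-dataset standard error because it also reflects disagreement among the imputed datasets.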
References
• Heitshusen, Valerie, Garry Young, and David Wood. 2005. “Electoral Context and MP Constituency Focus in Australia, Canada, Ireland, New Zealand, and the United Kingdom.” American Journal of Political Science 49: 32-45.
• King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve. 2001. “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation.” American Political Science Review 95: 49-69.
• King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44: 341-355.
• Long, J. Scott, and Jeremy Freese. 2005. Regression Models for Categorical Outcomes Using Stata. Second Edition. College Station, TX: Stata Press.
• Tomz, Michael, Jason Wittenberg, and Gary King. 2001. “Clarify: Software for Interpreting and Presenting Statistical Results.” Manuscript, Stanford University. http://gking.harvard.edu/clarify/clarify.pdf.