Beyond Statistical Significance: Using Stata Post-Estimation Procedures to Examine Substantive Effects

Garry Young, George Washington Institute of Public Policy, December 2, 2009

What’s So Great About Statistical Significance?
• Statistical significance is crucially important in quantitative research
• Tells you whether a relationship is unlikely to be due to chance alone (if the analysis is done correctly)
• Tells you the direction (or sign) of the relationship
• Can’t tell you the substantive significance or size of the relationship

What’s So Great About Statistical Significance?
• Can’t tell you the substantive significance or size of the relationship
  • Did the statistically significant covariate increase Pr(Y=1) by a lot or a little?
• With large Ns, statistical significance is easier to obtain, so trivial relationships are more likely to be found “significant”
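The large-N point can be illustrated with a rough numeric sketch. The effect size, the standard deviation, and the assumption that the standard error shrinks as 1/sqrt(N) are all made up for illustration; nothing here comes from the talk's data.

```python
import math

# Hypothetical values: a substantively tiny effect and a unit standard deviation.
effect = 0.01
sd = 1.0

def z_statistic(n):
    """z = effect / standard error, assuming the SE shrinks as sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return effect / standard_error

# The effect never changes, but the test statistic grows with the sample size.
for n in (100, 10_000, 1_000_000):
    print(n, round(z_statistic(n), 2))
```

The same trivial effect fails a conventional 1.96 cutoff at N = 100 and passes it easily at N = 1,000,000, which is why significance alone says nothing about size.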

Substantive Effects in Research
• Today journal reviewers, editors, and readers expect a consideration of substantive effects
• Authors often give it cursory treatment
  • Perhaps to hide trivial results
  • Perhaps because it can be computationally complex
  • Perhaps because we have no clear way to evaluate substantive significance
    • If a covariate increases Pr(Y=1) by 100%, is that significant?
    • What if Pr(Y=1) without the covariate is .05 and the covariate doubles it to .10? Is that important?
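The doubling example can be written out as arithmetic. This sketch uses the slide's .05 and .10; the "per 1,000 cases" framing is an added illustration, not something from the talk.

```python
# Probabilities taken from the slide's example.
baseline = 0.05        # Pr(Y=1) without the covariate
with_covariate = 0.10  # Pr(Y=1) with the covariate

absolute_change = with_covariate - baseline   # change in probability
relative_change = absolute_change / baseline  # change as a share of the baseline

# Illustrative framing (an assumption): expected extra events per 1,000 cases.
extra_events_per_1000 = absolute_change * 1000

print(f"absolute change: {absolute_change:.2f}")
print(f"relative change: {relative_change:.0%}")
print(f"extra events per 1,000 cases: {extra_events_per_1000:.0f}")
```

A 100% relative increase and a 5-percentage-point absolute increase describe the same result; whether either is important depends on the substance of the outcome.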

Computational Complexity
• In OLS determining the substantive effect is easy: a one-unit change in X produces a b-unit change in Y, holding other variables constant.
• Non-linear estimators (Poisson, logit, ordered probit, etc.) pose far more difficulty.
• Today’s statistical packages, especially Stata, make it easy
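To see why non-linear estimators are harder to interpret, consider a minimal sketch of a logit with one covariate. The intercept and slope below are hypothetical; the point is only that the same coefficient implies different changes in Pr(Y=1) depending on where X starts.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit coefficients (assumptions, not estimates from the talk).
intercept, slope = -3.0, 1.0

def pr_y1(x):
    return logistic(intercept + slope * x)

def one_unit_change(x):
    """Change in Pr(Y=1) when X moves from x to x + 1."""
    return pr_y1(x + 1) - pr_y1(x)

# Unlike OLS, the effect of a one-unit change is not constant.
for x in (0, 3, 6):
    print(x, round(one_unit_change(x), 4))
```

The change is largest near Pr(Y=1) = .5 and shrinks toward the tails, so a single coefficient cannot summarize the substantive effect.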

Stata Options
• Large number of post-estimation procedures in Stata for virtually all estimators
• In Stata
  • . help postestimation commands
    • Extensive help
  • . search postestimation
    • Long list of available add-ons => Stata’s true strength as a program
• S-Post
• Clarify

S-Post
• Suite of post-estimation commands
  • Substantive effects
  • Diagnostics (e.g., fit statistics)
• Developed by Scott Long & Jeremy Freese
  • J. Scott Long and Jeremy Freese. 2005. Regression Models for Categorical Outcomes Using Stata. Second Edition. College Station, TX: Stata Press.
• For more on S-Post: http://www.indiana.edu/~jslsoc/web_spost/sp_install.htm

Clarify
• Suite of post-estimation software developed by Michael Tomz, Jason Wittenberg, and Gary King
• There are different ways to install Clarify; here’s one:

Installing Clarify: Step 1
On an internet-connected machine, type:
    . findit clarify

Installing Clarify: Step 2 Then click

Installing Clarify: Step 3 Then click

What is Clarify?
• Software that works within Stata
• Uses Monte Carlo simulations to produce estimates of interest
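The simulation idea can be sketched outside Stata. The Python below is a minimal illustration under stated assumptions, not Clarify's code: the logit estimates `b_hat` and the variance-covariance matrix `V` are hypothetical, parameters are drawn from a bivariate normal, and the quantity of interest is Pr(Y=1) at a chosen X.

```python
import math
import random
import statistics

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical point estimates (intercept, slope) and their covariance matrix.
b_hat = (-1.0, 0.5)
V = ((0.04, -0.01),
     (-0.01, 0.01))

# Cholesky factor of the 2x2 matrix V, used to draw correlated normals.
l11 = math.sqrt(V[0][0])
l21 = V[1][0] / l11
l22 = math.sqrt(V[1][1] - l21 ** 2)

def simulate_parameters(n_sims, rng):
    """Draw n_sims parameter vectors from N(b_hat, V)."""
    draws = []
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((b_hat[0] + l11 * z1,
                      b_hat[1] + l21 * z1 + l22 * z2))
    return draws

rng = random.Random(12345)            # fixed seed so the sketch is repeatable
draws = simulate_parameters(1000, rng)

# Quantity of interest: Pr(Y=1) at X = 2, computed once per simulated draw.
probs = sorted(logistic(b0 + b1 * 2.0) for b0, b1 in draws)

n = len(probs)
mean = statistics.fmean(probs)
std_err = statistics.stdev(probs)
lower = probs[n * 25 // 1000]         # 2.5th percentile
upper = probs[n * 975 // 1000 - 1]    # 97.5th percentile

print(f"Pr(Y=1): mean={mean:.3f} se={std_err:.3f} 95% interval=[{lower:.3f}, {upper:.3f}]")
```

Summarizing the simulated values by their mean, standard deviation, and percentiles yields the same kind of table (mean, standard error, 95% interval) that appears in the examples later in the talk.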

Available Estimators
• OLS (reg)
• Logit (logit)
• Probit (probit)
• Ordered logit (ologit)
• Ordered probit (oprobit)
• Multinomial logit (mlogit)
• Poisson regression (poisson)
• Negative binomial regression (nbreg)
• Seemingly unrelated regression (sureg)

Some Limitations
• Hard to use with time-series estimators
• Can’t handle time-series cross-section (TSCS) estimators
  • E.g., xtreg, xtlogit, etc.
• Can’t handle most types of survival analysis
  • E.g., stcox, streg
  • With Stata 7 and earlier, Clarify can do Weibull
• Some diagnostics aren’t available
  • E.g., fitstat

Workaround for Some Diagnostics
• In many cases you can:
  • run the regular model outside of Clarify
  • do the diagnostic
  • then run the model in Clarify to get your substantive effects.

Take Fitstat as an Example
After running logit in Clarify, fitstat returns an error.

Fitstat Example Part 2
Run the regular logit, then fitstat. If you seriously still want to run this model, then run it now in Clarify.

The 3 Core Commands
• estsimp
• setx
• simqi

estsimp
• estsimp prefaces your model
  • Instead of: logit Y X1 X2…
  • It’s: estsimp logit Y X1 X2…
• This tells Stata to use Clarify to estimate a logit model and simulate its parameters
• Most options normally available with the estimator are available within Clarify
  • E.g., estsimp logit Y X1 X2 X3 if year == t
• There are a few estsimp-specific options, e.g., the number of simulations to run or estimation on multiply-imputed datasets (more later)

setx
• Use setx to set the values of your explanatory variables. You have many options:

simqi
• simqi returns Pr(Y = j), the probability of a given outcome, or the expected value of Y (depending on the estimator)
• Here, too, there are many options for adjusting how simqi runs and the type of output produced

A Warning
• Clarify derives its estimates from Monte Carlo simulations.
• This means estimates will vary slightly from run to run, usually very slightly.
• Generally, increasing the number of simulations will shrink these differences
• If you need exact replication, you can set the random-number seed to a given number using the “set seed” command.
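The role of the seed can be shown with a small Python sketch. This is an analogy for Stata's “set seed”, not Stata code; the seeds, the simulated quantity (normal draws around .5), and the function name `simulated_mean` are all hypothetical.

```python
import random
import statistics

def simulated_mean(seed, n_sims):
    """Average of n_sims simulated values, using a generator fixed by `seed`."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(0.5, 0.05) for _ in range(n_sims))

# Same seed: the run replicates exactly.
run_a = simulated_mean(seed=2009, n_sims=1000)
run_b = simulated_mean(seed=2009, n_sims=1000)

# Different seed: the estimate differs, but only slightly.
run_c = simulated_mean(seed=1, n_sims=1000)

print(run_a == run_b)
print(abs(run_a - run_c))
```

Fixing the seed gives exact replication; changing it moves the estimate only in the later decimal places, and more simulations shrink that gap further.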

An Ordered Probit Example
• Constituency orientation of 173 MPs in single-member district seats in Australia, Canada, New Zealand, and the UK (Heitshusen, Young, and Wood 2005).
• DV: constituency orientation: High (3), Medium (2), and Low (1)
• RHS variables: electoral safety, portfolio, years in office, travel time to parliament, country dummies.

Oprobit Estimate of Constituency Orientation

Substantive Effects
• What if all Xs are at mean values?

    . setx mean
    . simqi

    Quantity of Interest |     Mean   Std. Err.   [95% Conf. Interval]
    ---------------------+--------------------------------------------
    Pr(conprior=1)       | .1782677   .0309813    .1224842   .2405272
    Pr(conprior=2)       | .2802904   .0381232    .2080904   .3550502
    Pr(conprior=3)       | .5414419   .0418051    .4566825   .6183931
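For readers who want to see where three probabilities come from in a three-category ordered probit, here is a minimal Python sketch of the standard formula. The linear predictor `xb` and the cutpoints are hypothetical values, not the estimates from the model in the talk.

```python
from statistics import NormalDist

def ordered_probit_probs(xb, cut1, cut2):
    """Pr(Y=1), Pr(Y=2), Pr(Y=3) for a three-category ordered probit."""
    phi = NormalDist().cdf                   # standard normal CDF
    p1 = phi(cut1 - xb)                      # latent value below the first cutpoint
    p2 = phi(cut2 - xb) - phi(cut1 - xb)     # between the two cutpoints
    p3 = 1.0 - phi(cut2 - xb)                # above the second cutpoint
    return p1, p2, p3

# Hypothetical linear predictor and cutpoints (assumptions for illustration).
p_low, p_medium, p_high = ordered_probit_probs(xb=0.4, cut1=-0.5, cut2=0.3)

print(round(p_low, 4), round(p_medium, 4), round(p_high, 4))
```

Because the three categories exhaust the outcomes, the probabilities sum to one, as the simulated means in the simqi table do up to rounding.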

Marginal MPs
• MP at mean values except for safety. setx is still at mean in memory, so:

    . setx margin min
    . simqi

    Quantity of Interest |     Mean   Std. Err.   [95% Conf. Interval]
    ---------------------+--------------------------------------------
    Pr(conprior=1)       | .0881035   .0303765    .0387724   .1638489
    Pr(conprior=2)       | .2047718   .0385767    .1329525   .2831817
    Pr(conprior=3)       | .7071246   .058064     .5838024   .8131391

Safe MPs

    . setx margin max
    . simqi

    Quantity of Interest |     Mean   Std. Err.   [95% Conf. Interval]
    ---------------------+--------------------------------------------
    Pr(conprior=1)       | .475743    .1037975    .2744931   .6825318
    Pr(conprior=2)       | .2925022   .0459677    .1975205   .3819004
    Pr(conprior=3)       | .2317548   .0821582    .0960326   .4181555

How about the same thing as a first difference?

    . setx mean
    . simqi, fd(pr) changex(margin min max)

    First Difference: margin min max

    Quantity of Interest |      Mean   Std. Err.    [95% Conf. Interval]
    ---------------------+----------------------------------------------
    dPr(conprior = 1)    |  .3876395   .119811      .1415692    .6151865
    dPr(conprior = 2)    |  .0877304   .0322312     .0266251    .1543707
    dPr(conprior = 3)    | -.4753699   .1214275    -.6935345   -.2141026
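As a check, the first-difference means can be recovered by subtracting the marginal-MP probabilities from the safe-MP probabilities. The Python below uses only the means reported on the slides; the variable names are introduced here for illustration.

```python
# Mean predicted probabilities reported on the slides.
pr_margin_min = (0.0881035, 0.2047718, 0.7071246)  # marginal MPs (margin at min)
pr_margin_max = (0.475743, 0.2925022, 0.2317548)   # safe MPs (margin at max)

# First differences reported by simqi, fd(pr) changex(margin min max).
reported_fd = (0.3876395, 0.0877304, -0.4753699)

# Difference in mean probabilities when margin moves from min to max.
computed_fd = tuple(hi - lo for lo, hi in zip(pr_margin_min, pr_margin_max))

for outcome, (computed, reported) in enumerate(zip(computed_fd, reported_fd), start=1):
    print(f"conprior={outcome}: computed {computed:+.7f}, reported {reported:+.7f}")
```

The two sets of numbers agree to rounding. What the first-difference output adds is a standard error and a confidence interval for the change itself, which cannot be obtained by subtracting the two separate tables.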

Extensions
• Lots you can do with simqi
  • Save predicted values and graph them, do first differences, etc. See Tomz, Wittenberg, and King (2001) or Clarify help in Stata for details.

Substantive Significance?
• What’s up with those confidence intervals?

    . setx margin max
    . simqi

    Quantity of Interest |     Mean   Std. Err.   [95% Conf. Interval]
    ---------------------+--------------------------------------------
    Pr(conprior=1)       | .475743    .1037975    .2744931   .6825318
    Pr(conprior=2)       | .2925022   .0459677    .1975205   .3819004
    Pr(conprior=3)       | .2317548   .0821582    .0960326   .4181555

It tells about range and certainty
• Take the statement from King, Tomz, and Wittenberg (2000): “Other things being equal, an additional year of education would increase your annual income by $1,500 on average, plus or minus $500.”
• Contrast with: “Other things being equal, an additional year of education would increase your annual income by $1,500.”
• Or: “There is a statistically significant relationship between education and income.”

Multiple Imputation
• Clarify can work with Amelia (King et al. 2001).
• Amelia is another program that works with Stata. It’s a multiple imputation program for addressing missing data.
• I believe it will also work with Stata’s new multiple imputation procedure (mi), but I’ve not tried it.

References
• Heitshusen, Valerie, Garry Young, and David Wood. 2005. “Electoral Context and MP Constituency Focus in Australia, Canada, Ireland, New Zealand, and the United Kingdom.” American Journal of Political Science 49: 32-45.
• King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve. 2001. “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation.” American Political Science Review 95: 49-69.
• King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44: 341-355.
• Long, J. Scott, and Jeremy Freese. 2005. Regression Models for Categorical Outcomes Using Stata. Second Edition. College Station, TX: Stata Press.
• Tomz, Michael, Jason Wittenberg, and Gary King. 2001. “Clarify: Software for Interpreting and Presenting Statistical Results.” Manuscript, Stanford University. http://gking.harvard.edu/clarify/clarify.pdf.
