
Experiences with multiple propensity score matching

Jan Hagemejer & Joanna Tyrowicz, University of Warsaw & National Bank of Poland


  1. Experiences with multiple propensity score matching Jan Hagemejer & Joanna Tyrowicz University of Warsaw & National Bank of Poland

  2. Plan
     • Standard solutions to the automation challenge
     • Where do they not work? Example of propensity score matching
     • Using loops and global macros together
     • Generating the results sets for atypical estimations
     • Difficulties with using bootstrap (and obtaining results sets)
     • Summary comments … and some (hard-learned) advice

  3. The standard route
     • Problem: several estimations of similar form + need to compare results
     • Three simple solutions:
       • Solution 1: brute force = sit & type (copy/paste from output)
       • Solution 2: use parmest (Roger Newson) if estimations are run on simple categories in the data (limitations of the "by" command)
       • Solution 3: use loops
         • outreg/outreg2
         • nicely formatted tables,
         • publication-ready,
         • in many formats, even directly to Word or LaTeX.
     • Note: if you need nice summary statistics, you can use outsum either with by or within loops
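A minimal sketch of Solution 3 using only built-in Stata commands. Since outreg/outreg2 are user-written, estimates store and estimates table stand in for them here; auto.dta and the regressions are arbitrary illustrations, not the authors' models.

```stata
* Loop over several outcomes, store each fit, then compare them side by side.
* Built-in stand-ins for outreg/outreg2; auto.dta is only example data.
sysuse auto, clear
foreach out in price mpg {
    quietly regress `out' foreign weight
    estimates store m_`out'            // keep these results under a name
}
estimates table m_price m_mpg, b se    // coefficients with standard errors
```

The same loop structure carries over unchanged when outreg2 is installed; only the last line changes.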

  4. Where do the problems come from?
     • The 2nd and 3rd solutions work only with regression-type estimations
     • However, some procedures are incompatible with pre-cooked solutions
     • We need to report:
       • output of the procedure
       • sample properties after matching
       • balancing properties of matching
     • Problem 1: actually, none of these is in the typical output
     • Problem 2: we need it for many estimations looped over many variables, and each one of them takes a looooong time

  5. Detailed problem description
     • Analyse the effects of privatisation
     • Take two firms, A and A'. Firm A gets privatised; firm A' never does. We want to compare firms A and A' in each year before and after the privatisation of firm A (in fact we compare private firms to privatised SOEs, because few SOEs are left in the sample)
     • Observe what happens before and after the "event" of privatisation
       • E.g. firm A may be one year before privatisation in 1999 and firm B in 2006, so the "event" is an anchor and time "runs" both ways
     • Effects may be observed in many spheres:
       • e.g. profits, investments, international competitiveness, employment, productivity
     • Effects may be due to self-selection
       • e.g. only better firms are privatised, so the difference in performance is not due to privatisation (firms may also be privatised for other reasons, for instance budget pressure)
     • Use propensity score matching to compare privatised firms to non-privatised firms
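The "anchor" idea amounts to re-indexing calendar time as time relative to each firm's own privatisation year. A minimal sketch on a hypothetical three-firm panel (the variable names firm, year, priv_year and event_time are our assumptions): firms A and B from the slide both sit at event time -1, in 1999 and 2006 respectively.

```stata
* Hypothetical toy panel: firm 1 (A) privatised in 2000, firm 2 (B) in 2007,
* firm 3 never privatised (missing privatisation year).
clear
input firm year priv_year
1 1998 2000
1 1999 2000
1 2000 2000
1 2001 2000
2 2005 2007
2 2006 2007
2 2007 2007
2 2008 2007
3 1999 .
3 2000 .
end
* Event time: negative before the anchor, 0 in the privatisation year,
* positive afterwards; missing for never-privatised firms.
generate event_time = year - priv_year
list firm year event_time, sepby(firm)
```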

  6. What we want to get

  7. Detailed problem description
     • Thus, in our case:
       • Many time periods (a separate estimation for each "time-to-anchor")
       • Many variables (separate outcomes for each variable, but the same balancing properties within one "anchor")
       • Two ways of estimating: regular and bootstrapped (especially the latter made things complex)
       • Each estimation: roughly 1.5-3.5 hours (big dataset)
       • Over a hundred estimations
       • To verify whether the matching is OK, we need to check balancing properties
     • Additional pitfalls:
       • We needed some statistics for all estimations, and they were not in the return list
       • More precisely: the procedure computes them to be able to produce its output, but the authors did not add them to the return list

  8. Summary of the problems
     Our problem was quite specific… BUT it consisted of many general problems:
     • Loops take a lot of time, so we need to find efficient ways
     • Some things cannot be obtained fast => even more reason to run them automatically
     • Obtaining datasets of the results we need (so-called results sets)
     • Getting data that are displayed but not returned as output
     • Using invisible data
     • Coping with bootstrap

  9. The structure of our estimations

  10. How can global macros be useful?

  11. Time loop: using global macros for estimations
     • Our application: observe the same firms back and forth from the moment of privatisation ("anchor")
     • "Anchors" happen in different years
     • But we can only match on one dimension: whether or not a firm has the "anchor"
     • Conceptual solution: use lags and forwards to get the time dimension
     • Technical problem: many outcome variables and, de facto, many loops
     • Technical solution: define matching variables and output variables separately
         global in "capital roa export_status etc…"            // MATCHING VARS!
         global out "productivity employment efficiency etc…"  // COMPARISON VARS!
         global outf "…"                                        // forwards of $out
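A minimal sketch of the lags-and-forwards idea, assuming the data are declared as a firm-year panel: outcome names live in a global, and the forward (lead) copies and their own globals are built in one loop with the F. time-series operator (L. would give lags). The toy panel and the names f1_*, f2_*, $outf1 and $outf2 are our assumptions, chosen to match the globals used in the variables loop.

```stata
* Toy balanced panel: 3 firms observed 2000-2003 (hypothetical data).
clear
set seed 1
set obs 12
generate firm = ceil(_n/4)
bysort firm: generate year = 1999 + _n
generate productivity = runiform()
generate employment  = runiform()
xtset firm year

global out "productivity employment"     // comparison (outcome) variables
global outf1 ""
global outf2 ""
foreach v of global out {
    generate f1_`v' = F1.`v'   // value one year later (missing at panel end)
    generate f2_`v' = F2.`v'   // value two years later
    global outf1 "$outf1 f1_`v'"
    global outf2 "$outf2 f2_`v'"
}
display "$outf1"
display "$outf2"
```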

  12. Getting from results to "results sets"

  13. Why (and what) do we need (in) the results sets?
     • Why?
       • Most importantly: without results sets we cannot
         • analyse the changes over time
         • decompose the observed differentials
       • If we do not do it automatically, it would have to be copied manually from logs: many estimations, many variables, etc.
     • What? Step 1: find out the reality
       • Size of each of the three groups: treated, total and control (= matched)
       • Averages in all three groups (medians, etc.)
       • Knowing whether they are in fact different (= test of statistical significance based on the difference and the standard error of this difference)
     • What? Step 2: find out how good the findings are statistically
       • Balancing properties!
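The significance check in Step 1 is the usual ratio of an estimated difference to its standard error. As a sketch, assuming a large-sample normal approximation (notation ours): $\hat\Delta$ is the treated-control difference stored in the results set and $\widehat{\mathrm{se}}(\hat\Delta)$ its standard error, regular or bootstrapped.

```latex
t = \frac{\hat{\Delta}}{\widehat{\mathrm{se}}(\hat{\Delta})},
\qquad
p = 2\left[1 - \Phi\left(|t|\right)\right]
```

Storing both $\hat\Delta$ and $\widehat{\mathrm{se}}(\hat\Delta)$ in the results set is what allows this test to be recomputed later for any anchor and variable without rerunning the matching.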

  14. Event loop: our solution to step 1
     • Initialise the store for our results sets using postfile. Index the results table with variable names, years and other things that the code loops around:
         tempname memhold
         postfile `memhold' indices variable_names_for_results using filename
     • Start the big loop (event):
         forvalues d = 6(1)18 {
     • Run pscore (needed for bootstrap) and subsequently psmatch2:
             psmatch2 d`d' our_pscore_`d', out($out $outf $outl) some_options

  15. Variables loop: our solution to step 1
     • Run pscore and psmatch2:
         psmatch2 d`d' our_pscore_`d', out($out $outf $outl) some_options
     • Start the loop:
         foreach out in $out $outf1 $outf2 {
     • Generate means and standard errors for treated/matched/unmatched, using the output from psmatch2 (more about this later):
             local se_after = r(seatt_`out')
     • Post the locals to the postfile using the post command in each loop iteration
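A self-contained sketch of the postfile pattern from the event and variables loops, using built-in commands only: ttest stands in for psmatch2 (a user-written command), foreign for the treatment dummy, and a loop over rep78 thresholds for the event loop. The results-set layout (rep78_max, outcome, mean_treated, mean_control, diff, se_diff) is our own assumption.

```stata
* One row per (group, outcome), filled from r() results after each estimation.
sysuse auto, clear
tempname memhold
tempfile results
postfile `memhold' int rep78_max str32 outcome double mean_treated ///
    double mean_control double diff double se_diff using `results'

forvalues d = 3/5 {                        // stand-in for the event loop
    foreach out in price mpg weight {      // variables loop
        quietly ttest `out' if rep78 <= `d', by(foreign)
        * r(mu_1): foreign == 0 (control); r(mu_2): foreign == 1 (treated)
        post `memhold' (`d') ("`out'") (r(mu_2)) (r(mu_1)) ///
            (r(mu_2) - r(mu_1)) (r(se))
    }
}
postclose `memhold'

use `results', clear
list, sepby(rep78_max)
```

Posting inside the innermost loop means nothing has to be copied from logs: the results set is complete as soon as postclose runs.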

  16. Specific loop: our solution to step 2
     • For balancing properties we need to use pstest over all the matching variables:
         pstest $in
     • In order to produce nice tables, we need to loop over all the matching variables in $in and create some locals in memory, to save them later as separate variables:
         foreach v of global in {
             capture local bias_reduction = r(bired_`v')
             capture local pvalue_bef = r(pbef_`v')
             capture local pvalue_after = r(paft_`v')
             capture gen b_red_`v' = `bias_reduction'
             capture gen pval_bef_`v' = `pvalue_bef'
             capture gen pval_aft_`v' = `pvalue_after'
         }
     • Spit out everything to a spreadsheet (alternatively you can use postfile again):
         outsheet b_red* pval* using stats_priv_`d', replace
     • Make some graphs and clean up:
         psgraph
         graph save priv_support_`d', replace
         drop b_red* pval*
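The bias-reduction figures retrieved in this loop are conventionally based on the standardised bias of Rosenbaum and Rubin. A sketch of the standard definitions, to our understanding of what pstest reports (notation ours): $X$ is a matching variable, $T$ and $C$ the treated and control groups (before or after matching), and the variances are typically taken from the unmatched samples so that the scale is the same before and after matching.

```latex
B_X = 100 \cdot \frac{\bar X_T - \bar X_C}{\sqrt{\left(s^2_{X,T} + s^2_{X,C}\right)/2}},
\qquad
\mathrm{BR}_X = 100 \left(1 - \frac{\left|B_X^{\mathrm{after}}\right|}{\left|B_X^{\mathrm{before}}\right|}\right)
```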

  17. "Missing statistics"

  18. Solving the problem of "missing" statistics
     • psmatch2 produces nice tables with all the required statistics. However, they are only shown on the screen and vanish right after that
     • Look into the ado file of the procedure you are using
     • Throughout the file, there are commands such as return scalar x = `some_local'
     • Sometimes, for clarity, scalars are dropped at the end of the procedure
     • Your preferred statistic (if it is in the output, it has to exist at least as a local) simply needs a return line like that too
     • If it does not exist, you can always compute it from the available locals according to your needs
     => Modify the original ado file
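Editing psmatch2 itself depends on its source, but the mechanism is general: an rclass program exposes whatever it stores with return scalar. A minimal sketch with a hypothetical program meandiff that returns the two group means alongside the difference, mirroring the diff/mean0/mean1 additions; auto.dta and foreign are stand-ins.

```stata
* Hypothetical rclass program: statistics kept in scalars become visible
* after the program ends only if they are explicitly returned.
capture program drop meandiff
program define meandiff, rclass
    syntax varname, by(varname)
    tempname m1 m0
    quietly summarize `varlist' if `by' == 1, meanonly
    scalar `m1' = r(mean)
    quietly summarize `varlist' if `by' == 0, meanonly
    scalar `m0' = r(mean)
    * an "unmodified" program might return only the difference...
    return scalar diff = `m1' - `m0'
    * ...the "modified" one also exposes the group means
    return scalar mean1 = `m1'
    return scalar mean0 = `m0'
    return scalar diff_`varlist' = `m1' - `m0'
end

sysuse auto, clear
meandiff price, by(foreign)
return list
```

All summarize calls happen before the first return statement, so the returned values never depend on r() results that a later command could overwrite.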

  19. Solving the problem of "missing" statistics: example 1
     Original ado file, line 380:
         qui foreach v of varlist `varlist' {
             replace _`v' = . if _support==0
             tempname m1t m0t u0u u1u att dif0
             sum `v' if _treated==1, mean
             scalar `u1u' = r(mean)
             sum `v' if _treated==0, mean
             scalar `u0u' = r(mean)
             sum `v' if _treated==1 & _support==1, mean
             scalar `m1t' = r(mean)
             local n1 = r(N)
             sum _`v' if _treated==1 & _support==1, mean
             scalar `m0t' = r(mean)
             scalar `att' = `m1t' - `m0t'
             scalar `dif0' = `u1u' - `u0u'
             return scalar att = `att'
             return scalar att_`v' = `att'
             /* no "return" of the needed scalars */
     Modified ado file, line 380:
         qui foreach v of varlist `varlist' {
             replace _`v' = . if _support==0
             tempname m1t m0t u0u u1u att dif0
             … /* all the same as earlier, plus: */
             return scalar diff = `dif0'
             return scalar diff_`v' = `dif0'
             return scalar mean0 = `u0u'
             return scalar mean0_`v' = `u0u'
             return scalar mean1 = `u1u'
             return scalar mean1_`v' = `u1u'
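Written out, the scalars in this excerpt are as follows (notation ours): $T$ is the treated group, $C$ the untreated group, $S$ the common support, $N_1^S$ the number of treated observations on support (n1 in the code), and $\hat v_i^{0}$ the matched counterfactual that psmatch2 stores in _v.

```latex
\widehat{\mathrm{ATT}}_v =
  \underbrace{\frac{1}{N_1^{S}} \sum_{i \in T \cap S} v_i}_{\mathtt{m1t}}
  - \underbrace{\frac{1}{N_1^{S}} \sum_{i \in T \cap S} \hat v_i^{0}}_{\mathtt{m0t}},
\qquad
\mathrm{diff}_v =
  \underbrace{\frac{1}{N_T} \sum_{i \in T} v_i}_{\mathtt{u1u}}
  - \underbrace{\frac{1}{N_C} \sum_{i \in C} v_i}_{\mathtt{u0u}}
```

The added return lines therefore expose the unmatched difference and both unmatched group means, which is what the decomposition of the treated-unmatched differential needs.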

  20. Solving the problem of "missing" statistics: example 2
     Original ado file, line 440:
         return scalar seatt = `stderr'
         return scalar seatt_`v' = `stderr'
         qui regress `v' _treated
         scalar `ols' = _b[_treated]
         scalar `seols' = _se[_treated]
     Modified ado file, line 440:
         return scalar seatt = `stderr'
         return scalar seatt_`v' = `stderr'
         qui regress `v' _treated
         scalar `ols' = _b[_treated]
         scalar `seols' = _se[_treated]
         return scalar seols = `seols'
         return scalar seols_`v' = `seols'
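Why the regression line is useful: with a 0/1 dummy as the only regressor, the OLS slope equals the raw (unmatched) difference in group means, so in our reading of the excerpt seols is the conventional OLS standard error of that unmatched difference. A minimal check on auto.dta, where foreign stands in for _treated (an assumption for illustration).

```stata
* With a single 0/1 regressor, the OLS slope equals the difference in means.
sysuse auto, clear
quietly summarize price if foreign == 1, meanonly
scalar mean_f = r(mean)
quietly summarize price if foreign == 0, meanonly
scalar mean_d = r(mean)
quietly regress price foreign
display "OLS slope:           " _b[foreign]
display "difference in means: " (scalar(mean_f) - scalar(mean_d))
display "OLS standard error:  " _se[foreign]
```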

  21. Problems with bootstrap

  22. Problems with bootstrap
     • When calculating standard errors, psmatch2 does not take into account that the propensity score is estimated. A possible solution is to use bootstrap.
     • What problems with bootstrap?
       • It needs to be run separately for each variable (it bootstraps only one standard error at a time)
       • Output is given in a totally different form
       • It takes a looong time
       • A new piece of code just for bootstrap standard errors => new variable loops within each time loop

  23. Problems with bootstrap
     • Again, create the postfile
     • Run the actual bootstrap in loops (post results in every iteration):
         foreach out in $out $outf1 $outf2 {
             use data, clear
             bootstrap r(att): psmatch2 d`d' our_pscore_`d', out(`out') some_options
             matrix mat = e(b), e(se)                   // without this, no results sets
             svmat mat                                  // convert matrix to variables
             rename mat1 a`d'_diff_after_bs_`out'       // create meaningful names
             rename mat2 a`d'_se_after_bs_`out'
             gen time_of_event = `d'
             post `postfile' indices (a`d'_diff_after_bs_`out') (a`d'_se_after_bs_`out')
         }
         postclose `postfile'
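A self-contained sketch of the same idea with built-in commands: bootstrap one statistic per outcome and post the point estimate and its bootstrap standard error straight from _b[] and _se[], which avoids the matrix/svmat/rename round trip. The program meandiff_bs is a hypothetical stand-in for psmatch2 and its r(att); foreign plays the role of the treatment; 50 replications are for illustration only.

```stata
* Hypothetical rclass stand-in for psmatch2: returns a difference in means.
capture program drop meandiff_bs
program define meandiff_bs, rclass
    syntax varname, by(varname)
    tempname m1 m0
    quietly summarize `varlist' if `by' == 1, meanonly
    scalar `m1' = r(mean)
    quietly summarize `varlist' if `by' == 0, meanonly
    scalar `m0' = r(mean)
    return scalar diff = `m1' - `m0'
end

sysuse auto, clear
tempname memhold
tempfile bs_results
postfile `memhold' str32 outcome double diff_bs double se_bs using `bs_results'

foreach out in price mpg {
    quietly bootstrap diff = (r(diff)), reps(50) seed(12345): ///
        meandiff_bs `out', by(foreign)
    * _b[] holds the statistic on the full sample, _se[] its bootstrap SE
    post `memhold' ("`out'") (_b[diff]) (_se[diff])
}
postclose `memhold'

use `bs_results', clear
list
```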

  24. Final steps
     • Merge the files obtained from bootstrap on the "anchor" (to have a complete results set within each "anchor" period)
     • Organise the data
     • Produce tables and graphs (again in loops)
     • Write the paper
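A minimal sketch of the merge step, assuming both results sets are keyed on the anchor (time_of_event) and the outcome name. The two toy files and all numbers in them are arbitrary placeholders, not estimates.

```stata
* Regular results set (placeholder values).
clear
input int time_of_event str12 outcome double att double se_att
6 "employment" 0.10 0.04
6 "productivity" 0.20 0.05
7 "employment" 0.12 0.04
7 "productivity" 0.18 0.06
end
tempfile regular
save `regular'

* Bootstrap results set (placeholder values).
clear
input int time_of_event str12 outcome double se_bs
6 "employment" 0.05
6 "productivity" 0.07
7 "employment" 0.05
7 "productivity" 0.08
end

merge 1:1 time_of_event outcome using `regular'
assert _merge == 3          // every anchor/outcome pair found in both files
drop _merge
sort time_of_event outcome
list, sepby(time_of_event)
```

merge 1:1 also guards against accidental duplicates: it stops with an error if an anchor/outcome pair appears twice in either file.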

  25. The resulting graphs (1)
     • 6 figures showing levels for 3 groups (15 matches each)

  26. The resulting graphs (2)
     • 6 figures showing the decomposition of the treated-unmatched difference (15 matches each)

  27. The resulting graphs (3)
     • 6 × n figures showing the "balanced panel" version of the treated-unmatched difference for all variables

  28. Some advice we did not take at the right time
     • Use "sample 10" for testing procedures: it saves a lot of time
     • Leaving a mess is not useful if we ever want to come back
     • Your memory does not last as long as saved files do: describing do-files really helps
     • Loops are better than copy & paste, and less messy too
     • Beware of changes in Stata syntax (all the time…)
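A minimal sketch of the "sample 10" advice: a global switch (the name TESTING is hypothetical) draws a 10% random subsample so the whole pipeline can be debugged quickly before the multi-hour full run; setting it to 0 runs on all data.

```stata
* Debugging switch: 1 = work on a 10% random subsample, 0 = full data.
global TESTING 1
sysuse auto, clear
if $TESTING {
    set seed 2024
    sample 10          // keep a 10% random sample of observations
}
display "Observations in use: " _N
```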

  29. Thank you for your attention! Jan Hagemejer & Joanna Tyrowicz University of Warsaw and National Bank of Poland
