Development of large-scale applications with Stata
Michael Lokshin, Sergiy Radyakin and Zurab Sajaia
World Bank

Analytical work at the World Bank
• Each year the World Bank produces:
  • 10-15 poverty assessments
  • 5-10 labor market studies
  • 10 education and health assessments
  • Gender studies
  • Nutritional studies
  • Reports on social protection and benefit-incidence analysis, etc.
• Most analytical work for these reports is done in Stata
• The Research Department (DECRG) of the World Bank develops new methods and tools that are used in these reports and need to be made accessible to a wide audience of practitioners of applied economic analysis

Stata in the World Bank
• Stata is the main statistical package used in the Bank
• Hundreds of users, both at HQ and in regional offices
• Many users are short-term consultants with limited Stata programming skills
• Consultants are hired for a project and leave the Bank after the project is completed
• Difficult to enforce rules on programming style, code documentation, and archiving
• Many Stata programs are lost or undocumented and are difficult to reuse
• There is a need to automate the analytical work conducted in the Bank

Stata routines developed in DECRG
• Poverty analysis toolkit:
  • Growth-inequality decomposition (gedecomposition.ado)
  • Sectoral poverty decomposition (sedecomposition.ado)
  • Growth-incidence curves (gicurves.ado)
  • Stochastic dominance analysis (pov_robust.ado)
  • egen extension for inequality and poverty measures
  • Fast algorithm for calculation of Gini coefficients (fastgini.ado)
• Applied Economic Research:
  • FIML algorithm of two-equation ordered probit models with endogeneity
  • FIML estimation of the endogenous switching regression model
  • Selection models based on ordered probit
  • Semi-parametric difference-based estimation of partial linear regression models
  • Selecting a subset of variables providing the model's best fit
  • Efficient estimation of regressions based on pseudo-panel data
• LOOKFOR_ALL: an extension of the Stata program lookfor
• xml_tab.ado: saving the outputs from Stata estimation procedures in Microsoft Excel
• usespss.ado, use10.ado: read SPSS files into Stata; read Stata 10 files in Stata 9
• Many other Stata routines

Automated Economic Analysis
• Speed up production of basic (required) results
• Minimize human errors
• Free resources for more meaningful and interesting tasks
• Easily introduce new techniques and methods
• Allow easy replication of previous results
• Generate standard, comparable results across countries and years
• A tool for simulations
• A tool for sensitivity analysis and training
• Helpful in situations of limited data access
• Simple checking of previous reports/results
• Minimize training time and skill requirements

ADePT: Software platform for automated economic analysis
[Diagram: ADePT User Interface → request for computations → Stata Computation Kernel → output in XLS or PDF format (xml_tab.ado)]
• User interface:
  • Version 3: customized Stata dialogs, classes
  • Version 4: user interface in C#
• Computation kernel:
  • Set of Stata and Mata routines; plug-ins
  • ~100,000 lines of code
  • Multiple version support
  • Team development

ADePT Solutions
• ADePT offers users a solution to a particular problem.
• Modules of ADePT: each is a set of analytical results (tables, graphs) sufficient to answer a particular question.
• Combination of software tools and substantive contributions from experts in the field:
  • Gary Fields (Cornell): Labor
  • Martin Ravallion (WB): Poverty
  • Adam Wagstaff (WB): Health
• Two main directions of ADePT:
  • Assessments of the current situation
  • Projections and simulations

ADePT V4.0
• Accepts individual-level and household data in Stata and SPSS format. Uses Stata for computations.
• Possibility of remote computing
• No prior knowledge of Stata is required
• Minimal data preparation
• Extensive checks for possible problems with the data
• Control for influential outliers
• Tested on datasets from more than 50 countries: LSMS, HBS, DHS
• Estimated 500 users in the WB, international research institutions, universities, and government agencies
• The number of users is expected to increase as new modules are released

ADePT V4.0: The roadmap
• ADePT Poverty: Public release, June 2007
• ADePT MAPS: Public release, October 2007
• ADePT Labor: Public release, November 2007
• ADePT Gender: Public release, November 2008
• ADePT Social Protection: Public release, June 2009
• ADePT Education: Public release, June 2009
• ADePT Targeting: Planned release, August 2009
• ADePT PLINES: Development stage
• ADePT HEALTH: Planned release, August 2009
• ADePT Inequality: Planned release, August 2009

ADePT: Website
www.worldbank.org/adept
Download: installation and updates, documentation, examples.

Practical Issues
• Interface
• Performance (-ftabstat2-)
• Interaction/communication with other programs (IniFile.class, -smtp-)
• Graphics (-twoway parea-, -amap-)
• Custom file formats (-usespss-, -use10-)
• Installation and updates (-pkg2script-)
• Certification

Practical Issues: Interface
• Dialogs in Stata can be created to make custom-written commands easier to use. But they are geared toward building a command line (a command with parameters and options), not toward a full application interface.
• Stata 10 added some features that expand what dialogs can do, but they are still very limited, and we were constrained to remain compatible with Stata 9.2.
• After exhausting the standard dialog features of Stata, we decided to move the interface into an external application written in C# (Microsoft Visual Studio).
[Screenshot] The released version 3.0 of ADePT used Stata dialogs

Practical Issues: Interface
[Screenshot] The current version 4.0 of ADePT uses Windows Forms for dialogs

Practical Issues: Performance
• Stata's built-in routines seem to be very efficient, but code implemented in *.ado files is often quite slow.
• In particular, -tabstat- performed inadequately for our tasks despite its simple nature.
• It was rewritten as a plugin, -ftabstat2-, in C++ (Microsoft Visual Studio) and modified to suit our particular needs: it now returns matrices of means, totals, counts, and various proportions for each specified variable, with support for by()-rows and by()-cols.
• Trade-off: no benefit from Stata/MP, because plugins are (currently?) single-threaded.
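
To make the kind of output described above concrete, here is a minimal Python sketch of a single-pass tabulation that returns counts, totals, means, and cell proportions for one variable, broken down by a row and a column grouping. It is only an illustration of the idea, not the -ftabstat2- algorithm; the function name `grouped_stats`, its parameters, and the sample field names are all hypothetical.

```python
from collections import defaultdict


def grouped_stats(rows, value_key, row_key, col_key):
    """Accumulate per-cell statistics in one pass over the data.

    rows is an iterable of dicts; observations whose value is None
    are treated as missing and skipped.
    """
    cells = defaultdict(lambda: {"count": 0, "total": 0.0})
    for r in rows:
        value = r[value_key]
        if value is None:
            continue
        cell = cells[(r[row_key], r[col_key])]
        cell["count"] += 1
        cell["total"] += value

    # Number of non-missing observations across all cells.
    n = sum(c["count"] for c in cells.values())

    result = {}
    for key, c in cells.items():
        result[key] = {
            "count": c["count"],
            "total": c["total"],
            "mean": c["total"] / c["count"],
            "proportion": c["count"] / n,
        }
    return result


# Hypothetical survey records: consumption by region and urban status.
sample = [
    {"region": "north", "urban": 1, "cons": 10.0},
    {"region": "north", "urban": 1, "cons": 20.0},
    {"region": "north", "urban": 0, "cons": 30.0},
    {"region": "south", "urban": 0, "cons": None},
    {"region": "south", "urban": 0, "cons": 40.0},
]
table = grouped_stats(sample, "cons", "region", "urban")
```

Collecting every statistic in the same loop avoids re-reading the data once per statistic, which is one plausible reason a dedicated routine can outperform a general-purpose one.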

Practical Issues: Communication
Interaction/communication with other programs: we needed to solve two problems:
• To provide an easy-to-handle job file that describes all the parameters and options for a large project (it is not possible to fit everything on the command line). We moved from txt files to ini-files, handled by IniFiles.class.
• To provide communication between Stata and another program: while the computations are performed in Stata, the external interface needs to be kept updated on the status of the calculations. We solved this by writing a C++ plugin, -smtp- (SendMessageToPipe), which uses Windows pipes for IPC.
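
As a rough illustration of why an ini-style job file suits this purpose, the Python sketch below parses a small job description into nested settings using the standard `configparser` module. The section names, keys, and file names are invented for the example and are not ADePT's actual job-file layout; `read_job` is a hypothetical helper.

```python
import configparser

# Hypothetical job file: sections and keys are assumptions for illustration.
JOB_TEXT = """
[job]
module = poverty
output = results.xls

[data]
file = survey2005.dta
welfare = pcexp
weight = hhweight

[tables]
requested = 1a, 1b, 2c
"""


def read_job(text):
    """Parse an ini-style job description into a dict of sections."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    job = {section: dict(parser[section]) for section in parser.sections()}
    # Split the comma-separated list of requested tables into a Python list.
    job["tables"]["requested"] = [
        name.strip() for name in job["tables"]["requested"].split(",")
    ]
    return job


job = read_job(JOB_TEXT)
```

Named sections keep a long list of options readable and let both sides (the interface that writes the file and the kernel that reads it) look up settings by name rather than by position.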

Practical Issues: Graphics
• We ran into some limitations of Stata graphics. Some of them were circumvented with custom graphics commands or adaptations of existing commands (-twoway parea-).
• We did not find any way to interact with the mouse in Stata graphics (version 9.2).
• We decided to move our mapping program -amap- out of Stata into an external program and communicate with it seamlessly via ini-files.
[Map screenshot. Demonstration only, not actual data]

Practical Issues: File Formats
• We needed support for SPSS files in ADePT.
• We developed the -usespss- plugin to import SPSS data into Stata.
• -usespss- was presented at SNASUG 2008 in Chicago and made available to the public immediately afterwards.
• We needed to give Stata 9 users the ability to process datasets saved in Stata 10 format.
• We developed (using Mata) a new command, -use10-, for this purpose. Available at SSC.
http://repec.org/snasug08/radyakin_usespss.ppt
findit usespss
findit use10

Practical Issues: Installation and Updates
• We experienced problems with installing and updating packages from our web site into Stata.
• The problem was not due to Stata, but we received a number of very helpful responses from StataCorp's Tech Support Team on this issue.
• Effectively, this problem ruled out -net install-.
• We developed a tool, -pkg2script-, to create standalone installers from one or more Stata packages with the help of the NSIS installation system.
• The tool works in Windows only; if the path is empty, the package is taken from SSC.
• In theory, all of SSC could be packed into a single installer.
[Screenshot of such an installer]

Practical Issues: Certification
• We faced the problem of verifying results. Checking the numbers by hand is slow and unreliable.
• We have included a test mode in ADePT, in which it:
  • is launched from an external application (the tests manager),
  • runs the requested jobs, and
  • verifies the output against a predefined set of benchmarks, which have been verified (confirmed by non-team members).
• We monitor whether the test succeeds (results are produced), whether the results are correct, and how long it takes to produce them. If a benchmark for the current test does not exist, ADePT generates it from the current results and verifies against this saved output next time.
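
The test-mode logic described above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not ADePT's implementation: results are assumed to be a flat mapping of names to numbers, benchmarks are stored as JSON files, and `run_certified`, its parameters, and the tolerance are hypothetical.

```python
import json
import os
import time


def run_certified(job_name, compute, benchmark_dir, tol=1e-9):
    """Run a job, then verify it against a saved benchmark or create one.

    compute is a zero-argument callable returning {name: number}.
    """
    start = time.perf_counter()
    try:
        results = compute()
    except Exception as exc:  # the job did not produce results
        return {"status": "failed", "error": str(exc)}
    elapsed = time.perf_counter() - start

    path = os.path.join(benchmark_dir, job_name + ".json")
    if not os.path.exists(path):
        # No benchmark yet: save the current results for the next run.
        with open(path, "w") as f:
            json.dump(results, f)
        return {"status": "benchmark_created", "seconds": elapsed}

    with open(path) as f:
        expected = json.load(f)

    # A key is a mismatch if it is missing on either side or differs
    # by more than the tolerance.
    mismatches = [
        key
        for key in sorted(set(expected) | set(results))
        if key not in expected
        or key not in results
        or abs(expected[key] - results[key]) > tol
    ]
    status = "mismatch" if mismatches else "ok"
    return {"status": status, "mismatches": mismatches, "seconds": elapsed}
```

The three outcomes map onto the three things being monitored: a "failed" status means no results were produced, "ok" or "mismatch" reports correctness, and "seconds" records the run time. A benchmark generated automatically on the first run still needs independent confirmation before it can count as verified.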

Practical Issues: Wishes for Stata 12
• Access to the registry (at least read-only) to detect the presence of other programs, their versions, and their locations (currently solved with a plugin).
• IPC via pipes (currently solved with a plugin).
• Preserve/restore to RAM (currently solved with a RAM drive).
• Extended plugin capabilities: allow plugins to execute Stata commands, as Mata can with stata("command").
• Support for Cyrillic/local fonts
• Unicode??