
(Policy) research with confidential micro data

(Policy) research with confidential micro data. Eric J. Bartelsman, Vrije Universiteit Amsterdam, Tinbergen Institute. Expert workshop "Ondernemingsdata in België" (Enterprise Data in Belgium), Brussels, September 25, 2009. Overview: benefits of using linked longitudinal firm-level datasets; international experience


Presentation Transcript


  1. (Policy) research with confidential micro data • Eric J. Bartelsman, Vrije Universiteit Amsterdam, Tinbergen Institute • Expert workshop "Ondernemingsdata in België" (Enterprise Data in Belgium) • Brussels, September 25, 2009

  2. Overview • Benefits of using linked longitudinal firm-level datasets • International experience • Modes of access to confidential firm-level datasets

  3. Benefits of using firm-level data • Improving quality of statistics • Testing of theories at firm level • Providing ‘moments’ for modelling • Policy evaluation

  4. Benefits of using firm-level data • Improving quality of statistics • Assessing quality of published stats • New uses for old data • Uncovering new collection methods and new data needs • Testing of theories at firm-level • Providing ‘moments’ for modelling • Policy evaluation

  5. Data Quality • In-house use at National Stats Office (NSO): consistency in cross-sectional and longitudinal data • Integration: top-down vs. bottom-up • External users: quality improvement criteria • Systematic learning from external users

  6. New uses for ‘old’ data • Linking of multiple sources • link NSO surveys to Business Register • cross-linking with other registers • Housing, transport, labor, tax • Linking with external surveys • Creation of new indicators from linked data • Gross Flows • Higher moments; Correlations • New disaggregations • Subsamples: region, industry, size, type
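To illustrate the ‘gross flows’ indicator, here is a minimal Python sketch, assuming employment for the same firms can be linked across two years through a firm identifier. The firm IDs, the data and the choice of average two-year employment as the rate denominator are illustrative assumptions, not the method used in the presentation.

```python
def gross_job_flows(emp_prev, emp_curr):
    """Gross job creation and destruction between two consecutive years.

    emp_prev, emp_curr: dicts mapping a (hypothetical) firm identifier to
    employment. A firm missing from one year is treated as having zero
    employment there, so entrants count as creation and exits as destruction.
    """
    creation = 0
    destruction = 0
    for firm in set(emp_prev) | set(emp_curr):
        change = emp_curr.get(firm, 0) - emp_prev.get(firm, 0)
        if change > 0:
            creation += change
        else:
            destruction += -change
    # Rates divide by average employment over both years (one common convention).
    avg_emp = (sum(emp_prev.values()) + sum(emp_curr.values())) / 2
    return {
        "creation": creation,
        "destruction": destruction,
        "net": creation - destruction,
        "creation_rate": creation / avg_emp,
        "destruction_rate": destruction / avg_emp,
    }


# Hypothetical linked panel: firm C exits, firm D enters.
emp_2007 = {"A": 10, "B": 20, "C": 5}
emp_2008 = {"A": 15, "B": 12, "D": 8}
flows = gross_job_flows(emp_2007, emp_2008)
print(flows["creation"], flows["destruction"], flows["net"])  # 13 13 0
```

In this made-up example total employment is 35 in both years, so the aggregate shows no change, while the linked panel reveals 13 jobs created and 13 destroyed.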

  7. New collection methods • Links to registers allow for mass imputation of small samples • Collection of data at the ‘transactions’ site • New types of info from linking disparate sources • Example: linked geographic info for disaster planning
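One way to read ‘mass imputation of small samples’ is ratio imputation: use the sampled firms to estimate, per industry, the ratio of a survey variable to a register variable, then apply that ratio to every unsampled firm in the register. The sketch below assumes industry and employment are known for all register units; the function name and data are hypothetical, and NSOs may well use other imputation models.

```python
from collections import defaultdict


def ratio_mass_impute(register, sample):
    """Impute a survey variable for every register unit not in the sample.

    register: firm_id -> (industry, employment), from the business register
    sample:   firm_id -> surveyed value (e.g. turnover) for sampled firms
    Returns firm_id -> observed or imputed value; firms in industries with
    no sampled units get None.
    """
    value_sum = defaultdict(float)
    emp_sum = defaultdict(float)
    for firm, value in sample.items():
        industry, employment = register[firm]
        value_sum[industry] += value
        emp_sum[industry] += employment

    # Ratio of the survey variable to register employment, per industry.
    ratio = {ind: value_sum[ind] / emp_sum[ind] for ind in value_sum if emp_sum[ind] > 0}

    result = {}
    for firm, (industry, employment) in register.items():
        if firm in sample:
            result[firm] = sample[firm]  # keep observed values
        elif industry in ratio:
            result[firm] = ratio[industry] * employment
        else:
            result[firm] = None  # no sampled firm in this industry
    return result


register = {"A": ("food", 10), "B": ("food", 30), "C": ("food", 20),
            "D": ("ict", 5), "E": ("ict", 15), "F": ("retail", 7)}
sample = {"A": 100.0, "B": 500.0, "D": 50.0}
imputed = ratio_mass_impute(register, sample)
print(imputed["C"], imputed["E"], imputed["F"])  # 300.0 150.0 None
```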

  8. Uncovering data needs • Micro-level research reveals useful indicators • Employment gross flows (US/BLS) • Firm demographics (Eurostat) • Interactions with external researchers improve NSOs’ understanding of users’ needs • Gaps in available data are identified through research

  9. Benefits of using firm-level data • Improving quality of statistics • Testing of theories at firm level • Firm-level data now used in many fields: IO, Trade, Labor, Finance, Management, Organization, Macro • Recent improvements in modelling heterogeneous firms • Variation in costs (… of learning, transport, etc.) • Usually representative consumer, constant mark-up • Application of econometric techniques (GMM, clever instruments) to cope with endogeneity • Providing ‘moments’ for modelling • Policy evaluation

  10. Benefits of using firm-level data • Improving quality of statistics • Testing of theories at firm level • Providing ‘moments’ for modelling • Information drawn from linked longitudinal firm-level distributions can be used to calibrate models • The ability to make cross-country comparisons is especially promising • Policy evaluation

  11. Benefits of using firm-level data • Improving quality of statistics • Testing of theories at firm level • Providing ‘moments’ for modelling • Policy evaluation • Individual decision-making units respond to policy • Track decisions and outcomes from longitudinal micro data • No need to infer results from movements in aggregates • Identification requires a control group • Implementation of policy differs across cells (locations, types of units, or time) • Effect of policy differs across cells (e.g. highways affect transport-intensive firms) • Cross-country comparisons for identification
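A minimal difference-in-differences sketch of the identification idea above, assuming firms can be split into cells that are exposed to a policy (for example, transport-intensive firms near a new highway) and cells that are not, with outcomes observed before and after implementation. The record layout and numbers are hypothetical.

```python
from statistics import mean


def diff_in_diff(records):
    """2x2 difference-in-differences on firm-level outcomes.

    records: (treated, post, outcome) tuples, where `treated` marks firms in
    cells exposed to the policy and `post` marks observations after it took
    effect. All four (treated, post) groups must be non-empty.
    """
    groups = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in records:
        groups[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in groups.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    # The control group's change stands in for what would have happened anyway.
    return treated_change - control_change


obs = [
    (True, False, 9.0), (True, False, 11.0), (True, True, 13.0), (True, True, 15.0),
    (False, False, 8.0), (False, False, 12.0), (False, True, 11.0), (False, True, 13.0),
]
print(diff_in_diff(obs))  # 2.0
```

In practice the same comparison is often run as a regression with firm and period effects plus controls; the 2x2 version just shows why a control group is needed.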

  12. International experience • History of micro data access: • Statistics Norway: early 1970s • US Census: late 1980s • Typical attitude of NSOs before allowing access: • Micro data is too difficult; you can’t really do that with the data; we don’t trust you to use the data; absolute security is required • Well, maybe we can think of how to allow access… • Now: at least 25 NSOs have facilities for micro data research • They also use the backbone as the basis of the statistical process: enormous gains in productivity

  13. International experience • Situation in EU countries: • Business Register, VAT register, social security (SS) register, business surveys • Some have on-site access, others remote access: • Fin, Swe, Dnk, UK, Nld, Slo, Est • Some have excellent in-house research: Fra • Other countries show a variety of arrangements (ad-hoc sharing of data, on-site access, trusted third party)

  14. Modes of access to confidential micro data • Research shop within stats agency • On-site facility with access rules for external researchers • Secure remote access for external researchers • Remote execution • Distributed micro data analysis: how to share unsharable data

  15. Issues to consider • Absolute certainty about confidentiality of data • Uniqueness of published official statistics • Requirements for access • Resource cost sharing

  16. Confidentiality • Must weigh costs and benefits • What is the ‘cost’ of confidential data being released? • Relate it to the costs of not allowing access to data: increasing irrelevance of the stats agency and hopefully extreme budget cuts • Don’t just look at the technical side of disclosure • What is the likelihood of malice or fraud? • Look at the ease of getting the same or better confidential data elsewhere

  17. Uniqueness • The ‘one published number’ view of stats agencies conflicts with reliability • We all know numbers don’t add up and that different assumptions generate different stats. So, openness, replicability, review, robustness testing by others will enhance reputation of stats agency publications • Research output can be labelled as such with a disclaimer

  18. Requirements for access • Create (legal) framework for allowing access by external researchers • Screening of projects and research teams • Special employee status • Create technical facilities • Database architecture • Meta data • On-site laboratory • Remote-access facilities

  19. Distributed Micro Data Research • Distributed Micro Data (DMD) research was developed to allow cross-country research using confidential firm-level data that could not be combined • The key is to ‘micro-aggregate’ underlying micro data into cells that pass disclosure checks and that • Provide enough information for further analysis, and/or • Can be merged at cell level with other sources • DMD can be viewed as a system to allow customer-driven publication of statistics • ‘Moments’ are useful for economic modelling
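A sketch of micro-aggregation, assuming firm records carry an industry, a size class and a productivity measure. Only cell-level moments (count, mean, variance) leave the function, and cells below a minimum count are suppressed. The three-firm threshold and the field layout are placeholders; actual disclosure rules are set by each NSO and usually include further checks.

```python
from collections import defaultdict
from statistics import mean, pvariance


def micro_aggregate(records, min_count=3):
    """Collapse firm-level records into cell-level moments.

    records: iterable of (industry, size_class, productivity) tuples.
    Cells with fewer than `min_count` firms are suppressed (None), so only
    aggregated information would leave the data holder.
    """
    cells = defaultdict(list)
    for industry, size_class, productivity in records:
        cells[(industry, size_class)].append(productivity)

    table = {}
    for cell, values in cells.items():
        if len(values) < min_count:
            table[cell] = None  # suppressed: too few firms in the cell
        else:
            table[cell] = {"n": len(values), "mean": mean(values), "variance": pvariance(values)}
    return table


firms = [
    ("manuf", "small", 1.0), ("manuf", "small", 2.0), ("manuf", "small", 3.0),
    ("manuf", "large", 5.0), ("manuf", "large", 7.0),
]
table = micro_aggregate(firms)
print(table[("manuf", "small")]["mean"], table[("manuf", "large")])  # 2.0 None
```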

  20. DMD EUKLEMS+: Data for Cross-country Firm-level Analysis [Diagram contrasting longitudinal micro data (surveys, business registers) with national accounts industry data (macro and sectoral time series), for a single country versus multiple countries; labels: SC LMD, N.A., EUKLEMS]

  21. Distributed Micro Data Analysis [Diagram of the workflow between the research network and NSOs. Elements: policy question, research design, researcher program code, metadata, DMD tables, publication. Roles of network members (NSOs): provision of metadata; approval of access; execution of code; disclosure analysis of DMD tables; disclosure analysis of publication.]

  22. DMD Projects • OECD 2000-2003 • World Bank 2006 • Follow-up 2009-2011 • EU/NL 2007 • Eurostat ICT Impacts 2008-2009 • Follow-up 2010

  23. Analytical uses of DMD datasets • Creation of new indicators from linked data • Definition of cells based on complex longitudinal characteristics • e.g. employer-employee matched data • ‘Event’ studies (tracking sub-populations based on prior characteristics) • Indicators may be higher moments, correlations, regression coefficients, etc. • e.g. correlation of profitability and employee gender ratio, by industry, region and time • Linking of outside data sources at cell level • Generate custom tabulations of data to match cells of other published or DMD datasets • e.g. labor force gender ratio by region and time • Cross-country analysis with panels that share the same cell-level definitions
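As a sketch of two uses listed above, the code computes a within-cell correlation from micro data (between profitability and the share of women in a firm's workforce, used here as a stand-in for the gender ratio) and then joins the resulting cell table to an external table with the same cell definitions. Cells are keyed by region and year only, to keep the example short; field names and numbers are hypothetical, and in a real DMD setting the cell table would pass disclosure analysis before release.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; assumes each list has at least two distinct values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cell_correlations(records):
    """records: (region, year, profitability, female_share) tuples.

    Returns {(region, year): correlation of profitability and female_share}.
    """
    cells = defaultdict(lambda: ([], []))
    for region, year, profit, female_share in records:
        profits, shares = cells[(region, year)]
        profits.append(profit)
        shares.append(female_share)
    return {cell: pearson(p, s) for cell, (p, s) in cells.items()}


def link_cells(dmd_table, external_table):
    """Inner join of two cell-level tables on identical (region, year) keys."""
    common = dmd_table.keys() & external_table.keys()
    return {cell: (dmd_table[cell], external_table[cell]) for cell in common}


firms = [
    ("north", 2005, 1.0, 0.2), ("north", 2005, 2.0, 0.4), ("north", 2005, 3.0, 0.6),
    ("south", 2005, 1.0, 0.6), ("south", 2005, 2.0, 0.4), ("south", 2005, 3.0, 0.2),
]
# Hypothetical external table: labour-force share of women by region and year.
labour_force = {("north", 2005): 0.47, ("south", 2005): 0.45, ("east", 2005): 0.50}

linked = link_cells(cell_correlations(firms), labour_force)
for cell in sorted(linked):
    print(cell, linked[cell])
```

The ("east", 2005) cell drops out because it has no counterpart in the micro-data table; the join only works because both tables use identical cell definitions.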

  24. Uses of DMD for Policy Evaluation • Individual decision-making units respond to policy • Track decisions and outcomes from longitudinal micro data • No need to infer results from movements in aggregates • Identification requires a control group • Implementation of policy differs across cells (locations, types of units, or time) • Effect of policy differs across cells (e.g. highways affect transport-intensive firms)

  25. Implementing efficient firm-level data analysis • Technical facilities • Meta-data libraries • Disclosure analysis and rules for re-use of extracted datasets

  26. Technical Facilities • Backbones for the universe of statistical units • Firms, households, dwellings, etc. • Relational database organisation of data and meta-data • Statistical tools inside the relational database programming environment • Remote access or remote execution • Remote access allows data visualisation and interactive data checking
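To make ‘relational database organisation’ concrete, here is a small sketch using Python's built-in sqlite3 module: a business register acts as the backbone, a survey table is linked to it on a firm identifier, and the aggregation runs inside the database. Table names, columns and data are hypothetical; a production system would use the NSO's own database platform.

```python
import sqlite3


def build_backbone(register_rows, survey_rows):
    """Load a (hypothetical) business register and survey into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE business_register (
            firm_id TEXT PRIMARY KEY,
            industry TEXT,
            region TEXT
        );
        CREATE TABLE survey (
            firm_id TEXT REFERENCES business_register(firm_id),
            year INTEGER,
            turnover REAL,
            employment INTEGER
        );
    """)
    conn.executemany("INSERT INTO business_register VALUES (?, ?, ?)", register_rows)
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?, ?)", survey_rows)
    conn.commit()
    return conn


def turnover_per_worker(conn):
    """Link the survey to the register on firm_id and aggregate by industry and year."""
    query = """
        SELECT r.industry, s.year, COUNT(*) AS n_firms,
               SUM(s.turnover) / SUM(s.employment) AS turnover_per_worker
        FROM survey AS s
        JOIN business_register AS r ON r.firm_id = s.firm_id
        GROUP BY r.industry, s.year
        ORDER BY r.industry, s.year
    """
    return conn.execute(query).fetchall()


conn = build_backbone(
    [("F1", "food", "north"), ("F2", "food", "south"), ("F3", "ict", "north")],
    [("F1", 2008, 100.0, 10), ("F2", 2008, 300.0, 30), ("F3", 2008, 90.0, 3)],
)
print(turnover_per_worker(conn))  # [('food', 2008, 2, 10.0), ('ict', 2008, 1, 30.0)]
```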

  27. Meta-data • Ideal application of meta-data • Be able to write generic code remotely • Convert code to run locally, using meta-data • Meta-data set up to describe • available datasets • unique record identifiers • classifications • ‘economic variables’

  28. Necessary meta-data • list of available forms and schedules • info on record identifiers (Firm_id, person_id) • info on ‘economic variables’ • info on classifications • concordances between units • concordances between variables • concordances to standard classifications
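The idea of writing generic code remotely and converting it to run locally via meta-data can be sketched as follows, assuming each country supplies meta-data that maps generic variable names to its local column names, plus a concordance from local industry codes to ISIC Rev. 3 divisions. The country labels, column names and codes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical meta-data supplied by each participating country.
METADATA = {
    "country_A": {
        "columns": {"firm_id": "ent_no", "sales": "turnover", "employment": "staff", "industry": "nace"},
        "industry_to_isic3": {"15.81": "15", "72.20": "72"},
    },
    "country_B": {
        "columns": {"firm_id": "unit_id", "sales": "omzet", "employment": "fte", "industry": "sbi"},
        "industry_to_isic3": {"1581": "15", "7220": "72"},
    },
}


def to_generic(records, country):
    """Rename local columns to generic names and map industries to ISIC Rev. 3."""
    meta = METADATA[country]
    out = []
    for rec in records:
        generic = {name: rec[local] for name, local in meta["columns"].items()}
        generic["industry"] = meta["industry_to_isic3"][generic["industry"]]
        out.append(generic)
    return out


def sales_per_worker_by_industry(records):
    """Generic research code, written once against the generic variable names."""
    sales = defaultdict(float)
    employment = defaultdict(float)
    for r in records:
        sales[r["industry"]] += r["sales"]
        employment[r["industry"]] += r["employment"]
    return {ind: sales[ind] / employment[ind] for ind in sales}


local_A = [{"ent_no": 1, "turnover": 200.0, "staff": 4, "nace": "15.81"}]
local_B = [{"unit_id": "x9", "omzet": 90.0, "fte": 3.0, "sbi": "7220"}]
print(sales_per_worker_by_industry(to_generic(local_A, "country_A")))  # {'15': 50.0}
print(sales_per_worker_by_industry(to_generic(local_B, "country_B")))  # {'72': 30.0}
```

The research function never sees local names; adding a country only means adding its meta-data entry.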

  29. Underlying Metadata: datasources

  30. Underlying Metadata: variables in survey ECS_1999

  31. Underlying Metadata: classifications of domains ISICr3

  32. Underlying Metadata: Concordances IndC_ICTind

  33. Disclosure Analysis • Can be fairly well automated, based on cell counts and ‘concentration’ • In addition, rules may be set on further use of a DMD dataset. For example, requiring that the dataset be erased after use reduces worries about secondary disclosure • Final publications may also need to be checked
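A sketch of an automated cell check combining a minimum count with a concentration test in the style of an (n, k) dominance rule: a cell is flagged if it has too few contributors, or if its n largest contributors account for more than a share k of the cell total. The parameter values (3 units, n = 2, k = 0.85) and the data are hypothetical; each NSO sets its own rules.

```python
def is_sensitive(contributions, min_count=3, n=2, k=0.85):
    """Return True if a cell should be suppressed.

    contributions: non-negative values (e.g. turnover) of the units in the cell.
    """
    # Count rule: too few contributors.
    if len(contributions) < min_count:
        return True
    # (n, k) dominance rule: the n largest units exceed share k of the total.
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k * sum(contributions)


def screen_table(cells, **rule):
    """Keep only cell totals that pass both checks; suppressed cells are dropped."""
    return {
        cell: sum(values)
        for cell, values in cells.items()
        if not is_sensitive(values, **rule)
    }


cells = {
    ("food", "north"): [100.0, 5.0],                 # too few firms
    ("food", "south"): [900.0, 50.0, 30.0, 20.0],    # dominated by two firms
    ("ict", "north"): [300.0, 250.0, 250.0, 200.0],  # passes
}
print(screen_table(cells))  # {('ict', 'north'): 1000.0}
```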
