(Policy) research with confidential micro data
Eric J. Bartelsman, Vrije Universiteit Amsterdam; Tinbergen Institute
Expertenworkshop Ondernemingsdata in België (expert workshop on business data in Belgium), Brussels, September 25, 2009
Overview
• Benefits of using linked longitudinal firm-level datasets
• International experience
• Modes of access to confidential firm-level datasets
Benefits of using firm-level data
• Improving quality of statistics
  • Assessing quality of published stats
  • New uses for old data
  • Uncovering new collection methods and new data needs
• Testing of theories at the firm level
• Providing ‘moments’ for modelling
• Policy evaluation
Data quality
• In-house use at the National Statistics Office (NSO):
  • Consistency in cross-sectional and longitudinal dimensions
  • Integration: top-down vs bottom-up
• External users:
  • Quality improvement criteria
  • Systematic learning from external users
New uses for ‘old’ data
• Linking of multiple sources
  • Link NSO surveys to the Business Register
  • Cross-linking with other registers: housing, transport, labor, tax
  • Linking with external surveys
• Creation of new indicators from linked data
  • Gross flows
  • Higher moments; correlations
• New disaggregations
  • Subsamples: region, industry, size, type
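Gross flows are one of the new indicators that linked longitudinal data make possible. The sketch below is a minimal illustration, assuming a hypothetical firm-year employment panel (the firm IDs, years and the function name `gross_job_flows` are invented) and the common Davis-Haltiwanger-Schuh definitions, in which rates are computed relative to average total employment over the two years. It is not the procedure of any particular NSO.

```python
# Hypothetical firm-year employment panel: (firm_id, year) -> employees.
# A firm missing in a year is treated as having zero employment there,
# so entrants count as job creation and exiters as job destruction.
PANEL = {
    ("A", 2007): 10, ("A", 2008): 14,  # expanding incumbent
    ("B", 2007): 20, ("B", 2008): 15,  # contracting incumbent
    ("C", 2008): 6,                    # entrant
    ("D", 2007): 4,                    # exiter
}


def gross_job_flows(panel, year):
    """Gross job creation (JC) and destruction (JD) between year - 1 and year.

    Rates are divided by the average of total employment in the two years,
    following the Davis-Haltiwanger-Schuh convention.
    """
    firms = {firm for firm, _ in panel}
    jc = jd = prev_total = curr_total = 0
    for firm in firms:
        before = panel.get((firm, year - 1), 0)
        after = panel.get((firm, year), 0)
        prev_total += before
        curr_total += after
        if after > before:
            jc += after - before  # expansion or entry
        else:
            jd += before - after  # contraction or exit (0 if unchanged)
    avg = (prev_total + curr_total) / 2
    return {"JC": jc, "JD": jd, "JC_rate": jc / avg, "JD_rate": jd / avg,
            "net_rate": (jc - jd) / avg}


print(gross_job_flows(PANEL, 2008))
```

For the toy panel, job creation is 10 (4 from expanding firm A plus 6 from entrant C) and job destruction is 9, so net creation matches the change in total employment from 34 to 35. Running the same function separately for each industry, region or size-class cell gives disaggregated gross flows.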
New collection methods
• Links to registers allow for mass imputation of small samples
• Collection of data at the ‘transactions’ site
• New types of information from linking disparate sources
  • Example: linked geographic information for disaster planning
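Mass imputation from registers can take many forms, and the slide does not say which model is meant. As one simple illustration, the sketch below assumes a ratio estimator: the relationship between a surveyed variable and a register variable (here a hypothetical VAT turnover) is estimated on the small sample and then applied to every unit in the register. All names and figures are invented.

```python
# Hypothetical business register: firm_id -> turnover from the VAT register.
# This auxiliary variable is known for every unit in the population.
REGISTER = {"f1": 100.0, "f2": 250.0, "f3": 50.0, "f4": 400.0, "f5": 200.0}

# Hypothetical small survey: firm_id -> surveyed variable (say, ICT spending),
# observed only for the sampled firms.
SURVEY = {"f1": 5.0, "f2": 15.0}


def ratio_mass_impute(register, survey):
    """Return a value for every register unit: observed if sampled, else imputed.

    Imputation uses a single ratio estimator, y_hat = (sum y_s / sum x_s) * x.
    """
    ratio = sum(survey.values()) / sum(register[firm] for firm in survey)
    return {firm: survey.get(firm, ratio * x) for firm, x in register.items()}


completed = ratio_mass_impute(REGISTER, SURVEY)
print(round(sum(completed.values()), 2))
```

With a single ratio, the completed population total equals the ratio times the register total, (20 / 350) * 1000, or about 57.14. A production system could instead estimate separate ratios within strata such as industry and size class.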
Uncovering data needs
• Micro-level research reveals useful indicators
  • Employment gross flows (US/BLS)
  • Firm demographics (Eurostat)
• Interactions with external researchers improve NSOs' understanding of users' needs
• Gaps in available data are identified through research
Benefits of using firm-level data: testing of theories at the firm level
• Firm-level data now used in many fields: IO, trade, labor, finance, management, organization, macro
• Recent improvements in modelling heterogeneous firms
  • Variation in costs (… of learning, transport, etc.)
  • Usually a representative consumer and constant mark-up
• Application of econometric techniques (GMM, clever instruments) to cope with endogeneity
Benefits of using firm-level data: providing ‘moments’ for modelling
• Information drawn from linked longitudinal firm-level distributions can be used to calibrate models
• The ability to make cross-country comparisons is especially promising
Benefits of using firm-level data: policy evaluation
• Individual decision-making units respond to policy
  • Track decisions and outcomes from longitudinal micro data
  • No need to infer results from movements in aggregates
• Identification requires a control group
  • Implementation of policy differs across cells (by location, by type of unit, or over time)
  • Effect of policy differs across cells (e.g. highways affect transport-intensive firms)
  • Cross-country comparisons for identification
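The identification idea on this slide, comparing cells where a policy bites with cells where it does not, can be illustrated with a difference-in-differences estimate from cell means. This is one standard estimator chosen for illustration, not necessarily the author's method; the records, the treated/control split and the outcome variable are hypothetical, and the estimate is only credible if treated and control cells would have followed parallel trends without the policy.

```python
from statistics import fmean

# Hypothetical firm-level outcomes (say, log sales), each tagged with whether the
# firm sits in a cell affected by the policy and whether the year is after it.
RECORDS = [
    # (treated_cell, post_policy, outcome)
    (True, False, 2.0), (True, False, 2.2),
    (True, True, 2.9), (True, True, 3.1),
    (False, False, 1.8), (False, False, 2.0),
    (False, True, 2.1), (False, True, 2.3),
]


def diff_in_diff(records):
    """Difference-in-differences estimate computed from the four cell means."""
    def cell_mean(treated, post):
        return fmean(y for t, p, y in records if t == treated and p == post)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change


print(round(diff_in_diff(RECORDS), 3))
```

Here treated cells rise by 0.9 and control cells by 0.3, so the estimated policy effect is 0.6.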
International experience
• History of micro data access:
  • Statistics Norway: early 1970s
  • US Census Bureau: late 1980s
• Typical attitude of NSOs before allowing access:
  • ‘Micro data is too difficult’, ‘You can’t really do that with the data’, ‘We don’t trust you to use the data’, ‘Absolute security is required’
  • ‘Well, maybe we can think of how to allow access…’
• Now: at least 25 NSOs have facilities for micro data research
  • They also use the backbone as the basis of the statistical process, with enormous gains in productivity
International experience
• Situation in EU countries
  • Business Register, VAT register, social security register, business surveys
  • Some have on-site access, others remote access: Fin, Swe, Dnk, UK, Nld, Slo, Est
  • Some have excellent in-house research: Fra
  • Other countries show a variety of arrangements: ad-hoc sharing of data, on-site access, trusted third party
Modes of access to confidential micro data
• Research shop within the statistical agency
• On-site facility with access rules for external researchers
• Secure remote access for external researchers
• Remote execution
• Distributed micro data analysis: how to share unsharable data
Issues to consider
• Absolute certainty about confidentiality of data
• Uniqueness of published official statistics
• Requirements for access
• Resource cost sharing
Confidentiality
• Must weigh costs and benefits
  • What is the ‘cost’ of confidential data being released?
  • Relate this to the costs of not allowing access to data: increasing irrelevance of the statistical agency and, hopefully, extreme budget cuts
• Don’t just look at the technical side of disclosure
  • What is the likelihood of malice or fraud?
  • Look at the ease of getting the same or better confidential data elsewhere
Uniqueness
• The ‘one published number’ view of statistical agencies conflicts with reliability
  • We all know numbers don’t add up and that different assumptions generate different statistics. So openness, replicability, review and robustness testing by others will enhance the reputation of the statistical agency’s publications
• Research output can be labelled as such, with a disclaimer
Requirements for access
• Create a (legal) framework for allowing access by external researchers
  • Screening of projects and research teams
  • Special employee status
• Create technical facilities
  • Database architecture
  • Meta data
  • On-site laboratory
  • Remote-access facilities
Distributed micro data research
• Distributed micro data (DMD) research was developed to allow cross-country research using confidential firm-level data that could not be combined
• The key is to ‘micro-aggregate’ the underlying micro data into cells that pass disclosure analysis and
  • provide enough information for further analysis, and/or
  • can be merged at cell level with other sources
• DMD can be viewed as a system that allows customer-driven publication of statistics
• ‘Moments’ are useful for economic modelling
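A minimal sketch of the micro-aggregation step, assuming hypothetical firm records with an industry code, size class, year and productivity value. Each cell is reduced to its count, mean and variance, and cells with fewer than a minimum number of firms are suppressed; the threshold of 3 and all names are illustrative assumptions, since actual disclosure rules are set by each NSO.

```python
from collections import defaultdict
from statistics import fmean, pvariance

MIN_FIRMS = 3  # hypothetical minimum cell size; real thresholds are set by each NSO

# Hypothetical confidential micro records: (industry, size_class, year, productivity).
MICRO = [
    ("C10", "small", 2008, 40.0),
    ("C10", "small", 2008, 44.0),
    ("C10", "small", 2008, 52.0),
    ("C10", "large", 2008, 90.0),
    ("C10", "large", 2008, 110.0),
]


def micro_aggregate(records, min_firms=MIN_FIRMS):
    """Collapse micro records into cell-level moments; suppress small cells."""
    cells = defaultdict(list)
    for industry, size_class, year, value in records:
        cells[(industry, size_class, year)].append(value)
    table = {}
    for key, values in cells.items():
        if len(values) < min_firms:
            table[key] = None  # suppressed: too few firms in the cell
        else:
            table[key] = {"n": len(values), "mean": fmean(values), "var": pvariance(values)}
    return table


for cell, stats in micro_aggregate(MICRO).items():
    print(cell, stats)
```

Because each released cell carries its count as well as its mean and variance, tables from different countries that share the same cell definitions can be pooled or merged without any micro record leaving its NSO.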
Data for cross-country firm-level analysis (diagram)
• Longitudinal micro data (surveys, business registers): single country: SC LMD; multiple countries: DMD
• National accounts industry data (macro and sectoral time series): single country: N.A.; multiple countries: EUKLEMS+
Distributed micro data analysis (workflow diagram)
• Researcher and research network: policy question, research design, program code, DMD tables, publication
• Network members (NSOs): provision of metadata; approval of access; execution of code; disclosure analysis of DMD tables; disclosure analysis of publication
DMD projects
• OECD 2000-2003
• World Bank 2006
  • Follow-up 2009-2011
• EU/NL 2007
• Eurostat ICT Impacts 2008-2009
  • Follow-up 2010
Analytical uses of DMD datasets
• Creation of new indicators from linked data
  • Definition of cells based on complex longitudinal characteristics
    • e.g. employer-employee matched data
    • ‘Event’ studies (tracking sub-populations based on prior characteristics)
  • Indicators may be higher moments, correlations, regression coefficients, etc.
    • e.g. correlation of profitability and employee gender ratio, by industry, region and time
• Linking of outside data sources at cell level
  • Generate custom tabulations of data to match cells of other published or DMD datasets
    • e.g. labor-force gender ratio by region and time
• Cross-country analysis with panels that use the same cell-level definitions
Uses of DMD for policy evaluation
• Individual decision-making units respond to policy
  • Track decisions and outcomes from longitudinal micro data
  • No need to infer results from movements in aggregates
• Identification requires a control group
  • Implementation of policy differs across cells (by location, by type of unit, or over time)
  • Effect of policy differs across cells (e.g. highways affect transport-intensive firms)
Implementing efficient firm-level data analysis
• Technical facilities
• Meta-data libraries
• Disclosure analysis and rules for re-use of extracted datasets
Technical facilities
• Backbones for the universe of statistical units
  • Firms, households, dwellings, etc.
• Relational database organisation of data and meta-data
• Statistical tools inside the relational database programming environment
• Remote access or remote execution
  • Remote access allows data visualisation and interactive data checking
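A minimal sketch of the relational organisation described above, using Python's built-in `sqlite3` module as a stand-in for an NSO's database. The business register acts as the backbone, and a survey is linked to it through the firm identifier; table names, column names and values are all hypothetical.

```python
import sqlite3

# In-memory database standing in for the NSO's relational store (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE business_register (firm_id TEXT PRIMARY KEY, industry TEXT, region TEXT);
CREATE TABLE survey_2008 (
    firm_id TEXT REFERENCES business_register(firm_id),
    turnover REAL,
    employees INTEGER
);
""")
con.executemany("INSERT INTO business_register VALUES (?, ?, ?)", [
    ("f1", "C10", "North"), ("f2", "C10", "North"), ("f3", "J62", "South"),
])
con.executemany("INSERT INTO survey_2008 VALUES (?, ?, ?)", [
    ("f1", 500.0, 10), ("f2", 300.0, 5), ("f3", 800.0, 8),
])

# Link the survey to the backbone via firm_id and aggregate inside the database.
rows = con.execute("""
    SELECT b.industry,
           COUNT(*) AS n_firms,
           SUM(s.turnover) / SUM(s.employees) AS turnover_per_worker
    FROM survey_2008 AS s
    JOIN business_register AS b USING (firm_id)
    GROUP BY b.industry
    ORDER BY b.industry
""").fetchall()
print(rows)
```

Doing the join and the aggregation inside the database keeps the statistical work next to the data, in the spirit of the ‘statistical tools inside the relational database’ item.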
Meta-data
• Ideal application of meta-data
  • Be able to write generic code remotely
  • Convert code to run locally, using meta-data
• Meta-data set up to describe
  • available datasets
  • unique record identifiers
  • classifications
  • ‘economic variables’
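The idea of writing generic code remotely and converting it to run locally can be sketched as a renaming step driven by meta-data. In this minimal sketch the metadata dictionary, the local column names (`ent_no`, `turn_k`, `emp_fte`) and the function names are invented; a real system would also need to translate classifications and units.

```python
# Hypothetical metadata from one NSO: how the generic names used in
# researcher code map onto this NSO's local identifiers and variables.
LOCAL_METADATA = {
    "record_id": "ent_no",  # local firm identifier (invented name)
    "variables": {"turnover": "turn_k", "employment": "emp_fte"},  # invented names
}

# Local confidential records, stored under local column names.
LOCAL_RECORDS = [
    {"ent_no": "001", "turn_k": 500.0, "emp_fte": 10.0, "nace": "C10"},
    {"ent_no": "002", "turn_k": 300.0, "emp_fte": 5.0, "nace": "C10"},
]


def to_generic(records, meta):
    """Rename local columns to generic names; drop columns the metadata does not describe."""
    rename = {meta["record_id"]: "firm_id"}
    rename.update({local: generic for generic, local in meta["variables"].items()})
    return [{rename[col]: val for col, val in rec.items() if col in rename}
            for rec in records]


def labour_productivity(records):
    """Generic researcher code: refers only to generic names."""
    return {r["firm_id"]: r["turnover"] / r["employment"] for r in records}


result = labour_productivity(to_generic(LOCAL_RECORDS, LOCAL_METADATA))
print(result)
```

Only `LOCAL_METADATA` differs between NSOs; `labour_productivity` is written once by the researcher and can run unchanged wherever the meta-data describes the same ‘economic variables’.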
Necessary meta-data
• List of available forms and schedules
• Info on record identifiers (Firm_id, person_id)
• Info on ‘economic variables’
• Info on classifications
• Concordances between units
• Concordances between variables
• Concordances to standard classifications
Underlying metadata: concordances (example: IndC_ICTind)
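The slide names a concordance (IndC_ICTind) without showing its contents. The sketch below only illustrates how a concordance of this general kind might be applied to re-aggregate a cell-level table; the industry codes, the two groups and the employment figures are invented and are not the contents of IndC_ICTind.

```python
from collections import defaultdict

# Hypothetical concordance from detailed industry codes to a coarser grouping.
# Codes and groups are invented for illustration only.
CONCORDANCE = {
    "IND01": "ICT-intensive",
    "IND02": "ICT-intensive",
    "IND03": "less ICT-intensive",
    "IND04": "less ICT-intensive",
}

# Hypothetical cell-level table keyed by detailed industry code (employment).
EMPLOYMENT = {"IND01": 120, "IND02": 80, "IND03": 300, "IND04": 250, "IND99": 5}


def apply_concordance(table, concordance):
    """Re-aggregate a table to the target grouping; also return unmapped codes.

    Handles only many-to-one mappings; a code split across several groups
    would need allocation weights.
    """
    regrouped = defaultdict(int)
    unmapped = []
    for code, value in table.items():
        group = concordance.get(code)
        if group is None:
            unmapped.append(code)
        else:
            regrouped[group] += value
    return dict(regrouped), unmapped


print(apply_concordance(EMPLOYMENT, CONCORDANCE))
```

Reporting unmapped codes (here `IND99`) rather than silently dropping them makes gaps in the concordance visible.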
Disclosure analysis
• Can be fairly well automated, based on cell counts and ‘concentration’
• Rules may also be set for further use of a DMD dataset. For example, a requirement that the dataset be erased after use will reduce worries about secondary disclosure
• Checking may also be required on the final publication
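A minimal sketch of an automated check based on cell counts and concentration. Concentration is measured here with an (n, k) dominance rule, one common approach chosen for illustration: a cell is flagged if its n largest contributors account for more than a share k of the cell total. The threshold values, names and cell contents are hypothetical; each NSO sets its own rules.

```python
MIN_COUNT = 3       # hypothetical minimum number of contributing firms
DOMINANCE_N = 2     # hypothetical (n, k) dominance rule: the n largest
DOMINANCE_K = 0.85  # firms may not exceed share k of the cell total


def is_disclosive(contributions, min_count=MIN_COUNT, n=DOMINANCE_N, k=DOMINANCE_K):
    """Return True if a cell total should be suppressed.

    contributions: non-negative values (e.g. turnover) of the firms in the cell.
    """
    if len(contributions) < min_count:
        return True  # too few firms: individual values could be inferred
    total = sum(contributions)
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k * total  # cell dominated by a few large firms


CELLS = {
    "cell_a": [100, 90, 80, 70],
    "cell_b": [1000, 20, 10, 5],
    "cell_c": [50, 40],
}
for name, values in CELLS.items():
    print(name, "suppress" if is_disclosive(values) else "release")
```

With these settings cell_a is released, cell_b is suppressed because its two largest firms account for about 98.6% of the total, and cell_c is suppressed because it has only two firms.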