Are Public Use (Micro) Data a Thing of the Past?
John M. Abowd
Cornell University, US Census Bureau
Prepared for IASSIST 2002
Yes …
• If “public use” means distributed without any restrictions on the user
• If “micro data” means the actual responses of the sampled entities
Is this Heresy?
• No. Protecting the confidentiality of respondents’ identities and data has become harder for every data provider at the same rate that computation and data access have become easier.
• This is just “Moore’s Law” as applied to the provision of data.
We Saw This Coming
• Data providers have almost never provided public use micro data for samples of businesses.
• The edits imposed on public use micro data from households have become increasingly severe.
• Data providers with legal alternatives to public use releases have increasingly opted for licensing, restricted access and other access protocols.
Can Scientific Inquiry Survive?
• Yes, provided the researchers and the archivists participate in the evolution of social data publication.
• A more nuanced understanding of what constitutes “public use (micro) data” can protect both the confidentiality of the respondents’ information and the integrity of the research analysis.
Example: American FactFinder
• The public use product is an interface between the micro data (and the detailed summary data) and the user.
• The confidentiality protection is provided as a part of the interface.
• Advantage: the researcher can design the analysis (so this is a public use micro data product).
• Disadvantage: only analyses that can be handled by the confidentiality protection system are allowed.
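To make the idea of "protection as part of the interface" concrete, here is a minimal sketch. It does not describe how American FactFinder itself works: the function name `protected_counts`, the threshold `MIN_CELL_SIZE`, and the sample records are all hypothetical, and the minimum-cell-size rule is only one simple disclosure-limitation rule chosen for illustration. The user designs the tabulation, but never sees the micro data, only the protected result.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer respondents are withheld.
MIN_CELL_SIZE = 3


def protected_counts(records, key, min_cell_size=MIN_CELL_SIZE):
    """Tabulate records by `key`; return None for cells below the threshold."""
    counts = Counter(record[key] for record in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Made-up micro data that stays behind the interface.
records = [
    {"county": "A", "industry": "retail"},
    {"county": "A", "industry": "retail"},
    {"county": "A", "industry": "mining"},
    {"county": "B", "industry": "mining"},
]

# The user chooses the tabulation; the interface applies the protection.
by_county = protected_counts(records, "county")
by_industry = protected_counts(records, "industry")
```

The trade-off on the slide shows up directly: any tabulation the function supports can be requested, but a request whose cells fall under the threshold comes back suppressed.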
Example: New Census Employment Dynamics Estimates
• Estimates created by integrating data from employers and employees over time.
• Public use products are based on a confidentiality protection system that fuzzes all of the underlying micro data.
• Advantage: analysis can be performed at levels of geographic or industry detail that would be suppressed by traditional systems.
• Disadvantage: some analyses are significantly distorted to protect the confidentiality of the micro data.
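The slide does not say how the fuzzing is done. As a hedged illustration only, the sketch below assumes one common approach, multiplicative noise: every micro record is multiplied by a factor that is kept away from 1, and only aggregates of the distorted values are released. The function names, the 5% to 15% distortion range, and the employment figures are assumptions made for this example, not the Census Bureau's parameters.

```python
import random


def draw_fuzz_factor(rng, low=0.05, high=0.15):
    """Draw a factor that distorts a value by between `low` and `high`,
    upward or downward, so no record is ever published undistorted."""
    magnitude = rng.uniform(low, high)
    sign = rng.choice((-1, 1))
    return 1.0 + sign * magnitude


def fuzzed_total(values, factors):
    """Aggregate the distorted micro data instead of the true values."""
    return sum(v * f for v, f in zip(values, factors))


rng = random.Random(2002)  # fixed seed so the sketch is repeatable

# Made-up employment counts for four employers in one small cell.
employment = [12, 250, 7, 1300]
factors = [draw_fuzz_factor(rng) for _ in employment]

true_total = sum(employment)
published_total = fuzzed_total(employment, factors)
```

Because every cell can be published in distorted form, nothing has to be suppressed, which is the advantage noted above. The cost is also visible: a cell dominated by one large employer inherits most of that employer's distortion.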
Example: INSEE Researcher Restricted Access
• INSEE allows confidential micro data to be placed on secured facilities controlled by the researcher.
• Advantage: analysis is performed on the unaltered micro data.
• Disadvantage: other researchers must apply for access and create their own secure facility.
• In the US, the NCES uses a similar system.
General Principles
• Layers of confidentiality protection
  • “Gold standard” micro data
    • Housed in a secure facility with restricted access
  • Restricted micro data
    • Created by statistical manipulation of the confidential micro data
    • Suitable for licensed distribution
  • Public use products
    • Confidentiality protection integrated with an analysis engine allowing general research