Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata Files for Research. Daniela Ichim. Dissemination of Microdata Files for Research Risk assessment Disclosure limitation Data quality Record linkage Data utility. Outline.
Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata Files for Research. Daniela Ichim
Outline • Dissemination of Microdata Files for Research • Risk assessment • Disclosure limitation • Data quality • Record linkage • Data utility
Confidentiality against Dissemination • Disclosure scenarios • Find the right balance!
Community Innovation Survey • Identifying variables (structural variables): Nace, Nuts, Size, Turnover (TURN) • Confidential variables (variables involved in analyses): Expenditures in innovation (RTOT, …), Number of patents, …
Confounding • Numerical and categorical variables • k-anonymity: safe vs. unsafe units
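The safe/unsafe split under k-anonymity can be illustrated with a minimal sketch for the categorical identifying variables only. The function name `unsafe_records`, the field names, and the toy records are assumptions made for illustration; they are not taken from the survey data or from the author's implementation.

```python
from collections import Counter

def unsafe_records(records, keys, k):
    """Return indices of records whose key combination occurs fewer than k times."""
    combos = [tuple(r[key] for key in keys) for r in records]
    counts = Counter(combos)
    return [i for i, combo in enumerate(combos) if counts[combo] < k]

# Hypothetical toy data; the values are invented for illustration only.
records = [
    {"nace": "A", "nuts": "N1", "size": "small"},
    {"nace": "A", "nuts": "N1", "size": "small"},
    {"nace": "A", "nuts": "N1", "size": "small"},
    {"nace": "B", "nuts": "N2", "size": "large"},
]

# With k = 3, only the last record is unique on (nace, nuts, size).
unsafe = unsafe_records(records, ["nace", "nuts", "size"], k=3)
```

A record is treated as safe here when at least k records share its combination of identifying values; numerical variables such as turnover need a distance-based notion of "sharing", which this sketch does not cover.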
General risk function • Distance between units • Density around a unit • Given a threshold (on units) • Local Outlier Factor as a measure of the difference in density between a unit and its nearest neighbours
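The Local Outlier Factor compares the density around a unit with the density around its nearest neighbours. The sketch below follows the standard LOF definition in plain Python; it is not the author's risk function. It assumes Euclidean distance, takes exactly k neighbours per unit (ties are broken by position), and assumes no group of duplicate points large enough to make a reachability sum zero. The function name `lof_scores` and the toy points are hypothetical.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor of each point, using its k nearest neighbours."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        nearest = order[:k]
        neighbours.append(nearest)
        k_distance.append(dist[i][nearest[-1]])  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the unit's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical toy data: four units in a dense cluster and one isolated unit.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (10.0, 10.0)]
scores = lof_scores(points, k=2)
```

Units inside the cluster get a score close to 1, while the isolated unit gets a much larger score, which is what makes it a candidate for being at risk.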
Parameters • Cut-off point for density (LOF): quantiles or automatic • Threshold: dissemination policy
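A quantile-based cut-off on the density scores might look like the sketch below. The nearest-rank quantile definition, the function names, and the example scores are assumptions; the slides do not say which quantile rule or which value was used, and the "automatic" option is not covered.

```python
import math

def quantile_cutoff(scores, q):
    """Empirical q-quantile using the nearest-rank rule (one of several definitions)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def units_at_risk(scores, q):
    """Indices of units whose score lies strictly above the q-quantile."""
    cutoff = quantile_cutoff(scores, q)
    return [i for i, s in enumerate(scores) if s > cutoff]

# Hypothetical density scores for ten units (invented for illustration).
example_scores = [1.0, 1.1, 0.9, 1.0, 5.2, 1.2, 0.95, 1.05, 3.8, 1.0]
flagged = units_at_risk(example_scores, q=0.8)
```

Raising q flags fewer units and lowering it flags more, so the choice of q is one way a dissemination policy could be expressed.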
Stratification variables • Analysis by Nace: Nace A vs. all Nace
Disclosure limitation: selective masking for the MFR • k-anonymity • Nearest neighbour • Micro-aggregation on tails
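Micro-aggregation on a tail can be sketched as follows: the largest values are sorted, split into groups of at least k units, and each value is replaced by its group mean. This is a generic illustration, not the procedure used for the survey; the function name, the `tail_size` parameter, and the turnover figures are hypothetical.

```python
def microaggregate_upper_tail(values, tail_size, k):
    """Replace the tail_size largest values by means of groups of at least k units."""
    if k < 1 or tail_size < k or tail_size > len(values):
        raise ValueError("need 1 <= k <= tail_size <= len(values)")
    order = sorted(range(len(values)), key=lambda i: values[i])
    tail = order[-tail_size:]          # indices of the largest values, ascending
    masked = list(values)
    n_groups = tail_size // k
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so every group has >= k units.
        end = (g + 1) * k if g < n_groups - 1 else tail_size
        group = tail[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked

# Hypothetical turnover values (invented for illustration).
turnover = [10.0, 12.0, 11.0, 13.0, 500.0, 900.0, 700.0]
masked_turnover = microaggregate_upper_tail(turnover, tail_size=3, k=3)
```

Replacing values by group means leaves the unweighted total unchanged and leaves every unit outside the tail untouched; weighted totals are preserved only if the units in a group share the same weight.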
Quality assessment Dissemination Confidentiality
Risk measure assessment • Record linkage • Quality of the external database • External database: Chambers of Commerce database
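A record-linkage experiment of this kind can be sketched as: block on the categorical identifying variables, link each external record to the nearest released record on a numerical variable, and count how many links are correct. The linkage rule, the field names, the `id` key used to verify links, and the toy records are all assumptions; the slides do not describe the actual linkage method.

```python
def link_rate(released, external, block_keys, num_key):
    """Share of external records linked to the correct released record."""
    correct = 0
    for ext in external:
        # Blocking: only released records with identical categorical keys.
        candidates = [r for r in released
                      if all(r[b] == ext[b] for b in block_keys)]
        if not candidates:
            continue
        # Nearest neighbour on the numerical variable within the block.
        best = min(candidates, key=lambda r: abs(r[num_key] - ext[num_key]))
        if best["id"] == ext["id"]:
            correct += 1
    return correct / len(external)

# Hypothetical released (masked) file and external register, invented values.
released = [
    {"id": 1, "nace": "A", "nuts": "N1", "turn": 100.0},
    {"id": 2, "nace": "A", "nuts": "N1", "turn": 105.0},
    {"id": 3, "nace": "B", "nuts": "N2", "turn": 710.0},
    {"id": 4, "nace": "B", "nuts": "N2", "turn": 690.0},
]
external = [
    {"id": 1, "nace": "A", "nuts": "N1", "turn": 101.0},
    {"id": 3, "nace": "B", "nuts": "N2", "turn": 690.0},
    {"id": 4, "nace": "B", "nuts": "N2", "turn": 950.0},
]
rate = link_rate(released, external, ["nace", "nuts"], "turn")
```

In this toy case only the first external record is linked correctly, because the masked turnover values in the second block no longer point to the right enterprise.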
Record linkage a) 100% for enterprises with more than 250 employees
Data utility: information preservation under selective masking • Only identifying and confidential variables were modified. • Only records at risk were modified. • The weights were not modified. • Weighted totals were preserved (coherence with the already published information). Information content analysis • Some statistical indicators were slightly modified: variances
Data utility: information content analysis • Assessment of the perturbation impact on ratios like RTOT/TURN • Compared files: original, selective masking, individual ranking
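The impact of masking on a ratio such as RTOT/TURN can be measured as a relative change between the original and the masked file. The sketch below assumes the ratio of weighted totals; the slides do not say whether the ratio was computed on totals or per unit. The function names, the `weight` field, and the records are hypothetical.

```python
def weighted_ratio(records, num, den, weight="weight"):
    """Ratio of the weighted total of `num` to the weighted total of `den`."""
    top = sum(r[weight] * r[num] for r in records)
    bottom = sum(r[weight] * r[den] for r in records)
    return top / bottom

def relative_change(original, masked, num="rtot", den="turn"):
    """Relative change of the weighted ratio caused by masking."""
    before = weighted_ratio(original, num, den)
    after = weighted_ratio(masked, num, den)
    return (after - before) / before

# Hypothetical original and masked records (invented for illustration).
original = [
    {"rtot": 10.0, "turn": 100.0, "weight": 2.0},
    {"rtot": 30.0, "turn": 300.0, "weight": 1.0},
    {"rtot": 50.0, "turn": 500.0, "weight": 1.0},
]
masked = [
    {"rtot": 10.0, "turn": 100.0, "weight": 2.0},
    {"rtot": 30.0, "turn": 400.0, "weight": 1.0},
    {"rtot": 55.0, "turn": 400.0, "weight": 1.0},
]
change = relative_change(original, masked)
```

Here the masked turnover keeps the weighted total of TURN unchanged, so the change in the ratio comes entirely from the perturbed RTOT value.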
Conclusions: quality dimensions • Confidentiality: risk measure based on the k-anonymity principle; flexible: a) continuous and categorical variables, b) easy to implement, c) consistent for extreme choices • Data utility: selective protection to achieve k-anonymity • Comparable dissemination: control both the risk of re-identification and the information loss