Statistical Disclosure Control (SDC) at SURS Andreja Smukavec General Methodology and Standards Sector
Why is confidentiality protection needed? • One of the fundamental principles of official statistics is that the information provided by data suppliers is strictly confidential and is used only for statistical purposes. • Legislation places a legal obligation on NSIs to protect the confidentiality of data suppliers' information. • Data suppliers should be able to trust the NSI to preserve the confidentiality of individual information; this trust leads to better quality of the collected data.
National legislation • National Statistics Act • Data are published in aggregated form. • Data may be published individually if • written consent of the reporting units is obtained; • data are collected from public data collections; • data are published in such a way that the reporting units cannot be directly identified. • The Office or authorized producers shall transmit individual data to users on the basis of a written application. • Other legislation • Personal Data Protection Act; • …
European legislation • Regulation (EC) No 223/2009 on European statistics • General definitions; • Chapter 5 – Statistical confidentiality • Access to confidential data for scientific purposes • European Statistics Code of Practice • Principle 5: The confidentiality of the information the data providers provide and its use only for statistical purposes are absolutely guaranteed.
What does SDC cover at SURS? • Tabular data protection • Publication • Tables for Eurostat and other institutions • Users' requests • Microdata protection • Preparation of public-use files and scientific-use files • Checking rules set up by Eurostat • Output checking
Tabular data protection • Tables – aggregated data • Magnitude tables: each cell contains the sum of a quantitative variable over the respondents in that cell, where respondents are grouped by categorical variables. • Frequency tables: each cell contains the number of respondents, where respondents are grouped by categorical variables.
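To make the two table types concrete, here is a minimal Python sketch that builds both from the same microdata. The records, the variable names (region, activity, turnover) and the function `build_tables` are hypothetical illustrations, not SURS data or software.

```python
from collections import defaultdict

# Hypothetical enterprise microdata: (region, activity, turnover).
records = [
    ("East", "Manufacturing", 120.0),
    ("East", "Manufacturing", 30.0),
    ("East", "Trade", 45.0),
    ("West", "Manufacturing", 200.0),
    ("West", "Trade", 15.0),
    ("West", "Trade", 25.0),
]


def build_tables(records):
    """Return (frequency table, magnitude table), both keyed by (region, activity)."""
    freq = defaultdict(int)    # number of respondents per cell
    magn = defaultdict(float)  # sum of the quantitative variable per cell
    for region, activity, turnover in records:
        freq[(region, activity)] += 1
        magn[(region, activity)] += turnover
    return dict(freq), dict(magn)


freq, magn = build_tables(records)
print(freq[("East", "Manufacturing")], magn[("East", "Manufacturing")])  # 2 150.0
```

Both tables share the same cell structure; only the cell content differs (a count versus a sum), which is why the sensitivity rules below treat them differently.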
Tabular data protection at SURS • Method: cell suppression • Post-tabular method (applied to the finished tables) • Non-perturbative method: published values are not altered, but less information is available • Implemented in Tau-Argus software (CASC project) • Goal: the interval of possible values that can be derived for each sensitive cell is sufficiently large
Cell Suppression • Sensitivity rules – defining unsafe cells • Threshold rule: the number of respondents in a cell is below a certain threshold value. • Concentration (dominance) rules: one or two respondents dominate the cell total. • Group disclosure: all respondents in one cell have the same value of a sensitive variable.
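The three sensitivity rules can be sketched as simple predicates on the contributions of the respondents in a cell. This is an illustrative Python sketch, not Tau-Argus code; the concentration rule is written in the common (n, k)-dominance form, and the parameter values (threshold 3, (1, 75) and (2, 85) dominance) are hypothetical choices, since the actual SURS parameters are not given here.

```python
def threshold_unsafe(contributions, min_respondents=3):
    """Threshold rule: unsafe if fewer than min_respondents contribute to the cell."""
    return len(contributions) < min_respondents


def dominance_unsafe(contributions, n=1, k=75.0):
    """(n, k)-dominance rule: unsafe if the n largest contributions
    exceed k percent of the cell total (assumes non-negative contributions)."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k / 100.0 * total


def group_disclosure(sensitive_values):
    """Group disclosure: every respondent in the cell shares one sensitive value."""
    return len(sensitive_values) > 0 and len(set(sensitive_values)) == 1


def cell_is_unsafe(contributions):
    # Hypothetical parameter choice: threshold 3, (1, 75) and (2, 85) dominance.
    return (threshold_unsafe(contributions)
            or dominance_unsafe(contributions, n=1, k=75.0)
            or dominance_unsafe(contributions, n=2, k=85.0))


print(cell_is_unsafe([120.0, 30.0]))                  # True: only 2 respondents
print(cell_is_unsafe([500.0, 20.0, 15.0, 10.0]))      # True: largest is about 92% of total
print(cell_is_unsafe([40.0, 35.0, 30.0, 25.0]))       # False
print(group_disclosure(["unemployed", "unemployed"]))  # True
```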
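Cell Suppression • Secondary suppression • Needed because of the totals in the tables: a single suppressed cell in a row or column could be recalculated from the total and the other published cells. The feasibility interval for each unsafe cell has to be wide enough. • Choosing the additional cells to suppress is an optimisation problem, solved with an LP solver (Xpress, CPLEX).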
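Why secondary suppression is needed, and what a feasibility interval is, can be shown on a small table with published margins. In this hypothetical Python sketch, suppressing one cell alone lets it be recalculated exactly from the row total, while suppressing a 2x2 block of cells leaves a range of values consistent with the published data. The table and the helper functions are illustrative assumptions; Tau-Argus handles general tables with an LP solver rather than this closed-form 2x2 formula.

```python
# Hypothetical 3x3 magnitude table (inner cells); row and column totals are published.
table = [
    [20, 50, 10],
    [8, 19, 40],
    [30, 25, 60],
]
row_totals = [sum(row) for row in table]        # [80, 67, 115]
col_totals = [sum(col) for col in zip(*table)]  # [58, 94, 110]


def recover_single(r, c):
    """Primary suppression only: if (r, c) is the only hidden cell in row r,
    it equals the row total minus the published cells of that row."""
    return row_totals[r] - sum(v for j, v in enumerate(table[r]) if j != c)


def rectangle_interval(r1, r2, c1, c2):
    """Feasibility interval for cell (r1, c1) when the 2x2 block formed by rows
    r1, r2 and columns c1, c2 is suppressed and all other cells are published.
    Assumes non-negative cell values."""
    # Sums the hidden cells must reproduce, derived from published figures only.
    a = row_totals[r1] - sum(v for j, v in enumerate(table[r1]) if j not in (c1, c2))  # x11 + x12
    b = row_totals[r2] - sum(v for j, v in enumerate(table[r2]) if j not in (c1, c2))  # x21 + x22
    c = col_totals[c1] - sum(table[i][c1] for i in range(len(table)) if i not in (r1, r2))  # x11 + x21
    # With x11 = t: x12 = a - t, x21 = c - t, x22 = b - c + t; all must be >= 0.
    return max(0, c - b), min(a, c)


print(recover_single(0, 0))            # 20: the unsafe cell is fully disclosed
print(rectangle_interval(0, 1, 0, 1))  # (1, 28): the true value 20 lies in this range
```

Whether an interval such as [1, 28] is wide enough depends on a protection requirement (for example, a required distance from the true value), which would have to be set separately.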
Microdata protection • Microdata are de-identified records of individual units (enterprises, persons, households): • no direct identifiers (ID numbers, tax numbers, name + address, …). • Microdata files are available to researchers in our secure room and via remote access.
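Removing direct identifiers is the first step in producing such microdata. A minimal Python sketch, assuming records are dictionaries and that the identifier field names below are hypothetical:

```python
# Hypothetical names of direct-identifier fields.
DIRECT_IDENTIFIERS = {"id_number", "tax_number", "name", "address"}


def deidentify(record):
    """Return a copy of the record without any direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


raw = {"id_number": "123", "name": "Jane Doe", "address": "Main St 1", "age": 41, "region": "East"}
print(deidentify(raw))  # {'age': 41, 'region': 'East'}
```

Dropping direct identifiers alone does not make a file safe: combinations of indirect identifiers such as age, region and occupation can still single out a unit, which is what the protection methods on the following slides address.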
Microdata protection: Scientific-use file (SUF) • Prepared for researchers • A signed contract is required • Usually sent on a password-protected CD, which has to be destroyed after use • More information (variables) available • Protected only against unintentional disclosure
Microdata protection: Public-use file (PUF) • Publicly available, or available after registration • Less information (variables) available • Not all microdata protection methods are usable (the resulting file would be too complex for ordinary users) • Protected also against intentional disclosure
Microdata protection • The goal of microdata protection is to produce a safe microdata file in which • the disclosure risk is low; • analyses done on the safe file give results that are close or equal to the results of the same analyses on the original data.
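Both goals can be quantified. As a hedged illustration, the Python sketch below uses two simple, commonly used indicators: the number of sample uniques on a set of key variables (a rough disclosure-risk measure) and the relative difference of a statistic between the original and the protected file (a simple utility measure). The key variables and the choice of these two measures are assumptions; the slides do not state which measures SURS uses.

```python
from collections import Counter


def sample_uniques(records, keys):
    """Count records whose combination of key variables occurs only once in the file.
    A high count suggests a higher risk of re-identification."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)


def relative_difference(original, protected):
    """Relative difference of a statistic computed on the original and the protected file."""
    return abs(protected - original) / abs(original)


records = [
    {"age": 41, "sex": "F", "region": "East"},
    {"age": 41, "sex": "F", "region": "East"},
    {"age": 87, "sex": "M", "region": "West"},
]
print(sample_uniques(records, ["age", "sex", "region"]))  # 1
print(relative_difference(50.0, 51.0))                    # 0.02
```

Protection methods typically lower the first indicator while raising the second, so both are checked together.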
Microdata protection methods used at SURS • The original microdata file is modified using • non-perturbative methods: • global recoding; • top and bottom coding; • local suppression (not very usable for PUFs); • some perturbative methods: • microaggregation; • rounding. • Software packages: Mu-Argus and R.
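The listed methods can be illustrated with small Python functions. This is a sketch under hypothetical parameters (10-year age classes, the top-coding threshold, group size k = 3, rounding base 5); microaggregation is shown in its simplest univariate fixed-group-size form, whereas Mu-Argus and R packages offer more refined variants.

```python
def global_recode(age, width=10):
    """Global recoding: replace an exact age by a class such as '40-49'
    (applied to every record in the file)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, bottom, top):
    """Top and bottom coding: values above `top` or below `bottom` are set to that threshold."""
    return max(bottom, min(value, top))


def microaggregate(values, k=3):
    """Univariate microaggregation with fixed group size k: sort the values, form
    groups of k consecutive values (the last group absorbs any remainder) and
    replace each value by its group mean. Output keeps the original record order."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n_groups = max(1, len(values) // k)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        idx = order[start:end]
        mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            result[i] = mean
    return result


def round_to_base(value, base=5):
    """Rounding: replace a value by the nearest multiple of `base`
    (Python's round() sends exact ties to the even quotient)."""
    return base * round(value / base)


print(global_recode(47))                         # 40-49
print(top_bottom_code(250000, 0, 100000))        # 100000
print(microaggregate([10, 50, 12, 48, 11, 49]))  # [11.0, 49.0, 11.0, 49.0, 11.0, 49.0]
print(round_to_base(17))                         # 15
```

Global recoding and top/bottom coding only coarsen values, while microaggregation and rounding replace them with nearby values; in both cases group means and totals are approximately preserved.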
Labour Force Survey - PUF • Prepared for the Social Science Data Archives (DwB project). • We applied Eurostat's rules for creating the SUF and then created the PUF by sampling (a subsample of one third of the original sample). • We did not use local suppression. • The quality of the statistics used as parameters for the sampling is ensured; other statistics should be used with caution.
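The sampling step can be sketched as a stratified simple random subsample of one third of the records. The stratification variable, the weight adjustment by n_h / m_h and the function `stratified_subsample` are assumptions for illustration; the slides do not describe the exact sampling design used for the LFS PUF.

```python
import random
from collections import defaultdict


def stratified_subsample(records, stratum_key, fraction=1 / 3, seed=12345):
    """Simple random subsample of about `fraction` of the records within each stratum.
    Weights are multiplied by n_h / m_h, so weighted totals from the subsample are
    unbiased (in expectation) for the full-sample weighted totals."""
    rng = random.Random(seed)  # fixed seed so the PUF can be reproduced
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[stratum_key]].append(rec)
    subsample = []
    for members in by_stratum.values():
        n_h = len(members)
        m_h = max(1, round(n_h * fraction))
        factor = n_h / m_h
        for rec in rng.sample(members, m_h):
            subsample.append({**rec, "weight": rec["weight"] * factor})
    return subsample


# Hypothetical LFS-like records: 30 in stratum "A", 9 in stratum "B", weight 100 each.
records = (
    [{"stratum": "A", "weight": 100.0} for _ in range(30)]
    + [{"stratum": "B", "weight": 100.0} for _ in range(9)]
)
puf = stratified_subsample(records, "stratum")
print(len(puf))  # 13
```

Sampling within strata keeps the distribution over the stratification variables close to that of the original file, which matches the slide's point that only the statistics used as sampling parameters are guaranteed.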
Output checking • After finishing their work, researchers fill out our output-checking form. • They send an e-mail to our common e-mail address zascas.surs@gov.si. • One of the SDC methodologists checks the output. If the output is disclosive or the form is filled in incorrectly, the researchers are contacted for additional information or asked to correct the output. • Once the SDC methodologist approves the release, the output is sent to the researcher by e-mail.
Rules for output checking • Rule-of-thumb model • Threshold N – all tabular and similar output should be based on at least N units. • Dominance rule – the analysis should not be done on groups with a dominant unit. • Maximum and minimum are usually not released (except when the value is shared by more than one unit). • The 100th percentile (i.e. the maximum) is usually not released.
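A rule-of-thumb check on one group of unit-level values could look like the following Python sketch. The threshold N = 10 and the definition of a dominant unit (more than 50% of the group total) are hypothetical values chosen for illustration, not SURS's actual parameters.

```python
def check_output(values, n_threshold=10, dominance_share=0.5):
    """Rule-of-thumb check of one group of unit-level values.
    Returns (problems, statistics that may be released).
    Hypothetical thresholds: at least n_threshold units, and no single unit
    contributing more than dominance_share of the group total."""
    if not values:
        return ["no units"], set()
    problems = []
    if len(values) < n_threshold:
        problems.append(f"fewer than {n_threshold} units")
    total = sum(values)
    if total > 0 and max(values) > dominance_share * total:
        problems.append("dominant unit")
    if problems:
        return problems, set()
    releasable = {"count", "mean"}
    # Maximum (100th percentile) and minimum only if shared by more than one unit.
    if values.count(max(values)) > 1:
        releasable.add("max")
    if values.count(min(values)) > 1:
        releasable.add("min")
    return problems, releasable


print(sorted(check_output([10, 12, 15, 15, 9, 11, 13, 14, 12, 10])[1]))  # ['count', 'max', 'mean']
print(check_output([5, 3, 4])[0])                                       # ['fewer than 10 units']
```

The methodologist's judgement still decides the final release; a script like this would only flag the obvious cases.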
Thank you for your attention! zascas.surs@gov.si