The ARGUS Software of the SDC-project • Anco Hundepool • Statistics Netherlands • Washington, August 1999
Statistical Disclosure Control • the balance between the need for (more and more) information and • the privacy of the respondents
Statistical Disclosure Control • Need for detailed micro data files • electronic publications • computing power of users • Need for more detailed tables But....!!!
Statistical Disclosure Control • Protection of privacy of respondents (persons, enterprises, institutions) • Respondents must be able to trust Statistical Offices! • Risks: • Intruders/hackers • Accidental recognition • Advanced record linkage techniques
Statistical Disclosure Control • Produce ‘safe’ datafiles and tables • Apply data modification techniques • Preserve as much information as possible • Implemented in ARGUS!
Framework of development of ARGUS • SDC project • partly subsidised by EU (4th Framework) • Co-operation between The Netherlands, Italy (+Spain) and UK
General aims of SDC project • Methodological research in SDC • microdata, tables • concerning statistics, OR • geographical data • (general) SDC Software development • microdata (m-ARGUS) • tables (t-ARGUS)
SDC project members • Netherlands • CBS (ARGUS) • TU-Eindhoven (OR for microdata) • Italy • Istat (with Univ. of Rome) (Research/testing) • CPR-Padova (with Univ. Tenerife) (OR for tabular data)
SDC project members • UK • ONS (data) • Univ. Manchester (with Univ. of Southampton) (Research on SARs) • Univ. of Leeds (Geographical data)
Main software developed in SDC-project • m-ARGUS (CBS and TUE) • micro data • t-ARGUS (CBS and CPR) • tabular data
Ideas of m-ARGUS • An intruder uses information on identifying variables (e.g. region, sex, age, education, occupation) to identify records. • An identified record then reveals the sensitive information it contains
m-ARGUS • Levels of protection • public use files (PUF) • microdata files for researchers (MUC): universities, under contract, etc. • safe-setting
Ideas of m-ARGUS • a list of combinations of identifying variables must be checked • find value combinations that are unsafe • e.g. frequency |a x b x c| <= threshold • threshold depends on level of protection • Public use files • Micro data for researchers (contract)
Ideas of m-ARGUS • eliminate the unsafe combinations • by global recoding (age -> agegroup, region -> province) • by local suppression (imputing missing values) • interactively/automatically • with minimum information loss (entropy)
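The frequency check on this slide can be sketched in a few lines. This is only an illustration of the idea, not the m-ARGUS implementation: the function name `unsafe_combinations`, the record layout and the sample data are assumptions made for the example.

```python
from collections import Counter

def unsafe_combinations(records, variables, threshold):
    """Count how often each value combination of `variables` occurs and
    return the combinations whose frequency is <= threshold."""
    counts = Counter(tuple(rec[v] for v in variables) for rec in records)
    return {combo: n for combo, n in counts.items() if n <= threshold}

# Hypothetical microdata: three identifying variables per record.
records = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 71},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
]

# A stricter protection level corresponds to a higher threshold.
unsafe_contract = unsafe_combinations(records, ("region", "sex", "age"), threshold=1)
unsafe_public = unsafe_combinations(records, ("region", "sex", "age"), threshold=2)
```

With these sample records the single 71-year-old man in the North is unsafe at threshold 1, and raising the threshold to 2 additionally flags the combination that occurs twice.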
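A minimal sketch of the two techniques named above, with entropy used as a rough information-loss indicator. The helper names, the ten-year age bands and the missing-value code "99" are assumptions for illustration; the slides do not specify how m-ARGUS computes its entropy measure.

```python
import math
from collections import Counter

def recode_age(age, width=10):
    """Global recoding: replace an exact age by an age group, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress(record, variable, missing="99"):
    """Local suppression: replace one value in one record by a missing code."""
    out = dict(record)
    out[variable] = missing
    return out

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

ages = [34, 36, 41, 47]
groups = [recode_age(a) for a in ages]        # global recoding affects every record
loss = entropy(ages) - entropy(groups)        # entropy removed by the recoding

record = {"region": "North", "sex": "M", "age": 71}
protected = suppress(record, "age")           # local suppression affects one record
```

Global recoding changes a variable for the whole file, while local suppression only touches the records that are still unsafe afterwards, which is why the two are usually combined.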
m-ARGUS • For microdata • Developed in Borland C++ • Windows-95/98 • Version 3.0 last SDC-version • interactive/automatic global recoding • automatic local suppression
Features of m-ARGUS • can handle large microdata files • only tables derived from microdata are being used • flexible global recoding • options for automatic mix of global recoding and local suppression (TU Eindhoven)
Additional features of m-ARGUS • Micro-aggregation • Top/Bottom coding • Rounding
m-ARGUS (flow diagram) • Input: metadata, microdata • Generate tables • Recoding schemes • Global recoding • Local suppression • Micro-aggregation • Top/bottom coding • Rounding • Output: report, metadata, microdata
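The three additional techniques can be illustrated as follows. This is a sketch under simple assumptions (univariate micro-aggregation with fixed group size k, rounding to the nearest multiple of a base); the function names are hypothetical and the slides do not describe the variants m-ARGUS actually uses.

```python
import math

def top_code(values, cap):
    """Top coding: values above `cap` are replaced by `cap`."""
    return [min(v, cap) for v in values]

def bottom_code(values, floor):
    """Bottom coding: values below `floor` are replaced by `floor`."""
    return [max(v, floor) for v in values]

def round_to_base(values, base):
    """Rounding: replace each value by the nearest multiple of `base`."""
    return [base * math.floor(v / base + 0.5) for v in values]

def microaggregate(values, k):
    """Micro-aggregation: sort the values, form groups of at least k
    neighbours and replace every value by its group mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:          # too few values left for another full group
            end = n              # so the current group absorbs them
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result

incomes = [10, 50, 12, 48, 11, 52, 90]
aggregated = microaggregate(incomes, k=3)
```

In the example the three smallest incomes share one mean and the remaining four share another, so no published value belongs to fewer than three records.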
m-ARGUS input data • Data: Fixed format ASCII • Metadata • Name • Position • Missing values (2) • Identification level • Hierarchical coding • Codelist (opt.)
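Reading a fixed-format ASCII file driven by such metadata might look like the sketch below. The `Variable` class, the explicit field length and the sample layout are assumptions for illustration; this is not the actual m-ARGUS metadata file format.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    start: int        # 1-based column of the first character
    length: int       # field width (assumed to be part of the metadata)
    missing: tuple    # up to two missing-value codes

def parse_line(line, variables):
    """Cut one fixed-format line into fields; missing codes become None."""
    record = {}
    for var in variables:
        raw = line[var.start - 1 : var.start - 1 + var.length].strip()
        record[var.name] = None if raw in var.missing else raw
    return record

# Hypothetical layout: region in columns 1-2, sex in column 3, age in columns 4-6.
layout = [
    Variable("region", 1, 2, ("99", "98")),
    Variable("sex", 3, 1, ("9", "8")),
    Variable("age", 4, 3, ("999", "998")),
]

first = parse_line("121034", layout)
second = parse_line("992998", layout)
```

Values are kept as strings here because identifying variables are categorical codes; a codelist, where present, would map them to labels.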
Using m-ARGUS • read data file • generate tables • apply global recodes • apply local suppression • generate safe file • generate report
Ideas of t-ARGUS • identification of sensitive cells using e.g. a dominance rule • a cell needs at least n (e.g. 2) contributors • a cell is sensitive if the sum of its largest 3 contributors >= 75% of the cell total (one large contributor could otherwise recalculate the contribution of its competitor) • easy part
Ideas of t-ARGUS • Eliminate/protect sensitive cells (hard part) • by applying SDC techniques • table redesign • cell suppression • rounding • interactively and/or automatically • with minimum information loss (e.g. cell weights)
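The sensitivity test described on this slide can be written down directly. The sketch below uses the example parameters from the slide (at least 2 contributors, largest 3 contributors, 75%); the function name and the sample cells are assumptions for illustration.

```python
def is_sensitive(contributions, min_contributors=2, n_largest=3, share=0.75):
    """Return True if a cell is sensitive under a minimum-frequency rule
    combined with an (n, k) dominance rule."""
    if len(contributions) < min_contributors:
        return True                      # too few contributors
    total = sum(contributions)
    if total == 0:
        return False                     # nothing to disclose in an empty total
    top = sum(sorted(contributions, reverse=True)[:n_largest])
    return top >= share * total          # the largest contributors dominate

single_firm = is_sensitive([100])
dominated = is_sensitive([500, 300, 100, 50, 50])   # top 3 hold 900 of 1000
balanced = is_sensitive([10] * 10)                  # top 3 hold 30 of 100
```

Note that with `n_largest=3` any cell with three or fewer contributors is flagged automatically, since those contributors make up the whole cell total.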
Ideas of t-ARGUS • cell suppression in tables with marginals • identify primary sensitive cells • protect primary cells by suppressing additional (secondary) cells to prevent recalculation (to some approximation) • with minimal information loss (CPR)
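Why secondary suppression is needed can be shown with a small audit: when marginal totals are published, any cell that is the only suppressed one in its row or column can be recomputed by subtraction. The sketch below is only such an audit for exact recalculation in a two-dimensional table; it is not the optimisation approach developed by CPR, and the function name and sample table are assumptions.

```python
def recoverable_cells(table, suppressed):
    """Return the suppressed cells an outsider can recompute exactly from the
    published cells and the row/column totals (all totals assumed published)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    unknown = set(suppressed)
    recovered = {}
    changed = True
    while changed:
        changed = False
        for r, c in sorted(unknown):
            in_row = sum(1 for cell in unknown if cell[0] == r)
            in_col = sum(1 for cell in unknown if cell[1] == c)
            if in_row == 1:      # only unknown in its row: subtract from row total
                known = sum(v for j, v in enumerate(table[r]) if (r, j) not in unknown)
                recovered[(r, c)] = row_totals[r] - known
            elif in_col == 1:    # only unknown in its column: use the column total
                known = sum(table[i][c] for i in range(len(table)) if (i, c) not in unknown)
                recovered[(r, c)] = col_totals[c] - known
            else:
                continue
            unknown.discard((r, c))
            changed = True
            break                # restart the scan with the smaller unknown set
    return recovered

table = [[20, 50, 10],
         [8, 19, 22],
         [17, 32, 12]]

primary_only = recoverable_cells(table, {(1, 0)})
one_secondary = recoverable_cells(table, {(1, 0), (1, 2)})
rectangle = recoverable_cells(table, {(1, 0), (1, 2), (2, 0), (2, 2)})
```

Suppressing only the primary cell, or one extra cell in the same row, still allows exact recalculation; the rectangular pattern does not. Even then an intruder may derive a narrow interval for the suppressed value, which is the "to some approximation" caveat on the slide and is not checked by this sketch.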
t-ARGUS • 3-D tables • interactive table redesign • primary & secondary cell suppression • optimisation routines for automatic cell suppression • rounding
t-ARGUS (flow diagram) • Input: metadata, microdata • tabulation • codelists • redesign • rounding • suppression • Output: report, safe table
Features of t-ARGUS • Initial run through microdata • Also determine the top k contributors per cell -> sensitive cells • Table redesign possible without going back to microdata • Uses procedures for secondary cell suppression using state-of-the-art optimisation algorithms (CPR) • Prepared for linked tables
t-ARGUS • Data: fixed format ASCII • Metadata: • Variable name • Start position • Field length • Status
t-ARGUS • Apply global recoding • Protect file with secondary suppression • Rounding • Safe table as ASCII or .WK1 (plus report)
t-ARGUS • Version 2.0 final SDC-version • requires commercial OR-solver (Xpress by Dash, UK, 600 GBP)
Future / CASC • Computational Aspects of Statistical Confidentiality • New European project-proposal (2000-2002) • Extending ARGUS • New research • Additional joint USA/EU-project?
CASC-m • Concentration on business/economic data • microaggregation • PRAM • Noise-addition/masking
CASC-t • Hierarchical tables • Linked tables • Optimal solution vs. heuristics • Different input formats
CASC-team • Statistics Netherlands • Istat (Italy) • ONS, Univ. Southampton, Manchester, London, Plymouth (UK) • Bundesamt, IAB (Germany) • Stat. Catalunya, Univ Tenerife (Spain)
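Storing the top k contributors per cell during the single pass over the microdata is what makes later redesign possible without rereading the file: when two cells are merged, their totals add up and their top-k lists can be merged. A minimal sketch of that idea follows; the cell structure and function names are assumptions, not the t-ARGUS internals.

```python
import heapq
from collections import defaultdict

def tabulate(records, spanning, response, k=3):
    """One pass over the microdata: per cell keep total, count and top k values."""
    cells = defaultdict(lambda: {"total": 0, "count": 0, "top": []})
    for rec in records:
        cell = cells[tuple(rec[v] for v in spanning)]
        cell["total"] += rec[response]
        cell["count"] += 1
        cell["top"] = heapq.nlargest(k, cell["top"] + [rec[response]])
    return dict(cells)

def merge_cells(a, b, k=3):
    """Table redesign: collapse two cells using only the stored summaries."""
    return {
        "total": a["total"] + b["total"],
        "count": a["count"] + b["count"],
        "top": heapq.nlargest(k, a["top"] + b["top"]),
    }

# Hypothetical business microdata.
records = [
    {"region": "N", "size": "S", "turnover": 100},
    {"region": "N", "size": "S", "turnover": 40},
    {"region": "N", "size": "S", "turnover": 10},
    {"region": "N", "size": "S", "turnover": 5},
    {"region": "N", "size": "L", "turnover": 900},
    {"region": "S", "size": "S", "turnover": 70},
]

cells = tabulate(records, ("region", "size"), "turnover")
north = merge_cells(cells[("N", "S")], cells[("N", "L")])
```

The merged cell carries everything a dominance check needs (total, count, largest contributors), so sensitivity can be re-evaluated after each redesign step.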
Contact • Anco Hundepool • Statistics Netherlands • PO box 4000 • 2200 JM Voorburg • The Netherlands • email ahnl@krypton.vb.cbs.nl • fax: +31 70 3375990 • phone: +31 70 3375038