
The ARGUS Software of the SDC-project

The ARGUS Software of the SDC-project. Anco Hundepool, Statistics Netherlands. Washington, August 1999. Statistical Disclosure Control: the balance between the need for (more and more) information and the privacy of the respondents.



  1. The ARGUS Software of the SDC-project Anco Hundepool Statistics Netherlands Washington, August 1999

  2. Statistical Disclosure Control • the balance between the need for (more and more) information and • the privacy of the respondents

  3. Statistical Disclosure Control • Need for detailed micro data files • electronic publications • computing power of users • Need for more detailed tables But....!!!

  4. Statistical Disclosure Control • Protection of privacy of respondents: persons, enterprises, institutions • Respondents must be able to trust Statistical Offices! • Risks: • Intruders/hackers • Accidental recognition • Advanced record linkage techniques

  5. Statistical Disclosure Control • Produce ‘safe’ datafiles and tables • Apply data modification techniques • Preserve as much information as possible • Implemented in ARGUS!

  6. Framework of development of ARGUS • SDC project • partly subsidised by EU (4th Framework) • Co-operation between The Netherlands, Italy (+Spain) and UK

  7. General aims of SDC project • Methodological research in SDC • microdata, tables • concerning statistics, OR • geographical data • (general) SDC Software development • microdata (m-ARGUS) • tables (t-ARGUS)

  8. SDC project members • Netherlands • CBS (ARGUS) • TU-Eindhoven (OR for microdata) • Italy • Istat (with Univ. of Rome) (Research/testing) • CPR-Padova (with Univ. Tenerife) (OR for tabular data)

  9. SDC project members • UK • ONS (data) • Univ. Manchester (with Univ. of Southampton) (Research on SARs) • Univ. of Leeds (Geographical data)

  10. Main software developed in SDC-project • m-ARGUS (CBS and TUE) • micro data • t-ARGUS (CBS and CPR) • tabular data

  11. Ideas of m-ARGUS • Intruder uses information of identifying variables (e.g. region, sex, age, education, occupation) to identify records. • This leads to the sensitive information

  12. m-ARGUS • Levels of protection • public use files (PUF) • micro files for researchers (MUC): universities, contract etc. • safe-setting

  13. Ideas of m-ARGUS • a list of combinations of identifying variables must be checked • find value combinations that are unsafe • e.g. |a x b x c| <= threshold • threshold depends on level of protection • Public use files • Micro data for researchers (contract)
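The frequency check on this slide can be sketched in a few lines. This is a minimal illustration, not the m-ARGUS implementation: the function name `unsafe_combinations`, the record layout and the sample data are all assumptions made for the example.

```python
from collections import Counter

def unsafe_combinations(records, variables, threshold):
    """Return the value combinations of `variables` that occur at most
    `threshold` times in `records` (hypothetical helper, not m-ARGUS code)."""
    counts = Counter(tuple(rec[v] for v in variables) for rec in records)
    return {combo: n for combo, n in counts.items() if n <= threshold}

# Made-up microdata with three identifying variables.
records = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 71},
    {"region": "South", "sex": "F", "age": 34},
    {"region": "South", "sex": "F", "age": 34},
]

# A stricter protection level would use a higher threshold.
unsafe = unsafe_combinations(records, ("region", "sex", "age"), threshold=1)
print(unsafe)
```

Only the combination (North, M, 71) occurs once, so it is the only one flagged at threshold 1.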

  14. Ideas of m-ARGUS • eliminate the unsafe combinations • by global recoding (age -> agegroup, region -> province) • local suppression (imputing missings) • interactively/automatically • with minimum information loss (entropy)
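The two techniques named above can be illustrated as follows. This is a sketch under stated assumptions: the helper names, the ten-year age groups and the choice of which variable to suppress are invented for the example, and the information-loss (entropy) optimisation mentioned on the slide is not modelled.

```python
from collections import Counter

def recode_age(age, width=10):
    """Global recoding: replace an exact age by an age group such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def globally_recode(records, variable, recode):
    """Apply the same recoding to `variable` in every record."""
    return [{**rec, variable: recode(rec[variable])} for rec in records]

def locally_suppress(records, variables, threshold, suppress_var, missing=None):
    """Local suppression: set `suppress_var` to missing in records whose
    combination of `variables` occurs at most `threshold` times."""
    counts = Counter(tuple(rec[v] for v in variables) for rec in records)
    protected = []
    for rec in records:
        if counts[tuple(rec[v] for v in variables)] <= threshold:
            rec = {**rec, suppress_var: missing}
        protected.append(rec)
    return protected

# Made-up data: every (region, age) combination is unique before recoding.
records = [
    {"region": "North", "age": 34},
    {"region": "North", "age": 37},
    {"region": "North", "age": 71},
    {"region": "South", "age": 52},
    {"region": "South", "age": 58},
]

recoded = globally_recode(records, "age", recode_age)
protected = locally_suppress(recoded, ("region", "age"), threshold=1, suppress_var="age")
for rec in protected:
    print(rec)
```

After recoding, only the single North record in the 70-79 group is still rare, so local suppression touches one value instead of five.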

  15. m-ARGUS • For microdata • Developed in Borland C++ • Windows-95/98 • Version 3.0 last SDC-version • interactive/automatic global recoding • automatic local suppression

  16. Features of m-ARGUS • can handle large microdata files • only tables derived from microdata are being used • flexible global recoding • options for automatic mix of global recoding and local suppression (TU Eindhoven)

  17. Additional features of m-ARGUS • Micro-aggregation • Top/Bottom coding • Rounding
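Simple versions of these three techniques for a numeric variable might look like the sketch below. The slide does not say how m-ARGUS implements them, so each choice here is an assumption: fixed-size groups on sorted values for micro-aggregation, clamping to the bounds for top/bottom coding, and rounding to the nearest multiple of a base.

```python
import math

def micro_aggregate(values, k):
    """Replace each value by the mean of its group of (at least) k
    neighbouring values in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Merge a too-small final group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result

def top_bottom_code(values, bottom, top):
    """Clamp values below `bottom` or above `top` to those bounds."""
    return [min(max(v, bottom), top) for v in values]

def round_to_base(values, base):
    """Round non-negative values to the nearest multiple of `base`."""
    return [math.floor(v / base + 0.5) * base for v in values]

incomes = [10, 12, 50, 52, 300]  # made-up values
print(micro_aggregate(incomes, k=2))
print(top_bottom_code(incomes, bottom=12, top=100))
print(round_to_base(incomes, base=25))
```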

  18. m-ARGUS (diagram) • metadata, microdata • Generate tables • Recoding schemes • Global recoding • Local suppression • Micro aggregation • Top/bottom coding • Rounding • Report • metadata, microdata

  19. m-ARGUS input data • Data: Fixed format ASCII • Metadata • Name • Position • Missing values (2) • Identification level • Hierarchical coding • Codelist (opt.)
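Reading a fixed-format ASCII record with this kind of metadata could be sketched as below. The `Variable` class, its field names and the sample layout are hypothetical and do not reproduce the actual ARGUS metadata file format. The field length is an added assumption, since this slide lists only a position.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    start: int            # 1-based starting column
    length: int           # field width (assumed; not listed on the slide)
    missing: tuple = ()   # up to two missing-value codes
    ident_level: int = 0  # 0 = not an identifying variable

def parse_record(line, variables):
    """Cut one fixed-format line into named fields, mapping missing codes to None."""
    record = {}
    for var in variables:
        raw = line[var.start - 1 : var.start - 1 + var.length].strip()
        record[var.name] = None if raw in var.missing else raw
    return record

# Hypothetical layout: region in columns 1-4, sex in column 5, age in columns 6-8.
layout = [
    Variable("region", start=1, length=4, missing=("9999",), ident_level=1),
    Variable("sex", start=5, length=1, missing=("9",), ident_level=2),
    Variable("age", start=6, length=3, missing=("998", "999"), ident_level=2),
]

rows = [parse_record(line, layout) for line in ["NL01F034", "NL02M999"]]
print(rows)
```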

  20. Using m-ARGUS • reading data file • generating tables • apply global recodes • local suppression • generate safe file • generate report

  21. t-ARGUS

  22. Ideas of t-ARGUS • identification of sensitive cells using e.g. dominance rule • at least n (e.g. 2) contributors to a cell • sum of largest 3 contributors >= 75% (one large contributor could recalculate the contribution of its competitor) • easy part
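The sensitivity test described on this slide can be sketched as a single function. The parameter values follow the slide's examples (n = 2, largest 3 contributors, 75%). The function name and the sample cells are assumptions, and contributions are assumed to be non-negative.

```python
def is_sensitive(contributions, min_contributors=2, k=3, share=0.75):
    """Flag a cell as sensitive if it has too few contributors or if its
    k largest contributors account for at least `share` of the cell total."""
    if not contributions:
        return False  # an empty cell reveals nothing about a respondent
    if len(contributions) < min_contributors:
        return True
    total = sum(contributions)
    top_k = sum(sorted(contributions, reverse=True)[:k])
    return total > 0 and top_k >= share * total

# Made-up cells: each list holds the individual contributions to one cell.
cells = {
    "dominated": [500, 30, 20, 10, 5],  # largest 3 hold 550 of 565
    "balanced": [100] * 10,             # largest 3 hold 300 of 1000
    "single": [40],                     # only one contributor
}
flags = {name: is_sensitive(values) for name, values in cells.items()}
print(flags)
```

With k = 3, any cell with three or fewer contributors is flagged automatically, because its largest three contributors make up the whole total.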

  23. Ideas of t-ARGUS • Eliminate/protect sensitive cells (hard part) • by applying SDC techniques • table redesign • cell suppression • rounding • interactively and/or automatically • with minimum information loss (e.g. cell weights)

  24. Ideas of t-ARGUS • cell suppression in tables with marginals • identify primary sensitive cells • protect primary cells by suppressing additional (secondary) cells to prevent recalculation (to some approximation) • with minimal information loss (CPR)
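The need for secondary suppression can be seen in a small check of which suppressed cells an intruder could recompute from the published marginals. This sketch covers only exact recalculation by repeated subtraction within a single row or column of a two-dimensional table. It is not the optimisation approach referred to on the slide, and it does not compute the approximate bounds an intruder could derive. The function name and cell coordinates are assumptions.

```python
def recoverable_cells(suppressed, n_rows, n_cols):
    """Return the suppressed (row, col) cells that can be recomputed exactly
    by subtracting published cells from a row or column total, repeatedly."""
    hidden = set(suppressed)
    recovered = set()
    changed = True
    while changed:
        changed = False
        lines = [[c for c in hidden if c[0] == r] for r in range(n_rows)]
        lines += [[c for c in hidden if c[1] == k] for k in range(n_cols)]
        for line in lines:
            # A lone hidden cell in a row or column equals total minus the rest.
            if len(line) == 1 and line[0] in hidden:
                hidden.discard(line[0])
                recovered.add(line[0])
                changed = True
    return recovered

# One primary suppression alone is trivially recalculated from its row total.
primary_only = recoverable_cells({(0, 0)}, n_rows=3, n_cols=3)
# Adding three secondary suppressions forms a 2x2 block that resists subtraction.
with_secondary = recoverable_cells({(0, 0), (0, 1), (1, 0), (1, 1)}, n_rows=3, n_cols=3)
print(primary_only, with_secondary)
```

An L-shaped pattern such as (0,0), (0,1), (1,1) unravels completely: the lone cell in row 1 is recovered first, after which the other two follow.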

  25. t-ARGUS • 3-D tables • interactive table redesign • primary & secondary cell suppression • optimisation routines for automatic cell suppression • rounding

  26. t-ARGUS (diagram) • metadata, microdata • tabulation • codelists • redesign • rounding • suppression • report • Safe table

  27. Features of t-ARGUS • Initial run through microdata • Determine also top k per cell -> sensitive cells • Table redesign possible without going back to microdata • Uses procedures for secondary cell suppression using state-of-the-art optimisation algorithms (CPR) • Prepared for linked tables

  28. t-ARGUS • Data: fixed format ASCII • Metadata: • Variable name • Start position • Field length • Status

  29. t-ARGUS • Apply global recoding • Protect file with secondary suppression • Rounding • Safe table as ASCII or .WK1 (plus report)

  30. t-ARGUS • Version 2.0 final SDC-version • requires commercial OR-solver (Xpress by Dash, UK, 600 GBP)

  31. Future / CASC • Computational Aspects of Statistical Confidentiality • New European project-proposal (2000-2002) • Extending ARGUS • New research • Additional joint USA/EU-project?

  32. CASC-m • Concentration on business/economic data • microaggregation • PRAM • Noise-addition/masking

  33. CASC-t • Hierarchical tables • Linked tables • Optimal solution vs. heuristics • Different input formats

  34. CASC-team • Statistics Netherlands • Istat (Italy) • ONS, Univ. Southampton, Manchester, London, Plymouth (UK) • Bundesamt, IAB (Germany) • Stat. Catalunya, Univ. Tenerife (Spain)

  35. Contact • Anco Hundepool • Statistics Netherlands • PO box 4000 • 2200 JM Voorburg • The Netherlands • email ahnl@krypton.vb.cbs.nl • fax: +31 70 3375990 • phone: +31 70 3375038
