
October 2011

The secret lives of us: data confidentiality. Linda Fardell, Cross Portfolio Data Integration Secretariat.


Presentation Transcript


  1. The secret lives of us: data confidentiality. October 2011. Linda Fardell, Cross Portfolio Data Integration Secretariat

  2. What is it & why should you care? • It’s about obligations – legal/ethical • Aim – protect identity and release useful data • It’s more than removing name & address • Trust of providers is essential to get good stats

  3. Information is power • Banker in Maryland obtained a list of patients with cancer • compared it with a list of clients with outstanding loans • called in the loans of clients with cancer. Source: Data confidentiality: a review of methods for statistical disclosure limitation and methods for assessing privacy (Statist. Surv. Volume 5 (2011), 1-29).

  4. Legislative obligations • Privacy Act • Specific legislation governing collection & use of information e.g. • Social Security (Administration) Act 1999 • Taxation Administration Act 1953

  5. Other obligations • Principles based obligations e.g. High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes

  6. How agencies meet these obligations • Implement procedures to address all aspects of data protection • To ensure that identifiable information: • is not released publicly; • is available on a ‘need to know’ basis; • can’t be derived from disseminated data; and • is maintained and accessed securely.

  7. Managing identification risk • Understand your obligations • Establish policies and procedures • De-identify the data • Assess potential identification risks • Test and evaluate to mitigate risks • Manage the risks of identification - confidentialise • Provide safe access to data

  8. Access to other information • Keep track of all information released from the dataset.

  9. When should a cell be confidentialised? • Common confidentiality rules: • frequency (threshold) rule • cell dominance (cell concentration) rule • Keep specific confidentiality procedures secret (e.g. the particular value chosen when applying the threshold rule)

  10. Two general methods • Data reduction • Data modification (perturbation)

  11. Example: frequency rule - 5 Before

  12. Example: cont. After
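A minimal sketch of the frequency (threshold) rule with a threshold of 5. The counts are made up, the helper name `apply_frequency_rule` is hypothetical, and treating every count below the threshold as unsafe is an assumption (custodians differ, for example, on how zero cells are handled).

```python
from typing import Optional


def apply_frequency_rule(table: dict[str, int], threshold: int = 5) -> dict[str, Optional[int]]:
    """Return a copy of the table with unsafe cells replaced by None (suppressed).

    Assumption: a cell is unsafe when its count is below the threshold.
    """
    return {cell: (count if count >= threshold else None) for cell, count in table.items()}


# Made-up counts by age group, for illustration only.
counts = {"15-19": 12, "20-24": 3, "25-29": 8, "30-34": 1}
protected = apply_frequency_rule(counts)

for cell, count in protected.items():
    print(cell, "suppressed" if count is None else count)
```

If row or column totals are published alongside the table, a suppressed cell can sometimes be recovered by subtraction, so further cells or the totals themselves may also need protecting.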

  13. Alternative: concealing totals

  14. E.g. 2 – the cell dominance (n,k) rule • Cell unsafe if combined contributions of the ‘n’ largest members of the cell represent more than ‘k’% of the total value of the cell • n & k values are set by data custodian • Example: (2, 75) rule • A & B contribute 81% of total profit, so profit needs protecting
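The (n, k) test described on this slide can be sketched as follows. The profit figures are hypothetical, chosen only so that the two largest contributors account for 81% of the total, and the function name `is_dominated` is not from the presentation.

```python
def is_dominated(contributions: list[float], n: int, k: float) -> bool:
    """True if the n largest contributions exceed k% of the cell total."""
    total = sum(contributions)
    if total <= 0:
        # Assumption: empty or zero-valued cells are handled by other rules.
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return 100 * top_n / total > k


# Hypothetical profit contributions: A=50, B=31, C=10, D=9 (total 100).
profits = [50, 31, 10, 9]
unsafe = is_dominated(profits, n=2, k=75)
print(unsafe)  # A and B together contribute 81%, which is more than 75%
```

In practice the values of n and k would be set by the data custodian and, like the threshold value, kept confidential.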

  15. Data modification methods Before rounding RR3

  16. Data modification methods After rounding RR3
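A rough sketch of rounding as a data modification method, assuming "RR3" refers to random rounding to base 3. The rule for choosing between rounding up and down (up with probability remainder/3, so the expected value equals the original count) is one common unbiased scheme and is an assumption here, not something stated in the slides. The function name `random_round` is hypothetical.

```python
import random
from typing import Optional


def random_round(value: int, base: int = 3, rng: Optional[random.Random] = None) -> int:
    """Randomly round a non-negative count to an adjacent multiple of base."""
    rng = rng or random.Random()
    remainder = value % base
    if remainder == 0:
        return value  # multiples of the base are left unchanged
    lower = value - remainder
    # Assumption: round up with probability remainder/base (unbiased rounding).
    if rng.random() < remainder / base:
        return lower + base
    return lower


# Made-up cell counts, for illustration only.
rng = random.Random(2011)
cells = [0, 1, 2, 3, 4, 7, 10]
rounded = [random_round(c, rng=rng) for c in cells]
print(rounded)
```

Every published value is a multiple of 3 and differs from the true count by at most 2, so small counts such as 1 and 2 never appear in the output.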

  17. Microdata • Valuable resource • 2 key types of disclosure risk: • spontaneous recognition • deliberate (malicious) attempt

  18. Microdata – managing risks • confidentialising • deterrents • restricting access • educating data users about their obligations • safe environment for access

  19. Microdata – methods to assess risks • cross-tabulation of variables; • comparing sample data with population data to see if the unique characteristics in the sample are also unique in the population; and • acquiring knowledge of other datasets & publicly available information that could be used for list matching.
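The first two methods above can be sketched as a count of how often each combination of key variables occurs. The records, variable names and the helper `unique_combinations` are all invented for illustration; a real assessment would use the custodian's own choice of key variables.

```python
from collections import Counter


def unique_combinations(records, keys):
    """Return the key-variable combinations that occur exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo for combo, c in counts.items() if c == 1}


# Made-up records, for illustration only.
sample = [
    {"age_group": "30-34", "sex": "F", "occupation": "nurse"},
    {"age_group": "30-34", "sex": "F", "occupation": "nurse"},
    {"age_group": "80-84", "sex": "M", "occupation": "pilot"},
    {"age_group": "30-34", "sex": "M", "occupation": "teacher"},
]
# Hypothetical population: the sample plus one more teacher.
population = sample + [
    {"age_group": "30-34", "sex": "M", "occupation": "teacher"},
]

keys = ["age_group", "sex", "occupation"]
sample_uniques = unique_combinations(sample, keys)
population_uniques = unique_combinations(population, keys)

# Combinations unique in the sample AND in the population carry the most risk.
high_risk = sample_uniques & population_uniques
print(high_risk)
```

Here the teacher is unique in the sample but not in the population, so only the pilot record is flagged.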

  20. Protecting microdata • 1st level of protection: remove direct identifiers • Common ways to protect microdata are: • confidentialising; and/or • restricting access to the file

  21. Confidentialising microdata • Same principles as protecting aggregate data: • limit variables • introduce small amounts of random error (e.g. data swapping) • combine categories (e.g. age in 5 year ranges) • top/bottom code • suppress particular values/records that can’t otherwise be protected.
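Two of the techniques listed above, combining categories and top coding, can be sketched for an age variable. The 5-year width follows the slide's example; the top-code value of 85 and the function name `age_range` are assumptions made for illustration.

```python
def age_range(age: int, width: int = 5, top_code: int = 85) -> str:
    """Map an exact age to a range, collapsing ages at or above top_code."""
    if age >= top_code:
        return f"{top_code}+"  # top coding: rare high ages share one category
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Made-up ages, for illustration only.
ages = [3, 17, 42, 84, 85, 101]
ranges = [age_range(a) for a in ages]
print(ranges)
```

Coarser categories reduce the chance that a record is unique on its key variables, at the cost of some detail for analysts.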

  22. Restricting access to microdata

  23. What affects the risk of identification? • motivation • level of detail • presence of rare characteristics • accuracy of the data • age of the data • coverage of the data (completeness) • presence of other information

  24. A note on terminology… • Confusion between de-identification and confidentialisation

  25. More information – www.nss.gov.au
