The secret lives of us: data confidentiality. October 2011. Linda Fardell, Cross Portfolio Data Integration Secretariat.
What is it & why should you care? • It’s about obligations – legal/ethical • Aim – protect identity and release useful data • It’s more than removing name & address • Data providers’ trust is essential for getting good statistics
Information is power • A banker in Maryland obtained a list of patients with cancer • compared it with a list of clients with outstanding loans • called in the loans of the clients who had cancer. Source: Data confidentiality: a review of methods for statistical disclosure limitation and methods for assessing privacy, Statistics Surveys, Volume 5 (2011), 1-29.
Legislative obligations • Privacy Act • Specific legislation governing collection & use of information e.g. • Social Security (Administration) Act 1999 • Taxation Administration Act 1953
Other obligations • Principles-based obligations, e.g. High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes
How agencies meet these obligations • Implement procedures to address all aspects of data protection • To ensure that identifiable information: • is not released publicly; • is available on a ‘need to know’ basis; • can’t be derived from disseminated data; and • is maintained and accessed securely.
Managing identification risk • Understand your obligations • Establish policies and procedures • De-identify the data • Assess potential identification risks • Test and evaluate to mitigate risks • Manage the risks of identification - confidentialise • Provide safe access to data
Access to other information • Keep track of all information released from the dataset.
When should a cell be confidentialised? • Common confidentiality rules: • frequency (threshold) rule • cell dominance (cell concentration) rule • Keep specific confidentiality procedures secret (e.g. the particular value chosen when applying the threshold rule)
Two general methods • Data reduction • Data modification (perturbation)
Example: frequency rule (threshold of 5), before confidentialisation
Example (cont.): frequency rule, after confidentialisation
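The frequency rule can be sketched as primary suppression in a few lines of Python. This is a minimal sketch, not the Secretariat's procedure: the function name `apply_frequency_rule`, the dictionary-of-counts table layout and the cell labels are hypothetical, and it assumes that any non-zero count below the threshold is unsafe (agencies differ on whether zero cells, or counts equal to the threshold, are treated as unsafe). The threshold is passed in rather than hard-coded, in keeping with the advice to keep the chosen value secret.

```python
from typing import Dict, Optional


def apply_frequency_rule(
    table: Dict[str, int], threshold: int
) -> Dict[str, Optional[int]]:
    """Primary suppression under a frequency (threshold) rule.

    Assumption: a cell is unsafe when 0 < count < threshold; unsafe
    cells are replaced by None (suppressed), all others are kept.
    """
    released: Dict[str, Optional[int]] = {}
    for cell, count in table.items():
        if 0 < count < threshold:
            released[cell] = None  # unsafe: suppress the value
        else:
            released[cell] = count  # safe: release unchanged
    return released


if __name__ == "__main__":
    # Hypothetical counts, not the figures from the slide.
    counts = {"Region A": 12, "Region B": 3, "Region C": 0, "Region D": 7}
    print(apply_frequency_rule(counts, threshold=5))
    # {'Region A': 12, 'Region B': None, 'Region C': 0, 'Region D': 7}
```

This only performs primary suppression. If the table also publishes row or column totals, a suppressed cell can often be recovered by subtraction, so further (secondary) suppression or another method is normally needed as well.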
E.g. 2 – the cell dominance (n,k) rule • A cell is unsafe if the combined contributions of its ‘n’ largest members represent more than ‘k’% of the cell’s total value • The values of n and k are set by the data custodian • Example: (2, 75) rule • A and B together contribute 81% of total profit, which exceeds 75%, so the profit cell needs protecting
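The (n,k) rule translates directly into a check on a cell's contributors. In this minimal sketch the function name `dominance_unsafe` and the contribution figures are hypothetical (chosen only so that the two largest contributors, A and B, account for 81% of the cell, as on the slide). It assumes all contributions are non-negative; how negative values such as losses are handled is a policy choice this sketch does not make.

```python
from typing import Sequence


def dominance_unsafe(contributions: Sequence[float], n: int, k: float) -> bool:
    """Return True if the n largest contributions exceed k% of the cell total."""
    if any(c < 0 for c in contributions):
        raise ValueError("this sketch assumes non-negative contributions")
    total = sum(contributions)
    if total == 0:
        return False  # assumption: an empty or all-zero cell is treated as safe
    largest_n = sum(sorted(contributions, reverse=True)[:n])
    return largest_n > (k / 100) * total


if __name__ == "__main__":
    # Hypothetical profits for businesses A, B, C and D in one cell (total 100).
    profits = {"A": 50.0, "B": 31.0, "C": 10.0, "D": 9.0}
    print(dominance_unsafe(list(profits.values()), n=2, k=75))  # True: 81% > 75%
    print(dominance_unsafe(list(profits.values()), n=2, k=85))  # False: 81% <= 85%
```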
Data modification methods: before random rounding to base 3 (RR3)
Data modification methods: after random rounding to base 3 (RR3)
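RR3 usually denotes random rounding to base 3. One common unbiased formulation can be sketched as follows: counts that are already multiples of 3 are left alone, and every other count is rounded to the nearer multiple of 3 with probability 2/3 and to the farther one with probability 1/3, so the expected rounded value equals the true count. The function names `rr3` and `rr3_table` and the example counts are hypothetical, and the exact probabilities a given agency uses are an assumption here.

```python
import random
from typing import Dict, Optional


def rr3(count: int, rng: Optional[random.Random] = None) -> int:
    """Randomly round a count to a multiple of 3, keeping the expected value unchanged."""
    if rng is None:
        rng = random.Random()
    remainder = count % 3
    if remainder == 0:
        return count  # already a multiple of 3: released unchanged
    down = count - remainder
    up = down + 3
    # remainder 1 -> round up with prob 1/3 (down, the nearer value, with 2/3);
    # remainder 2 -> round up with prob 2/3 (up is the nearer value).
    return up if rng.random() < remainder / 3 else down


def rr3_table(table: Dict[str, int], seed: int) -> Dict[str, int]:
    """Round every cell independently, including any totals in the table."""
    rng = random.Random(seed)  # fixed seed only to make the example reproducible
    return {cell: rr3(value, rng) for cell, value in table.items()}


if __name__ == "__main__":
    # Hypothetical counts, not the figures from the slide.
    before = {"Male": 7, "Female": 11, "Total": 18}
    print(rr3_table(before, seed=2011))
```

Because each cell, including the total, is rounded independently, the rounded cells need not add up to the rounded total; that inconsistency is a known side effect of random rounding.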
Microdata • Valuable resource • 2 key types of disclosure risk: • spontaneous recognition • deliberate (malicious) attempt
Microdata – managing risks • confidentialising • deterrents • restricting access • educating data users about their obligations • safe environment for access
Microdata – methods to assess risks • cross-tabulating variables; • comparing sample data with population data to see whether characteristics that are unique in the sample are also unique in the population; and • acquiring knowledge of other datasets and publicly available information that could be used for list matching.
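The first two risk-assessment methods can be sketched together: cross-tabulate a chosen set of variables, flag combinations that occur only once in the sample, then check which of those are also unique in the population. In this minimal sketch the variable names (`age_band`, `sex`, `occupation`), the records and the population counts are all hypothetical; in practice population counts might come from a census or register, and which variables to cross-tabulate is a judgement about what an intruder could plausibly know.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]
Combo = Tuple[str, ...]


def sample_uniques(records: List[Record], variables: Sequence[str]) -> List[Combo]:
    """Cross-tabulate `variables` and return combinations seen exactly once."""
    counts = Counter(tuple(r[v] for v in variables) for r in records)
    return [combo for combo, n in counts.items() if n == 1]


def population_uniques(
    candidates: List[Combo], population_counts: Dict[Combo, int]
) -> List[Combo]:
    """Keep only the sample uniques that are also unique in the population."""
    return [c for c in candidates if population_counts.get(c, 0) == 1]


if __name__ == "__main__":
    variables = ["age_band", "sex", "occupation"]
    sample = [  # hypothetical microdata records
        {"age_band": "40-44", "sex": "F", "occupation": "Surgeon"},
        {"age_band": "40-44", "sex": "F", "occupation": "Teacher"},
        {"age_band": "40-44", "sex": "F", "occupation": "Teacher"},
        {"age_band": "20-24", "sex": "M", "occupation": "Student"},
    ]
    population = {  # hypothetical population counts for the same variables
        ("40-44", "F", "Surgeon"): 1,
        ("40-44", "F", "Teacher"): 850,
        ("20-24", "M", "Student"): 4200,
    }
    uniques = sample_uniques(sample, variables)
    print(uniques)  # [('40-44', 'F', 'Surgeon'), ('20-24', 'M', 'Student')]
    print(population_uniques(uniques, population))  # [('40-44', 'F', 'Surgeon')]
```

A record that is unique in both the sample and the population is the most exposed to list matching.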
Protecting microdata • 1st level of protection: remove direct identifiers • Common ways to protect microdata are: • confidentialising; and/or • restricting access to the file
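Removing direct identifiers is the simplest of these steps to express in code. This sketch assumes records are Python dictionaries; the field names in `DIRECT_IDENTIFIERS` and the example record are hypothetical, and deciding which fields are direct identifiers is a judgement for the data custodian. It is only the first level of protection: the remaining variables can still identify someone.

```python
from typing import Any, Dict, FrozenSet, Iterable, List

# Hypothetical list of direct identifiers; the real list is set by the custodian.
DIRECT_IDENTIFIERS: FrozenSet[str] = frozenset({"name", "address", "phone"})


def remove_direct_identifiers(
    records: Iterable[Dict[str, Any]],
    identifiers: FrozenSet[str] = DIRECT_IDENTIFIERS,
) -> List[Dict[str, Any]]:
    """Return copies of the records with direct-identifier fields dropped."""
    return [{k: v for k, v in r.items() if k not in identifiers} for r in records]


if __name__ == "__main__":
    raw = [{"name": "J. Citizen", "address": "1 Example St", "age": 42, "postcode": "2600"}]
    print(remove_direct_identifiers(raw))  # [{'age': 42, 'postcode': '2600'}]
```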
Confidentialising microdata • Same principles as protecting aggregate data: • limit variables • introduce small amounts of random error (e.g. data swapping) • combine categories (e.g. age in 5-year ranges) • top/bottom code • suppress particular values/records that can’t otherwise be protected.
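Three of the treatments listed above, combining categories, top/bottom coding and data swapping, can be sketched in a few functions. This is a minimal illustration under stated assumptions, not a production method: the function names, the income limits and the records are hypothetical; top/bottom coding here simply replaces an extreme value with the limit; and the swap exchanges one variable between randomly chosen pairs of records, whereas real swapping schemes often restrict swaps to records that agree on some other variables.

```python
import random
from typing import Any, Dict, List


def age_band(age: int, width: int = 5) -> str:
    """Combine categories: map an exact age to a band such as '40-44'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_bottom_code(value: float, bottom: float, top: float) -> float:
    """Replace values below `bottom` or above `top` with that limit."""
    return min(max(value, bottom), top)


def swap_variable(
    records: List[Dict[str, Any]], variable: str, swap_rate: float, rng: random.Random
) -> List[Dict[str, Any]]:
    """Data swapping: exchange `variable` between random pairs of records.

    Roughly `swap_rate` of the records are affected; the overall distribution
    of `variable` is unchanged because values are only moved, never altered.
    """
    out = [dict(r) for r in records]  # work on copies, not the input
    n_pairs = int(len(out) * swap_rate) // 2
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        out[i][variable], out[j][variable] = out[j][variable], out[i][variable]
    return out


if __name__ == "__main__":
    print(age_band(42))  # 40-44
    print(top_bottom_code(250_000, bottom=0, top=150_000))  # 150000
    people = [  # hypothetical records
        {"age": age_band(a), "region": r}
        for a, r in [(23, "North"), (37, "South"), (42, "East"), (68, "West")]
    ]
    print(swap_variable(people, "region", swap_rate=0.5, rng=random.Random(1)))
```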
What affects the risk of identification? • motivation • level of detail • presence of rare characteristics • accuracy of the data • age of the data • coverage of the data (completeness) • presence of other information
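A note on terminology… • De-identification and confidentialisation are often confused • De-identification removes direct identifiers such as name and address; confidentialisation goes further, treating the data so that identity can’t be derived from what is released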