Anonymising quantitative data
Dr Sharon Bolton, UK Data Service, UK Data Archive, University of Essex
Anonymising Research Data workshop, Dublin, 22 June 2016

The UK Data Service
• Single point of access to wide range of social science data: ukdataservice.ac.uk
• Funded by the ESRC to serve the academic community: training and guidance; UK Data Archive established 1967
• Used by academic researchers and students; government analysts; charities; business; research centres; think tanks
• Survey microdata; cohort studies; international macrodata; census data; qualitative/mixed methods data
• Support and guide data creators, including disclosure review (anonymisation) and preparation for archiving

Protecting confidentiality: the ‘5 Safes’
Five guiding principles:
• Safe people - educate researchers to use data safely
• Safe projects - research projects for ‘public good’
• Safe settings - SecureLab system for sensitive data
• Safe outputs - SecureLab project outputs screened
• Safe data - treat the data to protect respondent confidentiality
For this session, we will concentrate (mostly) on Safe data

Data collection: planning
• Explain to respondents what archiving entails and gain agreement for data sharing – informed consent
• Think about disclosure risks before starting – what kind of information do you need to collect?
• Direct identifiers include: names; addresses; telephone numbers; email addresses; photos; (perhaps) IP addresses; do you really need them?
• Unless explicit consent obtained for sharing, direct identifiers should always be removed from data

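As a minimal sketch of removing direct identifiers before sharing, assuming the data are held as a list of Python dictionaries. The field names (name, address, telephone, email, ip_address) and the sample records are hypothetical and do not come from the slides:

```python
# Hypothetical field names; adjust these to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "ip_address"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed identifier fields."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Made-up example respondents, for illustration only.
respondents = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "region": "East"},
    {"name": "B. Example", "email": "b@example.org", "age": 71, "region": "North"},
]

shareable = drop_direct_identifiers(respondents)
print(shareable)
```

The function returns new records and leaves the originals untouched, so the identified version can stay in secure storage while only the stripped copy is prepared for archiving.
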
Anonymising data: indirect identifiers
Indirect identifiers include:
• Sensitive information: health information/medical conditions; crime victimisation/offending; drug/alcohol use etc.
• ‘Less sensitive’ information: age/birth date; educational characteristics; employment details; religious affiliation; household size; geographic area
• Look at demographics in combination (e.g. demographics + geographies)
• Text/string variables – too detailed?

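One way to look at demographics in combination is to count how many respondents share each combination of indirect identifiers and flag the rare ones. The sketch below is an illustration only: the function name, the threshold k and the sample records are assumptions, not part of the workshop material.

```python
from collections import Counter


def rare_combinations(records, keys, k=3):
    """Return the combinations of `keys` shared by fewer than k records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up example records, for illustration only.
records = [
    {"age_band": "30-39", "region": "East", "household_size": 2},
    {"age_band": "30-39", "region": "East", "household_size": 4},
    {"age_band": "30-39", "region": "East", "household_size": 2},
    {"age_band": "70+", "region": "North", "household_size": 1},
]

# Flag age/region combinations held by fewer than 2 respondents.
flagged = rare_combinations(records, ["age_band", "region"], k=2)
print(flagged)
```

A combination that appears only once or twice may single out a respondent even when each variable looks harmless on its own; what counts as too rare depends on the data and the access level.
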
Anonymising indirect identifiers
• Aggregate categories to reduce precision
• Band ages, incomes, expenditure, etc. to disguise outliers
• Use standard coding frames – e.g. SOC2010
• Generalise meaning of detailed text
• Document the changes you make
• Talk to other researchers, archives, data services
Published guides:
• UCD Research Data Management Guide http://libguides.ucd.ie/data/ethics
• ONS Disclosure control guidance for microdata produced from social surveys http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata

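The banding advice above could be sketched as follows. The band width, the open-ended top band and the income cap are hypothetical choices made for this example. Top-coding (capping extreme values) is shown as one common way to disguise outliers; the slides do not name it specifically.

```python
def band_age(age, width=10, top=70):
    """Map an exact age to a band such as '30-39'.

    Ages at or above `top` share one open-ended band, so the
    oldest respondents do not stand out.
    """
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value, cap):
    """Replace any value above `cap` with the cap itself."""
    return min(value, cap)


# Made-up values, for illustration only.
ages = [5, 34, 69, 70, 93]
incomes = [18_000, 42_500, 250_000]

banded_ages = [band_age(age) for age in ages]
capped_incomes = [top_code(income, cap=100_000) for income in incomes]
print(banded_ages)
print(capped_incomes)
```

Whatever bands and caps are chosen, record them in the documentation so that secondary users know how the variables were treated.
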
Anonymising data: new developments and tools
Statistical Disclosure Control (SDC) software is available:
• mu-Argus
  • standalone software package recommended by Eurostat for government statisticians
  • software and manual: http://neon.vb.cbs.nl/casc/mu.htm
• R tool - sdcMicro (GUI)
  • software and manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro
  • new documentation being developed by UK Data Service, working with R developers

Access control
• Don’t over-anonymise - find a balance between protecting respondents’ confidentiality and maintaining research usability of data
• Can’t fully anonymise data without removing all the useful detail? Go back to the 5 Safes – think about access control: Safe people, Safe settings, Safe outputs

Access control
• At UK Data Service, data available under 3 access levels:
  • OPEN – open public access
  • SAFEGUARDED – downloadable, but use is traceable
    • Registered users only (agree not to try to identify any individual respondents)
    • Special agreements/licence: permission-only access; approved projects – usage agreed in advance
  • CONTROLLED – accredited users take a further training course
    • Access via on-site safe setting or virtual secure environment (SecureLab)
    • Outputs disclosure-checked before publication

Anonymising quantitative data: summary
• Informed consent
• Think about level of detail needed before data collection
• Remove direct identifiers
• Check and treat indirect identifiers to reduce disclosure risk
• Document your changes
• Balance anonymisation with access control to preserve data usability

Questions?
Guidance on anonymisation:
• UCD: http://libguides.ucd.ie/data/ethics
• UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
• Managing and Sharing Research Data book https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297