Anonymising quantitative data

Anonymising quantitative data. Dr Sharon Bolton UK Data Service UK Data Archive, University of Essex. Anonymising Research Data workshop Dublin, 22 June 2016. The UK Data Service. Single point of access to wide range of social science data: ukdataservice.ac.uk

Presentation Transcript


  1. Anonymising quantitative data Dr Sharon Bolton UK Data Service UK Data Archive, University of Essex Anonymising Research Data workshop Dublin, 22 June 2016

  2. The UK Data Service • Single point of access to wide range of social science data: ukdataservice.ac.uk • Funded by the ESRC to serve the academic community: training and guidance; UK Data Archive established 1967 • Used by academic researchers and students; government analysts; charities; business; research centres; think tanks • Survey microdata; cohort studies; international macrodata; census data; qualitative/mixed methods data • Support and guide data creators, including disclosure review (anonymisation) and preparation for archiving

  3. Protecting confidentiality: the ‘5 Safes’ Five guiding principles: • Safe people - educate researchers to use data safely • Safe projects - research projects for ‘public good’ • Safe settings - SecureLab system for sensitive data • Safe outputs - SecureLab project outputs screened • Safe data - treat the data to protect respondent confidentiality • For this session, we will concentrate (mostly) on Safe data

  4. Data collection: planning • Explain to respondents what archiving entails and gain agreement for data sharing – informed consent • Think about disclosure risks before starting – what kind of information do you need to collect? • Direct identifiers include: names; addresses; telephone numbers; email addresses; photos; (perhaps) IP addresses; do you really need them? • Unless explicit consent obtained for sharing, direct identifiers should always be removed from data
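As an illustration of removing direct identifiers before sharing, here is a minimal Python sketch. The field names (`name`, `email` and so on) and the sample records are hypothetical, chosen to mirror the identifier types listed on the slide; they are not taken from any UK Data Service dataset or tool.

```python
# Hypothetical field names for the direct identifiers listed on the slide.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "photo", "ip_address"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields removed."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Invented sample data for illustration only.
respondents = [
    {"id": 1, "name": "A. Example", "email": "a@example.org", "age": 34, "region": "East"},
    {"id": 2, "name": "B. Example", "telephone": "000 0000", "age": 51, "region": "West"},
]

cleaned = drop_direct_identifiers(respondents)
```

The original records are left untouched, so an unredacted copy can still be held securely where consent allows.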

  5. Anonymising data: indirect identifiers Indirect identifiers include: • Sensitive information: health information/medical conditions; crime victimisation/offending; drug/alcohol use etc. • ‘Less sensitive’ information: age/birth date; educational characteristics; employment details; religious affiliation; household size; geographic area • Look at demographics in combination (e.g. demographics + geographies) • Text/string variables – too detailed?
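One way to "look at demographics in combination" is to count how many respondents share each combination of indirect identifiers and flag the rare ones. The sketch below assumes a threshold of 3 and hypothetical variable names; neither comes from the presentation, and a suitable threshold depends on the data and the access level.

```python
from collections import Counter


def rare_combinations(records, keys, threshold=3):
    """Return combinations of `keys` shared by fewer than `threshold` records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Invented sample data: three teachers, two nurses, one judge.
records = (
    [{"age_band": "25-34", "region": "East", "occupation": "Teacher"}] * 3
    + [{"age_band": "25-34", "region": "West", "occupation": "Nurse"}] * 2
    + [{"age_band": "65+", "region": "West", "occupation": "Judge"}]
)

flagged = rare_combinations(records, keys=("age_band", "region", "occupation"))
```

Combinations that appear in `flagged` are candidates for further aggregation or banding, since a small group sharing the same characteristics is easier to re-identify.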

  6. Anonymising indirect identifiers • Aggregate categories to reduce precision • Band ages, incomes, expenditure, etc. to disguise outliers • Use standard coding frames – e.g. SOC2010 • Generalise meaning of detailed text • Document the changes you make • Talk to other researchers, archives, data services Published guides: • UCD Research Data Management Guide http://libguides.ucd.ie/data/ethics • ONS Disclosure control guidance for microdata produced from social surveys http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata
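Two of the points above, disguising outliers and documenting changes, can be sketched together. This example uses top-coding (capping values at a ceiling), which is one common way to hide extreme incomes; the cap of 100000 and the log wording are assumptions for illustration, not values recommended in the presentation.

```python
def top_code(values, cap):
    """Replace every value above `cap` with `cap` so outliers are not visible."""
    return [min(value, cap) for value in values]


change_log = []  # Plain-text record of each anonymisation step.

# Invented sample incomes, including one extreme outlier.
incomes = [18000, 23500, 31000, 250000]

cap = 100000  # Assumed ceiling, chosen only for this example.
capped_incomes = top_code(incomes, cap)
change_log.append(f"income: values above {cap} top-coded to {cap}")
```

Keeping the log alongside the data lets later users see exactly how the deposited file differs from the file as collected.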

  7. Anonymising data: new developments and tools Statistical Disclosure Control (SDC) software is available: • mu-Argus: standalone software package recommended by Eurostat for government statisticians; software and manual: http://neon.vb.cbs.nl/casc/mu.htm • R tool - sdcMicro (GUI): software and manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro; new documentation being developed by UK Data Service, working with R developers

  8. Quiz 1: disclosive text in job title

  9. Quiz 1: jobs coded with SOC2010

  10. Quiz 2: detailed religion categories

  11. Quiz 2: religion categories aggregated

  12. Quiz 3: age in years

  13. Quiz 3: banded age
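Banding a continuous age variable can be done with a short lookup over band boundaries. The band edges and labels here are assumptions for illustration; the bands used in the workshop slide are not recorded in this transcript.

```python
from bisect import bisect_right

# Assumed lower bounds of each band after the first, with matching labels.
BAND_EDGES = [16, 25, 35, 45, 55, 65]
BAND_LABELS = ["Under 16", "16-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def band_age(age):
    """Return the label of the band containing `age` (age in whole years)."""
    return BAND_LABELS[bisect_right(BAND_EDGES, age)]


# Invented sample ages, including boundary values and an outlier.
ages = [15, 16, 24, 25, 64, 65, 97]
banded = [band_age(age) for age in ages]
```

The open-ended top band ("65+") also disguises very old respondents, who would otherwise stand out as outliers.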

  14. Access control • Don’t over-anonymise - find a balance between protecting respondents’ confidentiality and maintaining the research usability of the data • Can’t fully anonymise data without removing all the useful detail? Go back to the 5 Safes – think about access control: Safe people, Safe settings, Safe outputs

  15. Access control • At UK Data Service, data available under 3 access levels: • OPEN – open public access • SAFEGUARDED – downloadable, but use is traceable: registered users only (agree not to try to identify any individual respondents); special agreements/licence: permission-only access; approved projects – usage agreed in advance • CONTROLLED – accredited users take a further training course; access via on-site safe setting or virtual secure environment (SecureLab); outputs disclosure-checked before publication

  16. Anonymising quantitative data: summary • Informed consent • Think about level of detail needed before data collection • Remove direct identifiers • Check and treat indirect identifiers to reduce disclosure risk • Document your changes • Balance anonymisation with access control to preserve data usability

  17. Questions? Guidance on anonymisation: • UCD: http://libguides.ucd.ie/data/ethics • UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation • Managing and Sharing Research Data book https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297
