Anonymising quantitative data

Dr Sharon Bolton
UK Data Service, UK Data Archive, University of Essex
Anonymising Research Data workshop, Dublin, 22 June 2016

The UK Data Service
• Single point of access to a wide range of social science data: ukdataservice.ac.uk
• Funded by the ESRC to serve the academic community: training and guidance; UK Data Archive established 1967
• Used by academic researchers and students; government analysts; charities; business; research centres; think tanks
• Survey microdata; cohort studies; international macrodata; census data; qualitative/mixed methods data
• Support and guide data creators, including disclosure review (anonymisation) and preparation for archiving

Protecting confidentiality: the ‘5 Safes’
Five guiding principles:
• Safe people - educate researchers to use data safely
• Safe projects - research projects for ‘public good’
• Safe settings - SecureLab system for sensitive data
• Safe outputs - SecureLab project outputs screened
• Safe data - treat the data to protect respondent confidentiality
For this session, we will concentrate (mostly) on Safe data

Data collection: planning
• Explain to respondents what archiving entails and gain agreement for data sharing – informed consent
• Think about disclosure risks before starting: what kind of information do you need to collect?
• Direct identifiers include: names; addresses; telephone numbers; email addresses; photos; (perhaps) IP addresses. Do you really need them?
• Unless explicit consent has been obtained for sharing, direct identifiers should always be removed from the data
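As an illustration of the last point, here is a minimal Python sketch of stripping direct identifiers before sharing. The field names and sample records are hypothetical and would need to match the variables in your own dataset; this is one possible approach, not a UK Data Service procedure.

```python
# Hypothetical field names for direct identifiers; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "photo", "ip_address"}


def remove_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields dropped."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Made-up example records, for illustration only.
respondents = [
    {"id": 1, "name": "A. Example", "email": "a@example.org", "age": 34, "region": "East"},
    {"id": 2, "name": "B. Example", "email": "b@example.org", "age": 71, "region": "West"},
]

cleaned = remove_direct_identifiers(respondents)
for row in cleaned:
    print(row)
```

The original records are left untouched, so the identifiable version can be kept separately under stricter access conditions if consent allows.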

Anonymising data: indirect identifiers
Indirect identifiers include:
• Sensitive information: health information/medical conditions; crime victimisation/offending; drug/alcohol use, etc.
• ‘Less sensitive’ information: age/birth date; educational characteristics; employment details; religious affiliation; household size; geographic area
• Look at demographics in combination (e.g. demographics + geographies)
• Text/string variables: are they too detailed?
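One way to look at indirect identifiers in combination is to count how many records share each combination of values and flag the rare ones. The sketch below assumes a simple threshold `k` (hypothetical; the slides do not specify one) and uses made-up variable names and records.

```python
from collections import Counter


def rare_combinations(records, keys, k=2):
    """Return each combination of values for `keys` shared by fewer than k records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records, for illustration only.
sample = [
    {"age_band": "25-34", "region": "East", "religion": "None"},
    {"age_band": "25-34", "region": "East", "religion": "None"},
    {"age_band": "65+", "region": "West", "religion": "Other"},
]

# Combinations of age band and region held by only one respondent.
risky = rare_combinations(sample, keys=("age_band", "region"), k=2)
print(risky)
```

A combination held by a single respondent is a candidate for further treatment, such as wider bands or a coarser geography.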

Anonymising indirect identifiers
• Aggregate categories to reduce precision
• Band ages, incomes, expenditure, etc. to disguise outliers
• Use standard coding frames, e.g. SOC2010
• Generalise meaning of detailed text
• Document the changes you make
• Talk to other researchers, archives, data services
Published guides:
• UCD Research Data Management Guide: http://libguides.ucd.ie/data/ethics
• ONS Disclosure control guidance for microdata produced from social surveys: http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata
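Banding can be sketched in a few lines of Python. The band width of 10 years and the open-ended top band starting at 75 are assumptions chosen for illustration, not values recommended in the slides; the top band is what disguises the oldest outliers.

```python
def band_age(age, width=10, top=75):
    """Map an age in years to a band label; ages at or above `top` share one open band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    # The last closed band stops just below the open-ended top band.
    upper = min(lower + width - 1, top - 1)
    return f"{lower}-{upper}"


# Made-up ages, for illustration only.
ages = [19, 34, 71, 80, 102]
banded = [band_age(a) for a in ages]
print(banded)
```

Recording the chosen width and top code alongside the data is one simple way to document the change.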

Anonymising data: new developments and tools
Statistical Disclosure Control (SDC) software is available:
• mu-Argus
  • standalone software package recommended by Eurostat for government statisticians
  • software and manual: http://neon.vb.cbs.nl/casc/mu.htm
• R tool - sdcMicro (GUI)
  • software and manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro
  • new documentation being developed by UK Data Service, working with R developers

Quiz 1: disclosive text in job title

Quiz 1: jobs coded with SOC2010

Quiz 2: detailed religion categories

Quiz 2: religion categories aggregated
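A minimal sketch of this kind of aggregation: categories with very few respondents are folded into a broader group. The threshold of 3 and the label "Other" are assumptions, and the category counts are made up for illustration.

```python
from collections import Counter


def aggregate_rare(values, min_count=3, other_label="Other"):
    """Replace categories with fewer than min_count respondents by other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Made-up responses, for illustration only.
religion = ["Christian"] * 4 + ["No religion"] * 3 + ["Jain", "Zoroastrian"]
aggregated = aggregate_rare(religion)
print(Counter(aggregated))
```

The combined group can itself end up small (two respondents here), so the result still needs checking before release.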

Quiz 3: age in years

Quiz 3: banded age

Access control
• Don’t over-anonymise: find a balance between protecting respondents’ confidentiality and maintaining the research usability of the data
• Can’t fully anonymise data without removing all the useful detail? Go back to the 5 Safes and think about access control: Safe people, Safe settings, Safe outputs

Access control
At UK Data Service, data available under 3 access levels:
• OPEN – open public access
• SAFEGUARDED – downloadable, but use is traceable
  • Registered users only (agree not to try to identify any individual respondents)
  • Special agreements/licence: permission-only access; approved projects – usage agreed in advance
• CONTROLLED – accredited users take a further training course
  • Access via on-site safe setting or virtual secure environment (SecureLab)
  • Outputs disclosure-checked before publication

Anonymising quantitative data: summary
• Informed consent
• Think about level of detail needed before data collection
• Remove direct identifiers
• Check and treat indirect identifiers to reduce disclosure risk
• Document your changes
• Balance anonymisation with access control to preserve data usability

Questions?
Guidance on anonymisation:
• UCD: http://libguides.ucd.ie/data/ethics
• UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
• Managing and Sharing Research Data book: https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297
