
Disclosure detection & control in research environments

This article explores the challenges of disclosure control in research environments, focusing on how to ensure consistency, transparency, and security with limited resources. It discusses how to classify outputs as safe or unsafe, how to assess the safety of different output types, and gives worked examples of assessing risks and setting limits for disclosure. Contact Felix Ritchie for further questions.


Presentation Transcript


  1. Disclosure detection & control in research environments
  Felix Ritchie

  2. Why are research environments special?
  • Little disclosure control on input
  • Few limits on processing
  • Unpredictable, complex outputs
  • An infinity of “special cases”
  ⇒ Manual review for disclosiveness is required

  3. Problems of reviewing research outputs
  • Limited application of rules
  • How do we ensure
    • consistency?
    • transparency?
    • security?
  • How do we do this with few resources?

  4. Classifying the research zoo
  • Some outputs are inherently “safe”
  • Some are inherently “unsafe”
  • Concentrate on the unsafe:
    • focus training
    • define limits
    • discourage use

  5. Safe versus unsafe
  • Safe outputs: will be released unless certain conditions arise
  • Unsafe outputs: won’t be released unless demonstrated to be safe
  Examples (* = conditions for release apply)
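
The slide’s table of example output types does not survive in this transcript, so the categories below are illustrative assumptions, not the presentation’s official lists. A minimal sketch of the release presumption in Python:

    # Illustrative sketch: the output types listed here are assumptions
    # for demonstration, not the lists from the original slide.
    SAFE = {"regression coefficients", "test statistics"}
    UNSAFE = {"frequency table", "magnitude table", "maximum", "minimum"}

    def default_decision(output_type: str) -> str:
        """Apply the safe/unsafe presumption described on the slide."""
        if output_type in SAFE:
            return "release unless flagged conditions apply"
        if output_type in UNSAFE:
            return "withhold until demonstrated safe"
        return "manual review"

    print(default_decision("regression coefficients"))  # release ...
    print(default_decision("magnitude table"))          # withhold ...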

  6. Determining safety
  • The key is to understand whether the underlying functional form is safe or unsafe
  • Each output type is assessed for the risk of:
    • primary disclosure
    • disclosure by differencing
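
To make the second risk concrete, here is a minimal numeric sketch of disclosure by differencing; the firms and turnover figures are invented for illustration.

    import numpy as np

    # Invented turnover values for six firms.
    turnover = np.array([120.0, 95.0, 80.0, 60.0, 45.0, 30.0])

    # Two published aggregates whose contributor sets differ by one firm:
    total_all = turnover.sum()       # firms 0..5
    total_sub = turnover[:5].sum()   # firms 0..4

    # Differencing the two releases isolates firm 5 exactly.
    print(total_all - total_sub)     # 30.0 == turnover[5]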

  7. Example: linear aggregates of data are unsafe
  • Inherent disclosiveness: a published aggregate T = Σ x_i can reveal a contributor when few units contribute or one unit dominates
  • Disclosure by differencing: differencing is feasible, since two aggregates whose contributor sets differ by one unit reveal that unit exactly
  • Each data point needs to be assessed for threshold/dominance limits
  ⇒ a resource problem for large datasets
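
A sketch of the per-cell checks this implies; the minimum-contributor threshold of 3 and the 45% dominance limit echo figures mentioned later in the presentation, but both parameters here are illustrative, not official limits.

    import numpy as np

    def cell_is_safe(values, min_count=3, dominance_limit=0.45):
        """Threshold and dominance checks for one linear aggregate (cell).

        min_count and dominance_limit are illustrative parameters only.
        """
        values = np.asarray(values, dtype=float)
        if len(values) < min_count:          # threshold rule: too few contributors
            return False
        largest_share = values.max() / values.sum()
        if largest_share > dominance_limit:  # dominance rule: one unit dominates
            return False
        return True

    print(cell_is_safe([120, 95, 80]))  # True
    print(cell_is_safe([500, 10, 5]))   # False: largest unit dominates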

  8. Example: linear regression coefficients are safe
  • Let β = (X'X)⁻¹X'y
  • But β is a weighted combination of all observations ⇒ can’t identify a single data point ⇒ no risk of differencing
  • Exceptions:
    • all right-hand-side variables public and an excellent fit (easily tested; can generate automatic limits on prediction)
    • all observations on a single person/company
    • must be a valid regression
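
The “excellent fit” exception is, as the slide says, easily tested. One way such an automatic test might look, where the R² cut-off of 0.95 is an assumed illustrative limit:

    import numpy as np

    def coefficients_safe_to_release(X, y, r2_limit=0.95):
        """Flag regressions whose fit is so good that predictions could
        reconstruct individual y values (r2_limit is illustrative)."""
        Xc = np.column_stack([np.ones(len(y)), X])  # add a constant
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        fitted = Xc @ beta
        r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
        return r2 < r2_limit, r2

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.5, -2.0]) + rng.normal(scale=5.0, size=100)  # noisy fit
    print(coefficients_safe_to_release(X, y))  # (True, ~0.2): safe to release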

  9. Example: cross-product/variance-covariance matrices
  • The cross-product matrix M = (X'X) is unsafe:
    • frequencies/totals are identified by interaction with the constant
    • and likewise for any other categorical variables
  • What about variance-covariance matrices?
    • V = σ²(X'X)⁻¹ is unsafe – it can be inverted to produce M
    • But in the more general weighted case V = (Z'WZ)⁻¹, a table for X can’t be created unless Z = X and W = I ⇒ the weighted covariance matrix is safe
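
A numeric check of both claims on invented data: the constant column of M exposes a frequency and a total directly, and an unweighted V inverts back to M.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    x = rng.normal(loc=50.0, scale=10.0, size=n)  # invented regressor
    X = np.column_stack([np.ones(n), x])          # constant + regressor

    M = X.T @ X
    # Interaction with the constant exposes aggregates directly:
    print(M[0, 0])   # n: a frequency
    print(M[0, 1])   # sum of x: a total

    # V = sigma^2 * (X'X)^{-1} can be inverted back to M:
    sigma2 = 2.5     # illustrative error variance
    V = sigma2 * np.linalg.inv(M)
    print(np.allclose(M, sigma2 * np.linalg.inv(V)))  # True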

  10. Example: Herfindahl indices
  • Composite index of industrial concentration: H = Σ s_i², where s_i are firms’ market shares
  • Safe as long as at least 3 firms in the industry? No:
    • the quadratic term exacerbates dominance
    • if the second-largest share is much smaller than the largest, H ≈ s₁² ⇒ H reveals the share of the largest firm
    • the standard dominance rule (largest unit < 45% share) doesn’t prevent this
  • Current tests for safety are not very satisfactory
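
A numeric sketch of why the 45% rule fails here; the market shares are invented. The largest firm sits just under the dominance limit, yet √H recovers its share almost exactly.

    import numpy as np

    # Invented shares: largest firm at 44%, the rest split among 56 tiny firms.
    shares = np.concatenate([[0.44], np.full(56, 0.01)])
    assert np.isclose(shares.sum(), 1.0)

    H = np.sum(shares ** 2)   # Herfindahl index
    print(H)                  # ~0.199
    print(np.sqrt(H))         # ~0.446: near-identifies the 0.44 share,
                              # even though the dominance rule is satisfied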

  11. Questions?
  Felix Ritchie
  Microdata Analysis and User Support
  Office for National Statistics
  felix.ritchie@ons.gov.uk
  +44 1633 45 5846
