
WP 15 Experience of using a Post Randomisation Method (PRAM) at ONS

Christine Bycroft, Katherine Merrett, Office for National Statistics, UK. Outline: What is PRAM; Why we needed to adapt the PRAM method; Adapted PRAM Methodology; Disclosure risks; Effect on Data Quality; Conclusions.


Presentation Transcript


  1. WP 15 Experience of using a Post Randomisation Method (PRAM) at ONS. Christine Bycroft, Katherine Merrett, Office for National Statistics, UK

  2. Outline • What is PRAM • Why we needed to adapt the PRAM method • Adapted PRAM Methodology • Disclosure risks • Effect on Data Quality • Conclusions

  3. What is PRAM • PRAM is a disclosure control technique for categorical data in microdata files. • The values of a categorical variable are changed according to a prescribed probability. • Each new perturbed value may or may not be different from the original value. • For example, a person who is classified as a widow may be re-classified as single.

  4. Probability mechanism for PRAM • The probability mechanism is described by an invertible transition matrix P. • There is one P matrix for each variable. • Let P = (p_ij) be an L×L matrix for a variable with L categories. Its entries are conditional probabilities: p_ij is the probability that a record whose original value is category i is given category j after perturbation. • p_ii is the probability of no change.
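
The mechanism on this slide can be illustrated with a minimal Python sketch. The category labels, the matrix values and the function name `apply_pram` are assumptions made up for illustration; they are not taken from the ONS work.

```python
import random

def apply_pram(values, categories, P, rng):
    """Draw a new category for each value from the matching row of P.

    P[i][j] is the probability that original category i becomes category j,
    so each row of P must sum to 1. The new value may equal the old one.
    """
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=P[index[v]], k=1)[0] for v in values]

categories = ["single", "married", "widowed"]

# Hypothetical transition matrix (illustrative values only).
P = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = ["single", "married", "widowed", "married"]
perturbed = apply_pram(original, categories, P, rng)
```

The diagonal entries are kept high here so that most values stay unchanged; the adapted method described later instead tries to make changes as likely as the constraints allow.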

  5. Risk and data utility for PRAM. Disclosure risk • PRAM offers protection by inflow and outflow: • inflow from safe combinations of values to risky combinations • outflow from risky combinations to safe combinations. Data utility • The Invariant PRAM method preserves univariate frequencies in expectation. • There is no control over joint distributions: PRAM may create edit failures, e.g. a 14-year-old doctor, or highly unusual combinations, e.g. a 17-year-old widow.
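
"Preserves univariate frequencies in expectation" can be stated as a condition on P: if t is the vector of original category counts, the expected counts after PRAM are tP, and P is invariant when tP = t. The sketch below checks that condition. The counts and the flow matrix `F` are made-up numbers, and the function names are hypothetical.

```python
import math

def expected_frequencies(t, P):
    """Expected count in category j after PRAM: sum over i of t[i] * P[i][j]."""
    n = len(t)
    return [sum(t[i] * P[i][j] for i in range(n)) for j in range(n)]

def is_invariant(t, P, tol=1e-9):
    """True if P leaves the frequency vector t unchanged in expectation."""
    return all(
        math.isclose(after, before, abs_tol=tol)
        for after, before in zip(expected_frequencies(t, P), t)
    )

# Illustrative counts for three categories.
t = [50, 30, 20]

# F[i][j] is the expected number of records moving from category i to j.
# A symmetric F has equal row and column sums, so the matrix derived from it
# is invariant with respect to t.
F = [
    [45, 3, 2],
    [3, 25, 2],
    [2, 2, 16],
]
P_invariant = [[F[i][j] / t[i] for j in range(3)] for i in range(3)]

check = is_invariant(t, P_invariant)
```

Invariance is a statement about expected values only; any single random application of P will usually give counts that differ slightly from t.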

  6. Why adapt the PRAM method? • Applied to the 2001 Individual Sample of Anonymised Records (SARs) drawn from the Census (population uniques are known from the Census records). • Used recoding as the first method to reduce risk. • Do not apply PRAM to the whole file. • Perturb only the remaining high risk records (a small proportion of all records). • Wish to preserve exact univariate frequencies, not just expected values. • Wish to control joint distributions to minimise edit failures and unusual combinations.

  7. Adapted PRAM Methodology • Perturb only those records which are high risk. • For the transition matrix P, we want to: • Maximise the probability of changing values • Preserve frequencies (i.e. P is invariant) • Create perturbed records that are feasible and will not result in highly unusual combinations • Define a linear programming problem

  8. Adapted PRAM Methodology • The LP routine minimised the objective function, subject to constraints. The objective function is • We have set up a Weight Matrix to avoid extreme transitions. • Rather than having extreme changes that might create highly unusual individuals or invalid combinations, we prefer to keep the values as they are.
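
The objective function itself is not reproduced in this transcript. Purely as an illustration of the kind of problem described, and not as the authors' formulation, a weighted linear programme over the entries of P could take the following form. The symbols are assumptions: t_i is the number of high risk records in category i, and w_ij is the weight-matrix entry penalising a move from category i to category j.

```latex
\begin{aligned}
\min_{P}\quad & \sum_{i=1}^{L}\sum_{j=1}^{L} w_{ij}\, t_i\, p_{ij} \\
\text{subject to}\quad & \sum_{j=1}^{L} p_{ij} = 1 && i = 1,\dots,L \\
& \sum_{i=1}^{L} t_i\, p_{ij} = t_j && j = 1,\dots,L \\
& p_{ij} \ge 0 && i,j = 1,\dots,L
\end{aligned}
```

Under this assumed form, the first constraint makes each row of P a probability distribution and the second is the invariance condition. Giving extreme transitions a larger weight than the diagonal, and the diagonal a larger weight than plausible nearby transitions, would favour small changes first and "no change" over an extreme change, which matches the preference stated on the slide.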

  9. Implementation • Apply PRAM to variables sequentially, greatest contribution to risk first • Define a weight matrix for each variable • LP solved in SAS, to get the transition matrix P • PRAM within control variables (e.g. PRAM age within marital status categories) • Implementation of the p_ij probabilities preserves exact frequencies • Check for edit failures, and correct • Perturbed records are flagged as being imputed (whether changed or not)
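
The slides do not say how the p_ij were applied so that frequencies are preserved exactly. The sketch below is therefore not the ONS implementation. It uses a simpler stand-in, a random shuffle of values within each control group, only to show why perturbing within control groups can keep both the overall and the within-group frequencies exact. The field names, including `imputed_flag`, are hypothetical.

```python
import random
from collections import Counter, defaultdict

def shuffle_within_groups(records, target, control, rng):
    """Shuffle the target values among records sharing the same control value.

    Stand-in for exact-frequency PRAM: a shuffle only reorders existing values,
    so counts of the target variable are unchanged overall and within each
    control group. Every processed record is flagged, changed or not.
    """
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[control]].append(idx)

    out = [dict(rec) for rec in records]  # copy so the input is not mutated
    for idxs in groups.values():
        vals = [records[i][target] for i in idxs]
        rng.shuffle(vals)
        for i, v in zip(idxs, vals):
            out[i][target] = v
            out[i]["imputed_flag"] = True
    return out

# Made-up high risk records: age is perturbed within marital status.
high_risk = [
    {"marital": "single", "age": 17},
    {"marital": "single", "age": 25},
    {"marital": "single", "age": 31},
    {"marital": "widowed", "age": 70},
    {"marital": "widowed", "age": 82},
]
result = shuffle_within_groups(high_risk, "age", "marital", random.Random(1))
```

Because ages only move between records with the same marital status, a combination such as a 17-year-old widow cannot be created by this step.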

  10. Results: Disclosure risks • Our aim was only to protect against attempts at exact matching. We assumed that perturbing the value of one variable in a high risk record provides sufficient protection. • Protection by high outflow, but low inflow • Results showed high proportions changed, except for the last variables in the sequence • Acceptable, since these variables had the lowest overall contribution to disclosure risk, and only a small number of records were affected

  11. Results: Data Quality. Preservation of the univariate frequencies: excellent results. Preservation of the multivariate frequencies: • very few records failed the edit checks • compare tables before and after PRAM • for each cell, compute the ratio of the relative error due to PRAM to the relative sampling error

  12. Effect on Data Quality • Results from 15 tables (nearly 3,000 cells) • The effect of perturbation relative to sampling error decreases as the cell size increases. Thus the damage done by PRAM is greater for cells with low frequencies. Table 1: Percentage of cells across all tables where the ratio of the error due to PRAM to the sampling error is greater than 1, and greater than 2
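
The per-cell comparison can be sketched as follows. The slides do not give the formula used for the sampling error, so this sketch assumes a simple binomial approximation for a cell count n in a sample of size N, with relative sampling error sqrt((1 - n/N) / n). The cell counts are invented for illustration and are not results from the SARs. The function names are hypothetical.

```python
import math

def error_ratio(n_original, n_pram, sample_size):
    """Ratio of relative PRAM error to relative sampling error for one cell.

    Assumes a binomial approximation for the sampling error of the cell count.
    """
    rel_pram_error = abs(n_pram - n_original) / n_original
    rel_sampling_error = math.sqrt((1 - n_original / sample_size) / n_original)
    return rel_pram_error / rel_sampling_error

def share_above(ratios, threshold):
    """Proportion of cells whose ratio exceeds the threshold."""
    return sum(r > threshold for r in ratios) / len(ratios)

# Invented (original count, count after PRAM) pairs in a sample of 1,000.
sample_size = 1000
cells = [(100, 104), (10, 14), (400, 401)]

ratios = [error_ratio(orig, pram, sample_size) for orig, pram in cells]
share_gt_1 = share_above(ratios, 1)
share_gt_2 = share_above(ratios, 2)
```

In this made-up example the same absolute change of 4 gives a ratio below 1 for the cell of 100 and above 1 for the cell of 10, which is the pattern the slide describes for low-frequency cells.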

  13. Conclusions • As used in this context on targeted records, PRAM is an efficient method of data perturbation, which is well controllable. • Applying PRAM to a small proportion of the file has allowed us to strike a good balance between recoding and minimising the damage from perturbation.
