
Statistical Disclosure Control for Tabular Data: Enhancing Data Confidentiality

Learn about safety rules in statistical disclosure control for tabular data focusing on SDC for magnitude tables, existing safety rules, generalised p-rule, rational estimates, prior distribution, U-estimates, and more.



Presentation Transcript


  1. UN/ECE Work Session On Statistical Data Confidentiality (Geneva, 9-11 November 2005) WP30: Safety rules in statistical disclosure control for tabular data. Giovanni Merola, Winton Capital Management Ltd, g.merola@wintoncapital.com. Partially written while at ISTAT and partially supported by EU project CASC.

  2. Plan of the Talk • SDC for Magnitude tables; • Existing safety rules; • Generalised p-rule; • Rational estimates; • Prior distribution; • U-estimates; • Comparison on real SBS data; • MU-rules; • Concluding remarks.

  3. 1. SDC for Magnitude Tables Tables showing the sums of non-negative contributions in each cell. Example (table not reproduced): cell "Old Males" with total 600. • Contributions are in non-increasing order; • the total T is published; • n is the number of contributors.

  4. 1. SDC for Magnitude Tables cont.d SDC policy: • If the categories are confidential, (likely) identification of respondents is disclosure; • else only the contributions of (likely) identifiable respondents cannot be disclosed (too precisely); • same rule for all cells, else microdata protection.

  5. 2. Existing Safety Rules • Rare respondents are identifiable • threshold rule: n > m. • Respondents with large contributions are identifiable • Dominance: (z1+···+zm)/T < k. • Largest contributor is identifiable, hence the second largest must not estimate z1 closely • p-rule: [(T-z2)-z1]/z1 > p.
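
The three rules on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the author's code: the function names are hypothetical, each rule is written as the condition under which a cell is safe, and the dominance rule is assumed to use a strict "<" comparison. The first cell is the example from slide 14; the second is a made-up cell.

```python
def threshold_safe(z, m):
    """Threshold rule: safe when the cell has more than m contributors."""
    return len(z) > m


def dominance_safe(z, m, k):
    """Dominance rule: safe when the m largest contributions sum to less
    than a share k of the total (strict '<' is an assumption)."""
    z = sorted(z, reverse=True)
    return sum(z[:m]) / sum(z) < k


def p_rule_safe(z, p):
    """p-rule: safe when the second largest contributor's upper estimate
    T - z2 of z1 is off by more than a fraction p of z1.
    Requires at least two contributions."""
    z = sorted(z, reverse=True)
    T = sum(z)
    z1, z2 = z[0], z[1]
    return ((T - z2) - z1) / z1 > p


slide14_cell = [970, 376, 274, 253, 203, 169, 161, 121, 86, 62, 21, 10]
small_cell = [60, 30, 10]  # hypothetical cell, T = 100

print(threshold_safe(slide14_cell, 3))       # True: 12 contributors
print(dominance_safe(small_cell, 1, 0.5))    # False: 60/100 is not below 0.5
print(p_rule_safe(small_cell, 0.2))          # False: (100-30-60)/60 is about 0.167
```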

  6. 3. Generalised p-rule Includes the existence of groups of respondents • Group with largest sum identifiable; • group with second largest sum must not estimate largest sum too closely. (Figure not reproduced: a cell with total T, showing the group sums t2 and R2,2.)

  7. 3. Generalised p-rule cont.d Same estimate as p-rule: the maximum possible value, t̂m = T - Rm,l • Gen. p-rule: ((T-Rm,l)-tm)/tm > p • With t1=z1 and R1,1=z2 this is the p-rule.
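
A minimal Python sketch of the generalised p-rule follows. The slide does not define the groups explicitly, so the code makes an assumption: tm is the sum of the m largest contributions and Rm,l is the sum of the l contributions ranked immediately after them, which agrees with the slide's special case t1=z1, R1,1=z2. The function name and the test cell are hypothetical.

```python
def gen_p_rule_safe(z, m, l, p):
    """Generalised p-rule: safe when ((T - R_ml) - t_m) / t_m > p.

    Assumption: t_m is the sum of the m largest contributions and
    R_ml the sum of the next l contributions (the intruding group).
    """
    z = sorted(z, reverse=True)
    T = sum(z)
    t_m = sum(z[:m])
    R_ml = sum(z[m:m + l])
    return ((T - R_ml) - t_m) / t_m > p


cell = [60, 30, 10]  # hypothetical cell, T = 100

# m = l = 1 reduces to the simple p-rule: (100 - 30 - 60) / 60 is about 0.167
print(gen_p_rule_safe(cell, 1, 1, 0.1))  # True
# m = 2, l = 1: the intruder recovers t_2 = 90 exactly, so the cell is unsafe
print(gen_p_rule_safe(cell, 2, 1, 0.1))  # False
```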

  8. 3. Generalised p-rule cont.d • If zero contributions are known (external intruder): Dominance rule with k=1/(1+p) • If no groups: simple p-rule; • If intruding group formed of (m-1) respondents: threshold rule n>m protects against exact estimation (p=0). Merola, G. M., 2003a. Generalized risk measures for tabular data. Proceedings of the 54th Session of the International Statistical Institute.
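
The first bullet can be checked directly. Assuming that an intruder who knows only the zero contributions corresponds to Rm,l = 0 (an interpretation, not stated on the slide), and that tm > 0, the generalised p-rule rearranges into a dominance condition:

```latex
\frac{(T - 0) - t_m}{t_m} > p
\;\iff\; \frac{T}{t_m} > 1 + p
\;\iff\; \frac{t_m}{T} < \frac{1}{1+p} = k .
```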

  9. 4. Rational Estimates • An intruder can compute a lower and an upper bound for the value of tm: • For example, if z2=40 and T=100: 40 = z2 ≤ z1 ≤ T - z2 = 60; • the bounds are different for different prior knowledge of the intruder.

  10. 4. Rational Estimates cont.d • tm can be estimated by minimising the Mean Square Error under some distribution F(tm); • by a well-known property, the MSE is minimised by the mean of F.
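
The property invoked here is the standard bias-variance decomposition. Writing c for a candidate estimate (a symbol introduced only for this illustration), it reads:

```latex
\mathrm{MSE}(c) = E_F\!\left[(t_m - c)^2\right]
 = \mathrm{Var}_F(t_m) + \left(E_F[t_m] - c\right)^2 ,
\qquad\text{so}\qquad
\arg\min_c \mathrm{MSE}(c) = E_F[t_m] .
```

The variance term does not depend on c, so the squared term alone determines the minimum.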

  11. 5. Prior Distribution: Uniform • The ignorance about the distribution of tm can be modelled with a Uniform distribution: tm~U(tm-, tm+) • in this case the mean is simply the midpoint of the bounds, (tm- + tm+)/2. • Note: same estimate for any symmetric F.

  12. 5. Prior Distribution: maximising The Generalised p-rule can be derived by assuming a prior concentrated on the maximum value • We refer to the Gen. p-rule as the M-rule, and to the one derived using the Uniform as the U-rule.

  13. 6. U-estimates Different prior knowledge of the intruder • knows T but not n (Dominance): • knows T and n, • knows T and L contributions (Gen. p-rule*), • knows T, L contributions and n, either as above or • * for m=L=1 the uniform p-rule is the same as uniform dominance. Merola, G., 2003b. Safety rules in statistical disclosure control for tabular data. Contributi Istat 1, Istituto Nazionale di Statistica, Roma.

  14. 6. U-estimates cont.d Example C=(970,376,274,253,203,169,161,121,86,62,21,10), T=2706
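
The slide's table of estimates did not survive in this transcript, so the sketch below does not reproduce it. It only illustrates the simplest case under stated assumptions: the intruder is the second largest contributor, knows T and z2, and uses the slide-9 bounds z2 ≤ z1 ≤ T - z2. The M-estimate takes the upper bound and the U-estimate takes the midpoint of the bounds; variable names are hypothetical.

```python
# Contributions from the slide, in non-increasing order.
C = [970, 376, 274, 253, 203, 169, 161, 121, 86, 62, 21, 10]
T = sum(C)                      # 2706, matching the slide
z1, z2 = C[0], C[1]

# Bounds on z1 available to the second largest contributor (slide 9).
lower, upper = z2, T - z2       # 376 and 2330

m_estimate = upper              # prior concentrated on the maximum value
u_estimate = (lower + upper) / 2  # mean of a Uniform prior on [lower, upper]


def relative_error(estimate, truth):
    """Absolute estimation error as a fraction of the true value."""
    return abs(estimate - truth) / truth


print(T)                                          # 2706
print(m_estimate, u_estimate)                     # 2330 1353.0
print(round(relative_error(m_estimate, z1), 3))   # 1.402
print(round(relative_error(u_estimate, z1), 3))   # 0.395
```

In this case the midpoint equals T/2 whatever the value of z2, because the two bounds always sum to T.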

  15. 7. Comparison on real SBS data We applied different rules to Italian SBS data, turnover by Region and SIC for the years '94 and '97. We considered SIC codes with 2 and 3 digits.

  16. 7. Comparison on real SBS data cont.d Mean relative error for z1

  17. 7. Comparison on real SBS data cont.d Mean relative error for t2

  18. 8. U-rules • The values for are intervals: • Knowing only T (Dominance) • Knowing T and L contributions (gen p-rule)

  19. 9. MU-rules • Assuming both estimation approaches (M and U), we obtain subadditive rules, analogous to the p-rule but with stricter bounds.

  20. 9. MU-rules cont.d • Safety rule when only T known (Dominance) • Safety rule when T and L contributions known (gen p-rule)

  21. 10. Conclusions • The assumptions for the existing rules are unrealistic; • using a simple noninformative distribution gives a much smaller relative error of estimation; • the corresponding rules are not subadditive; • joining assumptions leads to stricter rules; • identifiability of all largest respondents requires these rules; • different priors can be used.
