1 / 14

Statistical Methodology for the Automatic Confidentialisation of Remote Servers at the ABS

Statistical Methodology for the Automatic Confidentialisation of Remote Servers at the ABS. Session 1 UNECE Work Session on Statistical Data Confidentiality 28-30 October 2013 Daniel Elazar daniel.elazar@abs.gov.au. Confidentiality Risks for Remote Server Outputs.

wayde
Download Presentation

Statistical Methodology for the Automatic Confidentialisation of Remote Servers at the ABS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistical Methodology for the Automatic Confidentialisation of Remote Servers at the ABS Session 1UNECE Work Session on Statistical Data Confidentiality28-30 October 2013 Daniel Elazar daniel.elazar@abs.gov.au

  2. Confidentiality Risks for Remote Server Outputs Known Types of Attack from the literature • Tabular attacks • Averaging • Differencing • Scope coverage • Sparsity • Regression attacks • Tabular attacks as above, plus • Leverage • High R2 – saturated or ideal model fit • Influence • Solving model equations

  3. TableBuilder Functionality

  4. TableBuilder Protections

  5. DataAnalyser Functionality Analysis Procedures /Specifications Exploratory Data Analysis Transformations / Derivations Output Formats Outputs Robust Linear Regression Binomial logistic Probit Multinomial Poisson Diagnostics Weighted Analysis • Logical derivations • Categorical/ Dummy variables • Category collapsing Expression Editor for categ. vars Drop variables / records Action List CSV Storage of intermediate datasets • R-squared • Pseudo R-squared • Coefficients • Standard errors Other Diagnostics • Summary statistics • (sums, counts) • Summary Tables • Graphics • (side-by-side box plots) Summary statistics (count) Graphics • Written in R • Full User Authentication • Audit System • Workflow Control • Data Repository Interface • Metadata Handler

  6. DataAnalyser Protections (additional to TB)

  7. So where’s the Risk in Regressions? x1,x2,…,xn x1 x1,x2,…,xk Saturated Model Sparse Model The Perfect Model y x c Leverage Attack

  8. Scope-Coverage (Differencing) Attack Case 1 Case 2 Age Age 95 96 95 96 15 15 A B A B Other Characteristics Other Characteristics Confidentialised outputs from requests A and B differ slightly  unit(s) (in red) exists in set B excluding A and are likely to be rare/unique Confidentialised outputs from requests A and B are exactly the same  There are no units in set B excluding A

  9. Perturbation of Unweighted Counts Unweighted Count (UWC) pcol_index prow_index pUWC = UWC + p p = pTable[ prow_index,pcol_index ] Perturbation Table

  10. Perturbation of Unweighted Counts p = pTable[ prow_index,pcol_index ] 32 bits 8 bits 8 bits 8 bits 8 bits C D is the bitwise XOR operator. + (mod 2), for ’th bit

  11. The Perturbation Algorithm:

  12. Perturbation of Weighted Continuous Values where direction mTable[i] magnitude noise

  13. Perturbation of Regression Estimates For generalised linear models, perturbation is applied to the score function using the following algorithm: Begin with an initial value Solve to obtain an unperturbed MLE Calculate the perturbed score function evaluated at applying the continuous perturbation to each summand in . This results in a vector of perturbation values, Solve using IRLS with initial value to obtain the perturbed estimate .

  14. Future Directions

More Related