
Confidentiality protection of large frequency data cubes

Confidentiality protection of large frequency data cubes. UNECE Workshop on Statistical Confidentiality Ottawa 28-30 October 2013 Johan Heldal and Svetlana Badina Statistics Norway. Eurostat Census Hypercubes.


Presentation Transcript


  1. Confidentiality protection of large frequency data cubes. UNECE Workshop on Statistical Confidentiality, Ottawa, 28-30 October 2013. Johan Heldal and Svetlana Badina, Statistics Norway.

  2. Eurostat Census Hypercubes
     • 60 Census 2011 frequency-count hypercubes that all 32 EU+EEA countries must submit in 2014.
     • Four to nine variables (breakdowns) in each cube.
     • Each country is responsible for its own disclosure control method according to national legislation.
     • Norway is the only country that wishes to use small-count (1 and 2) rounding as the preferred disclosure control method.
     • This presentation will show how.
     • Hypercube 06 will be used for illustration.

  3. The problem

  4. Idea
     • We want to create uncertainty about whether zeroes are real zeroes.
     • Create more zeroes by rounding small counts (1 and 2) unbiasedly to 0 or 3.
     • The rounding must be carried out so as to minimize the perturbation of given aggregate counts.
     • Counts of 1 and 2 are not necessarily considered problematic by themselves, but they will be removed by the rounding.
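The unbiasedness requirement pins down the rounding probabilities: a count k in {1, 2} that goes to 3 with probability k/3 and to 0 otherwise keeps its expected value. A minimal check of that arithmetic, assuming independent cell-wise rounding (the slides later use a fixed-size sample instead, so this only illustrates the expectation); the function name is made up for this sketch.

```python
from fractions import Fraction

def expected_rounded_value(count: int, base: int = 3) -> Fraction:
    """Expected value when `count` is rounded up to `base` with
    probability count/base and down to 0 otherwise (assumed scheme)."""
    p_up = Fraction(count, base)
    return p_up * base + (1 - p_up) * 0

for k in (1, 2):
    # The expectation equals the original count, i.e. the rounding is unbiased.
    print(k, expected_rounded_value(k))
```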

  5. Hypercube 06

  6. Principal Marginal Distributions

  7. Reduce the hypercube
     STEP 1: Identifying small counts
     • Reduce hypercube A by selecting a subset B consisting of
         • all interior cells in A with counts 1 or 2, or
         • all interior cells in A contributing to counts of 1 or 2 in the PMDs of A.
     • Calculate C = A – B.
     STEP 2: Rounding
     • nB = total value of B.
     • Round [nB/3] of the interior counts in B to 3 and the rest to 0, giving B*.
     • IF the solution B* is good enough, STOP. ELSE, continue the search for a better B*.
     STEP 3: Calculate A* = C + B*, the rounded cube.
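The three steps can be sketched on a toy table. This is a simplified illustration, not the authors' implementation: it assumes the first definition of B (all interior cells with counts 1 or 2), reads [nB/3] as rounding to the nearest integer, selects the cells to round up uniformly at random, and does a single draw with no search for a better B*. The function `round_cube` and the toy data are hypothetical.

```python
import random

def round_cube(A, base=3, seed=0):
    """A maps cell keys (tuples of category labels) to non-negative counts."""
    rng = random.Random(seed)
    # STEP 1: B holds the small interior counts; C = A - B holds the rest.
    B = {cell: n for cell, n in A.items() if n in (1, 2)}
    C = {cell: 0 if cell in B else n for cell, n in A.items()}
    # STEP 2: round [nB/3] cells of B to 3 and the others to 0.
    n_B = sum(B.values())
    k = (2 * n_B + base) // (2 * base)  # nearest integer to n_B / base
    up = set(rng.sample(sorted(B), k))  # uniform selection (simplification)
    B_star = {cell: (base if cell in up else 0) for cell in B}
    # STEP 3: recombine, A* = C + B*.
    return {cell: C[cell] + B_star.get(cell, 0) for cell in A}

# Toy 2 x 3 table (sex by age group); the numbers are invented.
A = {
    ("M", "0-14"): 1, ("M", "15-64"): 40, ("M", "65+"): 2,
    ("F", "0-14"): 0, ("F", "15-64"): 37, ("F", "65+"): 2,
}
A_star = round_cube(A)
print(A_star)
print(sum(A.values()), sum(A_star.values()))
```

Cells with counts above 2 pass through unchanged, so only the small counts contribute to any deviation in the aggregates.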

  8. Simple properties
     • A* – B* = A – B = C, hence A* – A = B* – B.
     • A* is additive.
     • |nA – nA*| = |nB – 3[nB/3]| ≤ 1, with [x] read as rounding to the nearest integer.
     • All Principal Marginal Distributions will be consistently rounded.
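The deviation in the grand total can be derived directly from the definitions C = A – B and A* = C + B*. The sketch below assumes that [x] denotes rounding to the nearest integer, which is the reading under which the bound of 1 holds:

```latex
\begin{aligned}
n_{A^*} &= n_C + n_{B^*} = (n_A - n_B) + 3\,[n_B/3],\\
n_A - n_{A^*} &= n_B - 3\,[n_B/3] \in \{-1,\, 0,\, 1\}.
\end{aligned}
```

For example, with n_B = 3 199 the rounded total is 3 × 1 066 = 3 198, a deviation of 1.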

  9. The Norwegian HC 06

  10. Rounding method used
      • Let nB = total count of B, e.g. nB = 3 199.
      • From the non-zero cells in B, select without replacement (WOR) [nB/3] (= 1 066) cells to be rounded to 3; the remaining cells are rounded to 0.
      • Probabilities: P(2 → 3) = 2·P(1 → 3).
      • Selection may be stratified.
      • Calculate the distance $m = \max_{c \in M} |b_c^* - b_c|$ across a control set M of marginal cells of B.
      • The solution with the smallest value of m is selected.
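A random search of this kind might look as follows. The sketch makes several assumptions that are not in the slides: weighted sampling without replacement is done with Efraimidis-Spirakis random keys (which favours cells with count 2 over cells with count 1, though the resulting inclusion probabilities are only approximately in the ratio 2:1), no stratification is applied, and the control set M is passed as a list of dimension-index tuples. All function names are hypothetical.

```python
import random
from collections import defaultdict

def weighted_sample_wor(weights, k, rng):
    """Draw k keys without replacement, favouring larger weights
    (Efraimidis-Spirakis random keys; an assumed sampling scheme)."""
    keyed = sorted(weights, key=lambda c: rng.random() ** (1.0 / weights[c]),
                   reverse=True)
    return set(keyed[:k])

def max_marginal_deviation(B, B_star, margins):
    """m = largest |b*_c - b_c| over the marginal cells c spanned by `margins`."""
    worst = 0
    for dims in margins:
        diff = defaultdict(int)
        for cell, n in B.items():
            diff[tuple(cell[d] for d in dims)] += B_star[cell] - n
        worst = max(worst, max((abs(v) for v in diff.values()), default=0))
    return worst

def search(B, margins, runs=1000, seed=0, base=3):
    """Repeat the random rounding and keep the draw with the smallest m."""
    rng = random.Random(seed)
    k = (2 * sum(B.values()) + base) // (2 * base)  # nearest integer to nB/3
    best, best_m = None, None
    for _ in range(runs):
        up = weighted_sample_wor(B, k, rng)
        B_star = {c: (base if c in up else 0) for c in B}
        m = max_marginal_deviation(B, B_star, margins)
        if best_m is None or m < best_m:
            best, best_m = B_star, m
    return best, best_m

# Toy reduced cube B with two dimensions; the counts are invented.
B = {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 2,
     ("b", "y"): 1, ("c", "x"): 1, ("c", "y"): 2}
best, best_m = search(B, margins=[(0,), (1,)])
print(best, best_m)
```

Because each run is an independent draw, the quality of the best solution depends on chance and on the number of runs.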

  11. Test experiment
      • Control set M: all one- and two-way marginal counts generated from the eight variables spanning HC 06 (1 985 cells).
      • 10 000 runs are done:
          • for the full HC 06 and for the PMDs only;
          • with stratified and unstratified sampling.
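A control set of one- and two-way marginals can be enumerated mechanically from the interior cells. The sketch below uses a toy cube with three variables rather than the eight of HC 06; the function name, the key layout `(dims, values)`, and the data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def low_order_marginals(cells, n_dims, max_order=2):
    """Return {(dims, values): count} for every marginal cell of order
    1..max_order that receives at least one interior cell."""
    table = Counter()
    for order in range(1, max_order + 1):
        for dims in combinations(range(n_dims), order):
            for cell, n in cells.items():
                table[(dims, tuple(cell[d] for d in dims))] += n
    return table

# Toy interior cells: (sex, age group, region); the counts are invented.
cells = {("M", "young", "N"): 1, ("F", "old", "S"): 2, ("M", "old", "N"): 4}
marginals = low_order_marginals(cells, n_dims=3)
print(len(marginals))
print(marginals[((0,), ("M",))], marginals[((0, 1), ("M", "old"))])
```

Only marginal cells touched by a listed interior cell appear in the result, so empty combinations have to be added separately if they belong to the control set.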

  12. Discussion
      • The method is not yet fully approved for the Census HCs.
      • Is the method sufficient to prevent any kind of disclosure?
      • The reduction of the problem (A → B) is absolutely required to make the method work.
      • Advantage:
          • Can produce consistent results with acceptable (?) aggregate deviations for a number of linked cubes of some size.
      • Problems:
          • With random search the result is subject to chance.
          • Diminishing returns from increasing the number of iterations.
          • We need to find better and more stable search engines.
          • Generalization to rounding bases larger than 3 will increase the deviations in aggregates.

  13. Further work
      • Try better sampling procedures (balanced sampling?).
      • Try Mixed Integer Linear Programming software.
      • Extend the experiment to round more hypercubes jointly.
      • An idea: merge the reduced rounded cells back into the microdata.
          • A method for perturbing some variables in relation to others.
          • How many variables must be perturbed this way to make all hypercubes safe?
          • Creates a microdata set that produces the rounded tables directly.
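One way the rounding could be posed for Mixed Integer Linear Programming software is shown below. This formulation is an assumption made for illustration, not something given in the slides: x_i is a binary variable that is 1 when non-zero cell i of B is rounded to 3, b_i is its original count, M is the control set of marginal cells, and [x] is read as rounding to the nearest integer.

```latex
\begin{aligned}
\min_{x,\,m}\quad & m\\
\text{s.t.}\quad & -m \le \sum_{i \in c} (3x_i - b_i) \le m && \text{for all } c \in M,\\
& \sum_{i} x_i = [n_B/3],\\
& x_i \in \{0, 1\} && \text{for every non-zero cell } i \text{ of } B.
\end{aligned}
```

The objective is the same distance m that the random search minimizes, so a solver would replace the repeated draws with a deterministic optimization.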

  14. Thank you very much for your attention
