G-Confid: Turning the tables on disclosure risk
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality
Ottawa, Canada, 30 October 2013
Peter Wright
G-Confid: a cell suppression application
• Use with any table size and any number of dimensions (subject to hardware / memory limitations)
• Available for SAS 9.2 and 9.3; SAS EG 4.3 and 5.1

Overview by component
• PROC SENSITIVITY identifies sensitive cells
    • Highlights, inputs, strategies
• Macro SUPPRESS creates a suppression pattern
    • Inputs, outputs, strategies
• Macro AUDIT audits a suppression pattern
PROC SENSITIVITY identifies confidential cells
Highlights:
• Choice of sensitivity rule: p-percent, (n,k), arbitrary
• Allows multiple decomposition where …
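The p-percent and (n,k) rules named above can be illustrated with a small Python sketch. It uses the textbook forms of the two rules and is not G-Confid code: the function names, the convention that a cell is sensitive when the returned value is positive, and the expression of p and k as fractions are all assumptions made for illustration.

```python
def p_percent_sensitivity(contributions, p):
    """Textbook p-percent rule: S = p*x1 - (total - x1 - x2).

    x1 and x2 are the two largest contributions. The cell is treated as
    sensitive when S > 0 (assumed convention). p is a fraction, e.g. 0.20.
    """
    xs = sorted(contributions, reverse=True)
    largest = xs[0] if xs else 0.0
    remainder = sum(xs[2:])  # everything except the two largest
    return p * largest - remainder


def nk_sensitivity(contributions, n, k):
    """Textbook (n,k) dominance rule: S = (sum of top n) - k * total.

    The cell is treated as sensitive when S > 0. k is a fraction, e.g. 0.90.
    """
    xs = sorted(contributions, reverse=True)
    return sum(xs[:n]) - k * sum(xs)


if __name__ == "__main__":
    dominated = [100, 50, 5, 5]   # small remainder: second largest can estimate the largest
    balanced = [100, 50, 30, 20]  # large remainder protects the largest
    print(p_percent_sensitivity(dominated, 0.20) > 0)
    print(p_percent_sensitivity(balanced, 0.20) > 0)
    print(nk_sensitivity(dominated, 2, 0.90) > 0)
```

In this form, any contribution outside the two largest lowers the p-percent sensitivity by its full value.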
Inputs for PROC SENSITIVITY
• Definition of hierarchy(ies) for each table dimension
• Microdata file
    • Classification variables (e.g., geography, industry)
    • Enterprise identifier
    • Enterprise value
Tip: to reduce the sensitivity of a cell by the value of an enterprise, set the enterprise identifier = missing
Example of SAS code to run PROC SENSITIVITY

    proc sensitivity data=microfile
                     outconstraint=consfile
                     outcell=cellfile
                     outlargest=largestfile
                     hierarchy="0 East West; 0 1 2 3;"
                     srule="pq .20"
                     range="East A B: West C D; 1 101 201 301: 2 102 202 302: 3 103 203 303;"
                     minresp=5;      /* minimum number of respondents */
      id Enterpriseid;               /* enterprise identifier */
      var Income;                    /* enterprise value */
      dimension EastWest Industry;   /* classification variables */
    run;
Strategies using PROC SENSITIVITY
• Use the MINRESP=r option to set the minimum number of respondents
    • Any cell with fewer than r respondents is assigned a sensitivity of max{1, S}, where S is the sensitivity of the cell
    • Only positive (>0) values are counted as respondents
    • The MINRESP rule is ignored for a cell with a value contributed by an anonymous enterprise
    • Note: we can use MINRESP without applying a sensitivity rule
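The MINRESP behaviour described above can be sketched in Python as follows. This is an illustration of the stated rule, not G-Confid's implementation: the function name, the `(enterprise_id, value)` record layout, the use of `None` for a missing identifier, and the assumption of one record per enterprise are all hypothetical.

```python
def apply_minresp(sensitivity, contributions, minresp):
    """Apply a minimum-respondents rule to one cell.

    contributions: list of (enterprise_id, value) pairs, one per enterprise
    (assumed layout); enterprise_id None marks an anonymous enterprise.
    """
    # The rule is ignored when an anonymous enterprise contributes to the cell.
    if any(ent_id is None for ent_id, _ in contributions):
        return sensitivity
    # Only strictly positive values count as respondents.
    respondents = sum(1 for _, value in contributions if value > 0)
    if respondents < minresp:
        return max(1.0, sensitivity)
    return sensitivity


if __name__ == "__main__":
    cell = [("a", 10), ("b", 0), ("c", 5)]  # two respondents with positive values
    print(apply_minresp(-3.0, cell, minresp=5))
    print(apply_minresp(-3.0, cell + [(None, 4)], minresp=5))
```

Starting from a sensitivity of 0 for every cell would correspond to using MINRESP without any sensitivity rule.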
Strategies using PROC SENSITIVITY (continued)
• To reduce oversuppression, apply rules that make use of sampling weights
Example: if the sampling weight w_i > 3, make the enterprise anonymous (set ID value = missing). G-Confid will use its contribution to reduce the sensitivity of the cell.
Find more strategies in: Tambay and Fillion (Proceedings of the JSM 2013)
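As a rough illustration of the weighting example above, the following Python sketch blanks the identifier of any record whose sampling weight exceeds 3 before the microdata would be passed to the sensitivity step. The record layout (dicts with `id`, `value`, `weight` keys) and the function name are assumptions for illustration; in SAS this would be a preprocessing step on the microdata file.

```python
def anonymize_by_weight(records, threshold=3.0):
    """Return copies of the records with id set to None when weight > threshold."""
    result = []
    for record in records:
        copy = dict(record)  # leave the caller's data untouched
        if copy["weight"] > threshold:
            copy["id"] = None  # missing identifier = anonymous enterprise
        result.append(copy)
    return result


if __name__ == "__main__":
    microdata = [
        {"id": "E1", "value": 500.0, "weight": 1.0},
        {"id": "E2", "value": 40.0, "weight": 8.5},
    ]
    for row in anonymize_by_weight(microdata):
        print(row)
```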
Macro SUPPRESS – complementary suppression
• Uses the SAS/OR® LP solver
• Input files: (i) cell sensitivities file, and (ii) linear constraints file
• Syntax: %Suppress(InCell=, Constraint=, CFunction1=, CFunction2=, CVar1=, CVar2=, OutCell=, ByVars=, OutComplement=, ScaleCost=);
• Output file has final status (Suppress, Publish) and the net variation (largest amount the cell was "moved")
Strategies using the macro SUPPRESS
• Choice of cost functions (functions of the cell total, tot):
    • SIZE (= tot)
    • DIGITS (= log[tot+1])
    • CONSTANT (= 1)
    • INFORMATION (= log[tot+1]/[tot+1])
• Can run the LP process twice to reduce the number of suppressions (e.g., SIZE or DIGITS, then INFORMATION)
• Can favour publishing certain cells by defining higher cost values (by default, cost = tot)
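The four cost functions listed above can be written out in Python to compare how they weight cells of different sizes. This is only a sketch of the formulas as given on the slide: the slide does not state the base of the logarithm, so the natural log used here is an assumption, and the dictionary name is hypothetical.

```python
import math

# Cost as a function of the cell total `tot`, following the slide's formulas.
# Log base is an assumption (natural log); the slide only writes "log".
COST_FUNCTIONS = {
    "SIZE": lambda tot: tot,
    "DIGITS": lambda tot: math.log(tot + 1),
    "CONSTANT": lambda tot: 1.0,
    "INFORMATION": lambda tot: math.log(tot + 1) / (tot + 1),
}

if __name__ == "__main__":
    for tot in (9, 99, 9999):
        costs = {name: round(f(tot), 4) for name, f in COST_FUNCTIONS.items()}
        print(tot, costs)
```

Under SIZE and DIGITS the cost grows with the cell total, while under INFORMATION it shrinks once the total is more than a few units, so the choice of function changes which cells are cheapest to suppress.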
Macro AUDIT – validates a suppression pattern
• Calculates minimum and maximum values for each suppressed cell using the LP solver
• Provides results for each cell (protection achieved, not achieved, or exact disclosure)
• Coming soon: pre-set narrower starting intervals than the default values (0.5 × tot and 1.5 × tot) using the Shuttle algorithm (Buzzigoli and Giusti (2006))
Using the Shuttle algorithm to pre-set the starting intervals reduces run time
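To make the audit idea concrete, here is a minimal Python sketch for the special case of a single additive constraint (one row whose suppressed cells must sum to a known remainder). In that case the minimum and maximum feasible values have a closed form, so no LP solver is needed. This is neither the AUDIT macro nor the Shuttle algorithm: the function name is hypothetical, and reading the defaults as starting intervals of [0.5 × tot, 1.5 × tot] around each suppressed cell is an interpretation of the slide.

```python
def audit_single_constraint(suppressed_values, lower_factor=0.5, upper_factor=1.5):
    """Feasible (min, max) for each suppressed cell in one additive constraint.

    The sum of the suppressed cells is known to an outsider (row total minus
    published cells). Each cell starts in [lower_factor*v, upper_factor*v].
    """
    remainder = sum(suppressed_values)
    lows = [lower_factor * v for v in suppressed_values]
    highs = [upper_factor * v for v in suppressed_values]
    bounds = []
    for j in range(len(suppressed_values)):
        others_low = sum(lows) - lows[j]
        others_high = sum(highs) - highs[j]
        # A cell is smallest when the others are largest, and vice versa.
        lo = max(lows[j], remainder - others_high)
        hi = min(highs[j], remainder - others_low)
        bounds.append((lo, hi))
    return bounds


if __name__ == "__main__":
    print(audit_single_constraint([10]))      # a lone suppressed cell
    print(audit_single_constraint([10, 50]))  # small cell paired with a large one
```

A lone suppressed cell collapses to a single point (exact disclosure), and pairing a small cell with a large one leaves the large cell bounded far more tightly than its starting interval.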
Conclusion
• PROC SENSITIVITY
    • Use pre-defined or customized sensitivity rule
    • Can do multiple decomposition
    • MINRESP function
    • Can apply weighting strategies
• Macro SUPPRESS
    • Can favour cells to publish (or suppress)
• Macro AUDIT
Coming soon: additive controlled rounding
For more information, please contact:
Peter Wright
Peter.Wright@statcan.gc.ca