Data Security. Ram Gopal, University of Connecticut (Robert Garfinkel, Paulo Goes, Manuel Nunez, Daniel Rice, Steven Thompson). Research Objectives. Overview. Security of Microdata with Individual Identifiers.
Data Security Ram Gopal University of Connecticut (Robert Garfinkel, Paulo Goes, Manuel Nunez, Daniel Rice, Steven Thompson)
Research Objectives: Maximize the utility of information provided to users while maintaining the security of confidential information. • Overview • Security of Microdata with Individual Identifiers
Security Considerations • Disclosure of Confidential Information • Identity Disclosure (Figure labels: Identity-Related, Confidentiality-Related, Confidential)
Protection of Confidential Information • Query Restriction • Perturbation • Camouflage • Hybrid Methods
Query Restriction • Answer queries from users if confidentiality is not violated
Query Answering Process: User Query → Compute Query Answer → Confidentiality Check (against User Log) → Compromise? No: Answer Query and Update User Log. Yes: Reject Query.
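The flow on this slide can be sketched as a small auditing loop. This is a minimal illustration, not the authors' implementation: the class name `QueryAuditor`, the encoding of a SUM query as a tuple of record indices, and the toy check `one_record_difference` are all assumptions made for the example. The toy check only catches the simplest difference attack and is not a complete confidentiality test. The data values are hypothetical, chosen to be consistent with the query answers used later in the deck.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Query = Tuple[int, ...]              # hypothetical encoding: indices of the records summed
LogEntry = Tuple[Query, float]       # (query, answer) pairs already released to a user


def one_record_difference(history: List[LogEntry], query: Query, answer: float) -> bool:
    """Toy confidentiality check (an assumption, not the deck's method).

    Flags a query that covers a single record, or that differs from an
    already answered query by exactly one record.
    """
    q = set(query)
    if len(q) == 1:
        return True
    return any(len(q ^ set(past)) == 1 for past, _ in history)


class QueryAuditor:
    """Answers SUM queries unless the check reports a compromise."""

    def __init__(self, data: Sequence[float],
                 causes_compromise: Callable[[List[LogEntry], Query, float], bool]):
        self.data = list(data)
        self.causes_compromise = causes_compromise
        self.log: Dict[str, List[LogEntry]] = {}   # per-user log of answered queries

    def ask(self, user: str, query: Query) -> Optional[float]:
        history = self.log.setdefault(user, [])
        answer = sum(self.data[i] for i in query)            # compute query answer
        if self.causes_compromise(history, query, answer):   # confidentiality check
            return None                                       # reject query
        history.append((query, answer))                       # update user log
        return answer                                         # answer query


# Hypothetical confidential values x1..x4 (0-based indices in the code).
auditor = QueryAuditor([102, 78, 49, 121], one_record_difference)
first = auditor.ask("alice", (0, 1, 2, 3))
second = auditor.ask("alice", (0, 2))
third = auditor.ask("alice", (0, 1, 2))   # differs from the first query by one record
```

Keeping the check as a pluggable function means a stronger test, such as a range computation over all answered queries, can replace the toy rule without changing the loop.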
Illustration
Query1: x1+x2+x3+x4 (answer=350)
  Min/Max xi subject to: x1+x2+x3+x4 = 350; 20 ≤ xi ≤ 150
Query2: x1+x3 (answer=151)
  Min/Max xi subject to: x1+x2+x3+x4 = 350; x1+x3 = 151; 20 ≤ xi ≤ 150
Query3: x1+x2 (answer=180)
  Min/Max xi subject to: x1+x2+x3+x4 = 350; x1+x3 = 151; x1+x2 = 180; 20 ≤ xi ≤ 150
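The Min/Max problems in this illustration are small linear programs: after each answered query, find the smallest and largest value each xi can take. The sketch below solves them exactly with the standard library only, by enumerating basic solutions (some variables fixed at a bound, the rest solved from the query equations). It is a stand-in for a real LP solver, not the method used by the authors. The function names are hypothetical, indices are 0-based, and the code assumes the answered queries are linearly independent and that the problem is tiny.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Gauss-Jordan elimination over Fractions; returns None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def variable_ranges(queries, n, lo, hi):
    """Return [(min xi, max xi)] given answered SUM queries and bounds lo <= xi <= hi.

    queries: list of (indices, answer) pairs, assumed linearly independent.
    """
    m = len(queries)
    mins, maxs = [None] * n, [None] * n
    for free in combinations(range(n), n - m):        # variables pinned at a bound
        basic = [j for j in range(n) if j not in free]
        A = [[1 if j in idx else 0 for j in basic] for idx, _ in queries]
        for levels in product((lo, hi), repeat=len(free)):
            fixed = dict(zip(free, levels))
            b = [ans - sum(fixed.get(j, 0) for j in idx) for idx, ans in queries]
            sol = solve_square(A, b)
            if sol is None or any(v < lo or v > hi for v in sol):
                continue                               # not a feasible vertex
            x = {**fixed, **dict(zip(basic, sol))}
            for j in range(n):
                mins[j] = x[j] if mins[j] is None else min(mins[j], x[j])
                maxs[j] = x[j] if maxs[j] is None else max(maxs[j], x[j])
    return list(zip(mins, maxs))


q1 = ((0, 1, 2, 3), 350)   # Query1
q2 = ((0, 2), 151)         # Query2
q3 = ((0, 1), 180)         # Query3
after_q1 = variable_ranges([q1], 4, 20, 150)
after_q2 = variable_ranges([q1, q2], 4, 20, 150)
after_q3 = variable_ranges([q1, q2, q3], 4, 20, 150)
```

Under these assumptions each answered query narrows the ranges without collapsing any of them to a single value. A record would count as exactly disclosed when its minimum and maximum coincide.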
Query Restriction • Binary Data with COUNT queries • Other Query Types (MIN, VAR)
Query: min{x1, x2, x3, x4} (answer=49) implies xi ≥ 49 for every i, and at least one of {x1, x2, x3, x4} equals 49
Maximal Subset Selection is NP-Hard
Perturbation • Data Swapping/Shuffling • Binning (Figure: sample data values 87.89, 52.07, 134.60, 101.95, 71.15, 24.76, 46.81, 94.79, 74.24, 49.57; 82.32, -19.68)
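The perturbation techniques named on this slide can be illustrated with three short functions. This is a rough sketch under stated assumptions: the function names, the Gaussian noise model, and the fixed-width bins are choices made for the example and are not taken from the deck. The first few numbers from the slide's figure are reused purely as sample inputs; their role in the original figure is not recoverable from the transcript.

```python
import random


def add_noise(values, scale, rng):
    """Additive perturbation: add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0, scale) for v in values]


def swap_values(values, rng):
    """Data swapping/shuffling: the same values, reassigned to different records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def bin_values(values, width):
    """Binning: replace each value with the midpoint of its fixed-width bin."""
    return [(v // width) * width + width / 2 for v in values]


rng = random.Random(0)                       # seeded so the run is repeatable
sample = [87.89, 52.07, 134.60, 101.95, 71.15]
noisy = add_noise(sample, 10.0, rng)
swapped = swap_values(sample, rng)
binned = bin_values(sample, 25)
```

Swapping keeps the column's overall distribution intact while breaking the link to individual records, whereas noise and binning change the released values themselves.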
Camouflage • Interval Answers • Answer Guarantee • Interval Protection • Storage Efficiency • Computational Efficiency • “Good” Query Answers (Figure axes: Record 1, Record 2)
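The slide does not spell out the mechanism, so the following is only one plausible reading of "Interval Answers" and "Answer Guarantee". It assumes the true data vector is hidden among a few decoy vectors and that a SUM query is answered with the smallest and largest value the query takes over that set. For a linear query such as SUM, the extremes over all mixtures of the vectors occur at the vectors themselves, so checking each vector is enough. The function name and every data value are hypothetical.

```python
def interval_answer(query, vectors):
    """Answer a SUM query with an interval instead of a point value.

    query: tuple of 0-based record indices to sum.
    vectors: the true data vector plus decoy vectors (order does not matter).
    """
    answers = [sum(v[i] for i in query) for v in vectors]
    return min(answers), max(answers)


true_vector = [102, 78, 49, 121]             # hypothetical confidential values
decoys = [[90, 95, 60, 110],                 # hypothetical camouflage vectors
          [115, 60, 40, 135]]
vectors = [true_vector] + decoys

pair_interval = interval_answer((0, 2), vectors)
single_interval = interval_answer((2,), vectors)
```

Because the true vector is one of the candidates, the exact answer always lies inside the returned interval, and even a query on a single record yields a range rather than the confidential value.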
Hybrid Techniques Query Restriction and Perturbation
Insider Threats: What if a user knows that x3 = 49? • Query Restriction • Perturbation • Camouflage
Without insider knowledge: Min/Max xi subject to: x1+x2+x3+x4 = 350; x1+x3 = 151; 20 ≤ xi ≤ 150
With insider knowledge: Min/Max xi subject to: x1+x2+x3+x4 = 350; x1+x3 = 151; 20 ≤ xi ≤ 150; x3 = 49
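A short worked derivation shows why the extra fact matters. It uses only the two query answers and the bounds given on the slide, with the insider's value x3 = 49 added as a fourth constraint.

```latex
\begin{align*}
x_1 + x_3 = 151,\ x_3 = 49 &\;\Rightarrow\; x_1 = 151 - 49 = 102 \\
x_1 + x_2 + x_3 + x_4 = 350 &\;\Rightarrow\; x_2 + x_4 = 350 - 151 = 199 \\
20 \le x_2,\, x_4 \le 150 &\;\Rightarrow\; 49 \le x_2 \le 150,\quad 49 \le x_4 \le 150
\end{align*}
```

The two answered queries were safe for an ordinary user, yet for this insider they pin x1 to a single value, while x2 and x4 remain protected only as a range.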
Emerging Application Areas • Data Mining (cf. Sarkar, UT-Dallas) • ‘Inverse’ Problem (cf. Sarkar, UT-Dallas) • Trading Privacy • Privacy in Distributed Settings (J. Canny, Berkeley)
Identity Disclosure • Strip Explicit Identifiers
Security of Microdata with Individual Identifiers: Can we provide microdata in the following form? (Figure labels: "Along with", "As much as possible")
Channel View
Input Channel 1   (N,N,N)   2
Input Channel 2   (L,H,L)   1
Input Channel 3   (N,N,H)   2
Input Channel 4   (N,H,N)   3
Input Channel 5   (L,N,V)   1
Input Channel 6   (H,H,L)   2
Option 1: Full Revelation
Input Channel 1   (N,N,N)   2
Input Channel 2   (L,H,L)   1
Input Channel 3   (N,N,H)   2
Input Channel 4   (N,H,N)   3
Input Channel 5   (L,N,V)   1
Input Channel 6   (H,H,L)   2
Option 2: Minimal Revelation
Input Channel 1   (N,N,N)   2
Input Channel 2   (L,H,L)   1
Input Channel 3   (N,N,H)   2
Input Channel 4   (N,H,N)   3
Input Channel 5   (L,N,V)   1
Input Channel 6   (H,H,L)   2
Output Channel 1 (ALL)   11
Option 3: Partial Revelation
Input Channel 1   (N,N,N)   2
Input Channel 2   (L,H,L)   1
Input Channel 3   (N,N,H)   2
Input Channel 4   (N,H,N)   3
Input Channel 5   (L,N,V)   1
Input Channel 6   (H,H,L)   2
Output Channel 1   6
Output Channel 2   5
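The three revelation options differ only in how input channels are mapped to output channels, which a few lines of code can make concrete. The input channel counts below come from the slides. The function name is hypothetical, and so is the partial assignment: the transcript gives the output totals 6 and 5 but not which input channels feed which output channel, so the grouping here is just one assignment that reproduces those totals.

```python
from collections import Counter

# Input channel counts from the slides: channel number -> number of records.
input_counts = {1: 2, 2: 1, 3: 2, 4: 3, 5: 1, 6: 2}


def output_sizes(counts, assignment):
    """Total records per output channel, given input channel -> output channel."""
    sizes = Counter()
    for channel, n_records in counts.items():
        sizes[assignment[channel]] += n_records
    return dict(sizes)


full = {c: c for c in input_counts}             # Option 1: one output per input
minimal = {c: 1 for c in input_counts}          # Option 2: everything in one output
partial = {1: 1, 2: 1, 4: 1, 3: 2, 5: 2, 6: 2}  # Option 3: hypothetical grouping

full_sizes = output_sizes(input_counts, full)
minimal_sizes = output_sizes(input_counts, minimal)
partial_sizes = output_sizes(input_counts, partial)
```

Full revelation leaves every identifier combination visible with its original count, minimal revelation hides all combinations in one group of 11, and partial revelation sits between the two.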
Formulation: Constraints • Assign All: x11+x12 = 75 (Input Channel 1 holds 75) • Safety Net: ≥ 1 safe channel (ALL) • Truthful • Random Assignment • Non-Intuitive: (L,L,L), (V,V,V) (Figure: Input Channels 1 to 4 and their output channels, including "Output Channel (2 and 3)")
Formulation: Objective • Minimize Information Loss (Figure: Input Channel 1 (45), Input Channel 2 (89) and Input Channel 3 (75) assigned to Output Channels 1 and 2 through x11, x21, x22, x32; expressions shown: 2×(x11+x21) and (45+89)×(x11+x21))
Illustration
Input Channel 1   2,098
Input Channel 2   1,994
Input Channel 3   4,055
Input Channel 4   3,002
Input Channel 5   4,921
Input Channel 6   9,875
Input Channel 7   9,442
Input Channel 8   506
Output Channel 1   8,755
Output Channel 2   7,446
Output Channel 3   3,002
Output Channel 4   727
Output Channel 5   6,521
Output Channel 6   9,442
Simulation Analysis • 1 million records • 200 input channels/2048 output channels
Concluding Remarks • Useful data can be released even when the risk is high • Consider Insider Threats • Integrate with Perturbation/Camouflage