Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Nikos Mamoulis
University of Hong Kong
Panos Kalnis
National University of Singapore
www.comp.nus.edu.sg/~kalnis
Motivation
Helen: 0% Milk, Beer, Pregnancy test
• Attacker can see up to m items
• Any m items
• No distinction between sensitive and non-sensitive items
Motivation (cont.)
Database
• Helen: Beer, 0% Milk, Pregnancy test
• John: Cola, Cheese
• Tom: 2% Milk, Coffee
• ….
• Mary: Wine, Beer, Full-fat Milk
Attacker: find all transactions that contain Beer & 0% Milk
Published (original items)
• t1: Beer, 0% Milk, Pregnancy test
• t2: Cola, Cheese
• t3: 2% Milk, Coffee
• ….
• tn: Wine, Beer, Full-fat Milk
Published (milk items generalized to Milk)
• t1: Beer, Milk, Pregnancy test
• t2: Cola, Cheese
• t3: Milk, Coffee
• ….
• tn: Wine, Beer, Milk
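k^m-anonymity
• Notation: set of items, transaction, query terms, database
• k^m-anonymity: any query of at most m items matches either no transaction in the database or at least k transactions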
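
The condition on this slide can be checked by brute force on a small database. The following is a minimal sketch, assuming transactions are represented as Python sets of item names; the function name `is_km_anonymous` and the sample data (taken from the Motivation slides, with the elided rows left out) are illustrative and not part of the original deck.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Return True if every itemset of size <= m that occurs in some
    transaction occurs in at least k transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # fixed order so each itemset has one key
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Itemsets that never occur have count 0 and are allowed by definition.
    return all(c >= k for c in counts.values())


# Sample rows from the Motivation slides (illustrative subset).
original = [
    {"Beer", "0% Milk", "Pregnancy test"},
    {"Cola", "Cheese"},
    {"2% Milk", "Coffee"},
    {"Wine", "Beer", "Full-fat Milk"},
]

result = is_km_anonymous(original, k=2, m=2)
```

Enumerating all itemsets of size up to m is exponential in m, so this check is only meant to make the definition concrete on toy inputs.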
Related Work: K-Anonymity [Swe02]
• NOT suitable for high-dimensionality
[Figure: (a) Microdata, with the quasi-identifier marked; 2-anonymous microdata]
[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
Related Work: L-diversity in Transactions
• Requires knowledge of (non)-sensitive attributes
[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.
Our Approach: Employs Generalization
[Figure: generalization hierarchy and information loss, example with k=2, m=2]
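
Generalization as used in the Motivation slides can be sketched as a global recoding step: every occurrence of an item is replaced by the same more general value. This is only an illustration under stated assumptions. The mapping `MILK_CUT` is a hypothetical hierarchy fragment built from the milk example, not the hierarchy drawn on this slide, and `generalized_occurrences` is a crude stand-in for information loss, not the metric used in the talk.

```python
# Hypothetical hierarchy fragment based on the milk example; the
# hierarchy shown on the slide itself is not reproduced here.
MILK_CUT = {
    "0% Milk": "Milk",
    "2% Milk": "Milk",
    "Full-fat Milk": "Milk",
}


def generalize(transactions, mapping):
    """Global recoding: replace every occurrence of an item by its
    generalized value; items without an entry stay unchanged."""
    return [{mapping.get(item, item) for item in t} for t in transactions]


def generalized_occurrences(transactions, mapping):
    """Count item occurrences that were replaced (an assumed, very
    rough proxy for information loss)."""
    return sum(
        1
        for t in transactions
        for item in t
        if mapping.get(item, item) != item
    )


database = [
    {"Beer", "0% Milk", "Pregnancy test"},
    {"Cola", "Cheese"},
    {"2% Milk", "Coffee"},
    {"Wine", "Beer", "Full-fat Milk"},
]

published = generalize(database, MILK_CUT)
replaced = generalized_occurrences(database, MILK_CUT)
```

After this step a query for Beer & 0% Milk can only be asked as Beer & Milk, which matches more transactions than the original query did.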
Count Tree
[Figure: count tree with per-node counts]
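
One plausible reading of a count tree is a prefix tree in which each path from the root spells an itemset of at most m items and the last node on the path stores how many transactions contain that itemset. The sketch below follows that reading; it is an assumption, it ignores generalized items, and the names `Node`, `build_count_tree` and `support` are hypothetical.

```python
from itertools import combinations


class Node:
    """One node of the prefix tree: a support count plus children."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_count_tree(transactions, m):
    """Insert every itemset of size 1..m of every transaction,
    with items in sorted order so each itemset has a single path."""
    root = Node()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                node = root
                for item in combo:
                    node = node.children.setdefault(item, Node())
                node.count += 1  # only the last node of the path is counted
    return root


def support(root, itemset):
    """Look up the number of transactions that contain the itemset."""
    node = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count


tree = build_count_tree([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], m=2)
```

With such a tree, checking whether an itemset of up to m items is rare (support below k) becomes a single path lookup.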
Optimal Algorithm
[Figure: three panels, each labelled Q:]
“Direct” Anonymization
• Solves each “problem” independently
• Example: COUNT({a1,a2})=1
“Apriori-based” Anonymization
• Construct the count-tree incrementally
• Prune unnecessary branches
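
The level-wise idea behind the two bullets above can be illustrated with a simplified greedy procedure: fix all rare itemsets of size 1, then size 2, and so on up to m, generalizing items as needed. This is a sketch under assumptions, not the algorithm presented in the talk. It recounts itemsets with a `Counter` instead of maintaining a count tree, picks the first rare itemset it finds, and uses a hypothetical hierarchy `PARENT`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy, child -> parent.
PARENT = {
    "0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk",
    "Beer": "Alcohol", "Wine": "Alcohol",
    "Milk": "ALL", "Alcohol": "ALL",
}


def ancestors(value, parent):
    """Yield the ancestors of a value, nearest first."""
    while value in parent:
        value = parent[value]
        yield value


def apriori_style_anonymize(transactions, parent, k, m):
    # mapping: original item -> value currently published for it
    mapping = {item: item for t in transactions for item in t}

    def published():
        return [{mapping[item] for item in t} for t in transactions]

    for size in range(1, m + 1):  # level-wise: itemsets of size 1, 2, ..., m
        while True:
            counts = Counter(
                combo
                for t in published()
                for combo in combinations(sorted(t), size)
            )
            rare = [c for c, n in counts.items() if n < k]
            if not rare:
                break
            candidates = [v for v in rare[0] if v in parent]
            if not candidates:
                raise ValueError("hierarchy too shallow for this k and m")
            target = parent[candidates[0]]
            # Global recoding: everything below `target` becomes `target`.
            for item, value in mapping.items():
                if target in ancestors(value, parent):
                    mapping[item] = target
    return published(), mapping


transactions = [
    {"Beer", "0% Milk"}, {"Wine", "2% Milk"},
    {"Beer", "Full-fat Milk"}, {"Wine", "0% Milk"},
]
published, mapping = apriori_style_anonymize(transactions, PARENT, k=2, m=2)
```

Working level by level is sound here because generalizing an item never lowers the support of an itemset that already passed an earlier level, so smaller itemsets do not need to be rechecked.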
Small Datasets (2-15K, BMS-WebView2)
• |I|=40..60, k=100, m=3
Small Datasets (BMS-WebView2)
• |D|=10K, k=100, m=1..4
Apriori Anonymization for Large Datasets
• k=5
• m=3
[Figure: plot with time labels 10sec, 100sec, 500sec]
Points to Remember
• Anonymization of Transactional Data
    • Attacker knows m items
    • Any m items can be the quasi-identifier
• Global recoding method
    • Optimal solution: too slow
    • Apriori Anonymization: fast and low information loss
• On-going work
    • Local recoding (sort by Gray order and partition)
    • Transactional data in streaming environments
Bibliography on LBS Privacy: http://anonym.comp.nus.edu.sg