Privacy and Security Issues in Data Mining
Ph.D. Candidate: Anna Monreale
Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti
University of Pisa, Department of Computer Science

Privacy-Preserving Data Mining
• New privacy-preserving data mining techniques:
  • For individual privacy: personal data are private
  • For corporate privacy: the extracted knowledge is private
• Goal: to develop algorithms that modify the original data so that
  • private data are protected
  • private knowledge remains private even after the mining tasks
  • analysis results are still useful
• Natural trade-off between the level of privacy achieved and data utility

Secure Outsourcing of Data Mining
• All encrypted transactions in D* and the items they contain are secure
• Given any mining query, the server can compute the encrypted result
• Encrypted mining and analysis results are secure
• The owner can decrypt the results and thus reconstruct the exact result
• The space and time overhead incurred by the owner in the process must be minimal
• The server has access to the owner's data
• The data owner owns both:
  • the data
  • the knowledge extracted from the data

A Solution for Pattern Mining: K-anonymity
• Attack model: the attacker knows exactly the set of plain items and their true supports in D, and has access to the encrypted database D*
  • Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
  • Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)
• Encryption:
  • Replacing each plain item in D using a 1-1 substitution cipher
  • Adding fake transactions
• K-anonymity: for each item e there are at least k-1 other cipher items
• Decryption: a synopsis allows computing the actual support of every pattern

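
The encryption steps above can be illustrated with a minimal Python sketch. It is not the scheme from the slides: the function names (`encrypt_items`, `pad_supports`, `is_k_anonymous`) are hypothetical, the sketch assumes that "k-anonymity" means every cipher item shares its support with at least k-1 other cipher items, and it pads supports with single-item fake transactions, which is a deliberate simplification. The returned `noise` dictionary plays the role of a very simple synopsis for single items.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Number of transactions in which each item occurs."""
    return Counter(item for t in transactions for item in set(t))


def encrypt_items(transactions, seed=0):
    """Replace each plain item by a cipher item using a 1-1 substitution."""
    items = sorted({item for t in transactions for item in t})
    ciphers = [f"e{i}" for i in range(len(items))]
    random.Random(seed).shuffle(ciphers)
    mapping = dict(zip(items, ciphers))
    encrypted = [frozenset(mapping[i] for i in t) for t in transactions]
    return encrypted, mapping


def is_k_anonymous(transactions, k):
    """Assumed criterion: each item has the same support as >= k-1 others."""
    supports = item_supports(transactions)
    group_sizes = Counter(supports.values())
    return all(group_sizes[s] >= k for s in supports.values())


def pad_supports(encrypted, k):
    """Add single-item fake transactions so supports are equal within groups of k.

    Simplification for illustration only; returns the padded database and
    the number of fake occurrences added per cipher item.
    """
    supports = item_supports(encrypted)
    ranked = sorted(supports, key=lambda e: (-supports[e], e))
    if len(ranked) < k:
        raise ValueError("fewer than k distinct items")
    groups = [ranked[i:i + k] for i in range(0, len(ranked), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # merge a short tail into the previous group
    fake, noise = [], {}
    for group in groups:
        target = supports[group[0]]  # highest support in the group
        for e in group:
            noise[e] = target - supports[e]
            fake.extend(frozenset([e]) for _ in range(noise[e]))
    return encrypted + fake, noise
```

In this simplified setting the owner recovers the true support of a cipher item by subtracting its `noise` value from the support observed on the server. Because the fake transactions contain a single item each, supports of larger itemsets are unchanged.
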
Privacy-Preserving DT Framework
• Goal: publishing and sharing various forms of data without disclosing sensitive personal information, while preserving mining results
  • Sequence data
  • Query-log data
  • …
• Problem: anonymizing sequence data while preserving sequential pattern mining results
• Attack model: sequence linking attack
  • The attacker knows part of a sequence and wants to guess the whole correct sequence
• Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences

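
To make the notion of a k-infrequent sequence concrete, here is a small Python sketch on a toy dataset. It assumes that the support of a pattern is the number of sequences containing it as a (not necessarily contiguous) subsequence, and that a pattern is k-infrequent when its support is non-zero but below k. Both the definition and the function names are assumptions made for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)


def support(pattern, dataset):
    """Number of sequences in `dataset` that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in dataset)


def is_k_infrequent(pattern, dataset, k):
    """Assumed definition: the pattern occurs, but in fewer than k sequences."""
    return 0 < support(pattern, dataset) < k


toy = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
print(support(list("BC"), toy))            # 5
print(is_k_infrequent(list("E"), toy, 2))  # True
```

An attacker who knows that a victim's sequence contains E can single out one sequence here, which is what hiding k-infrequent sequences is meant to prevent.
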
Running example: k = 2
• Steps: Prefix Tree Construction → Tree Pruning → Tree Reconstruction → Generation of D’
• Dataset D: B C; A B C D; A B C D; B C E; B C D
• Prefix tree of D: Root → A:2 → B:2 → C:2 → D:2 and Root → B:3 → C:3 → {D:1, E:1}
• Lcut: B C E : 1, B C D : 1
• LCS: 1. B C  2. B C D
• Reconstructed tree: Root → A:3 → B:3 → C:3 → D:3 and Root → B:2 → C:2
• Dataset D’: B C; A B C D; A B C D; B C; A B C D
[Figure: prefix trees after each step]
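
The four steps of the running example can be approximated by the following Python sketch. It is a simplified reading of the slide, not the authors' algorithm. Prefix counts stand in for the prefix tree. A sequence is cut when its own prefix count is below k. Each cut sequence is replaced by the surviving sequence with which it shares the longest common subsequence (LCS). The tie-break in favour of the shorter candidate and the names `prefix_counts`, `lcs_length` and `anonymize` are assumptions, and the sketch does not by itself guarantee k-anonymity on arbitrary inputs.

```python
from collections import Counter


def prefix_counts(dataset):
    """Count how many sequences pass through each prefix (flattened prefix tree)."""
    counts = Counter()
    for seq in dataset:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    return counts


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def anonymize(dataset, k):
    """Prune sequences ending in infrequent nodes and reassign them via LCS."""
    counts = prefix_counts(dataset)
    sequences = [tuple(s) for s in dataset]
    kept = sorted({s for s in sequences if counts[s] >= k})
    if not kept:
        raise ValueError("no sequence survives pruning")
    result = []
    for s in sequences:
        if counts[s] >= k:
            result.append(s)
        else:
            # Assumed tie-break: among equal LCS lengths, prefer the shorter path.
            result.append(max(kept, key=lambda c: (lcs_length(s, c), -len(c))))
    return result


D = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
for seq in anonymize(D, 2):
    print(" ".join(seq))
```

On the dataset D of the running example with k = 2, this prints B C, A B C D, A B C D, B C, A B C D, which matches D’: B C E is mapped to B C and B C D is mapped to A B C D.
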