
Privacy and Security Issues in Data Mining

Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti. Ph.D. Candidate: Anna Monreale. University of Pisa, Department of Computer Science.


Presentation Transcript


  1. Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti. Ph.D. Candidate: Anna Monreale. Privacy and Security Issues in Data Mining. University of Pisa, Department of Computer Science

  2. Privacy-Preserving Data Mining
     • New privacy-preserving data mining techniques:
       • For individual privacy: personal data are private
       • For corporate privacy: the extracted knowledge is private
     • Goal: to develop algorithms that modify the original data so that
       • private data are protected
       • private knowledge remains private even after the mining tasks
       • analysis results are still useful
     • Natural trade-off between the degree of privacy and data utility

  3. Secure Outsourcing of Data Mining
     • All encrypted transactions in D* and the items they contain are secure
     • Given any mining query, the server can compute the encrypted result
     • Encrypted mining and analysis results are secure
     • The owner can decrypt the results and so reconstruct the exact result
     • The space and time incurred by the owner in the process must be minimal
     • The server has access to the owner's data
     • The data owner owns
       • the data
       • the knowledge extracted from the data

  4. A Solution for Pattern Mining: K-anonymity
     • Attack model: the attacker knows the set of plain items and their exact true supports in D, and has access to the encrypted database D*
       • Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
       • Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)
     • Encryption:
       • Replacing each plain item in D by a 1-1 substitution cipher
       • Adding fake transactions
     • K-anonymity: for each item e there are at least k-1 other cipher items
     • Decryption: a synopsis allows computing the actual support of every pattern
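The encryption steps on slide 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that k-anonymity here means every cipher item shares its support with at least k-1 other cipher items (the slide's definition is cut off, so this reading is an assumption). The function names, the toy database, the hand-picked fake transaction, and the per-item synopsis are all hypothetical; the sketch recovers the support of single items only, not of arbitrary patterns.

```python
import random
from collections import Counter

def encrypt(db, seed=0):
    """Replace each plain item with a cipher item via a 1-1 substitution."""
    items = sorted({item for t in db for item in t})
    ciphers = [f"e{i}" for i in range(len(items))]
    random.Random(seed).shuffle(ciphers)
    mapping = dict(zip(items, ciphers))
    return [{mapping[i] for i in t} for t in db], mapping

def supports(db):
    """Number of transactions containing each item."""
    return Counter(item for t in db for item in t)

def is_k_anonymous(db, k):
    """Assumed reading: each item shares its support with >= k-1 others."""
    group_sizes = Counter(supports(db).values())
    return all(size >= k for size in group_sizes.values())

# Hypothetical toy database: supports are a:3, b:2, c:2.
plain_db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
cipher_db, mapping = encrypt(plain_db)

# One hand-picked fake transaction that levels all supports to 3.
fake = {mapping["b"], mapping["c"]}
padded_db = cipher_db + [fake]

# Hypothetical synopsis kept by the owner: fake support per cipher item.
synopsis = Counter(fake)
true_supports = supports(padded_db) - synopsis

print(is_k_anonymous(cipher_db, 2), is_k_anonymous(padded_db, 2))
```

In this toy case the substitution alone is not enough, because the item with support 3 stands out; after the fake transaction is added, all three cipher items have the same support, and the owner subtracts the synopsis to get the true supports back.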

  5. Privacy-Preserving DT Framework
     • Goal: publishing and sharing various forms of data without disclosing sensitive personal information, while preserving mining results
       • Sequence data
       • Query-log data
       • …
     • Problem: anonymizing sequence data while preserving sequential pattern mining results
     • Attack model: sequence linking attack
       • The attacker knows part of a sequence and wants to guess the whole correct sequence
     • Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences
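To make the notion of a k-infrequent sequence on slide 5 concrete, here is a minimal Python sketch. The slide does not define the term, so the sketch assumes that a pattern is k-infrequent when it occurs as a subsequence (in order, gaps allowed) in at least one but fewer than k sequences of the dataset. The function names and the small dataset are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    # `symbol in it` advances the iterator, so order is respected.
    return all(symbol in it for symbol in pattern)

def support(pattern, dataset):
    """Number of sequences in the dataset that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in dataset)

def is_k_infrequent(pattern, dataset, k):
    """Assumed definition: seen at least once but fewer than k times."""
    return 0 < support(pattern, dataset) < k

# Hypothetical toy dataset of five sequences.
D = [
    ("B", "C"),
    ("A", "B", "C", "D"),
    ("A", "B", "C", "D"),
    ("B", "C", "E"),
    ("B", "C", "D"),
]

for pattern in [("B", "C"), ("B", "C", "D"), ("B", "C", "E"), ("E",)]:
    print(pattern, support(pattern, D), is_k_infrequent(pattern, D, 2))
```

Under this assumed definition, an attacker who knows a k-infrequent fragment such as `B C E` can link it to fewer than k sequences, which is why such patterns are the ones to hide.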

  6. Running example: k = 2
     • Dataset D: B C; A B C D; A B C D; B C E; B C D
     • Steps: Prefix Tree Construction, Tree Pruning, Tree Reconstruction, Generation of D’
     • Lcut: B C E : 1; B C D : 1
     • LCS: 1. B C; 2. B C D
     • Dataset D’: B C; A B C D; A B C D; B C; A B C D
     • [Figure: prefix trees with node supports at each step]
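The four steps of the running example (prefix tree construction, pruning, reconstruction through the longest common subsequence, and generation of D’) can be approximated with the simplified Python sketch below. It is an illustration, not the authors' algorithm. It assumes that D consists of the five sequences B C, A B C D, A B C D, B C E and B C D, stores the prefix tree implicitly as a counter of prefixes, and replaces each pruned sequence with a whole copy of the surviving sequence that shares the longest common subsequence with it. Breaking ties in favour of the shorter sequence and dropping pruned sequences with no common symbol are further assumptions, and all names are hypothetical.

```python
from collections import Counter

def lcs_len(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def anonymize(dataset, k):
    """Return (anonymized dataset, list of pruned sequences)."""
    # Implicit prefix tree: each prefix is a node, its count is the support.
    node_support = Counter(
        seq[:i] for seq in dataset for i in range(1, len(seq) + 1)
    )
    # Pruning: a sequence is cut if any node on its path has support < k.
    kept, cut = [], []
    for seq in dataset:
        infrequent = any(
            node_support[seq[:i]] < k for i in range(1, len(seq) + 1)
        )
        (cut if infrequent else kept).append(seq)
    # Reconstruction: map each cut sequence to the surviving path with the
    # longest LCS; shorter paths are listed first so they win ties.
    paths = sorted(set(kept), key=lambda p: (len(p), p))
    result = list(kept)
    for seq in cut:
        if not paths:
            continue
        best = max(paths, key=lambda p: lcs_len(seq, p))
        if lcs_len(seq, best) > 0:
            result.append(best)
    return result, cut

D = [
    ("B", "C"),
    ("A", "B", "C", "D"),
    ("A", "B", "C", "D"),
    ("B", "C", "E"),
    ("B", "C", "D"),
]
D_prime, l_cut = anonymize(D, 2)
print(l_cut)
print(D_prime)
```

With k = 2, the sketch cuts B C E and B C D, maps them to B C and A B C D respectively, and ends with two copies of B C and three copies of A B C D. Unlike a full implementation, it does not re-check node supports after pruning, so it is not guaranteed to yield a k-anonymous result on arbitrary inputs.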
