360 likes | 367 Views
This study explores distortion-based and blocking-based techniques for hiding sensitive association rules in a database, with a focus on minimizing side effects and maintaining data quality.
E N D
An Experimental Study of Association Rule Hiding Techniques Emmanuel Pontikakis* pontikak@ceid.upatras.gr Dept. of Computer Engineering and Informatics University of Patras Patra, Greece Vassilios Verykios* verykios@cti.gr Dept. of Computer and Communication EngineeringUniversity of ThessalyVolos, Greece *Computer Technology Institute Research Unit 3 Athens, Greece
Outline • Introduction - Related Work • Distortion-based Techniques • Blocking-based Techniques • Comparison and Analysis • Conclusions
Data Mining Association Rules Hide Sensitive Rules User Introduction Database Changed Database
Related Work • Association Rule Hiding • Blocking-based Technique (Saygin, Verykios, Clifton) • Distortion-based (Sanitization) Technique – (Oliveira, Zaiane, Verykios, Dasseni)
Outline • Introduction - Related Work • Distortion-based Techniques • Blocking-based Techniques • Comparison and Analysis • Conclusion
Distortion Algorithm Distortion-based Techniques Sample Database Distorted Database Rule A→C has: Support(A→C)=80% Confidence(A→C)=100% Rule A→C has now: Support(A→C)=40% Confidence(A→C)=50%
Distortion-based Techniques • Challenges/Goals: • To minimize the undesirable Side Effects that the hiding process causes to non-sensitive rules. • To minimize the number of 1’s that must be deleted in the database. • Algorithms must be linear in time as the database increases in size.
Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA) • High Level Description: • Input: • Initial Database • Set of Sensitive Rules • Safety Margin (for example 10%) • Output: • Sanitized Database • Sensitive Rules no longer hold in the Database
WSDA Algorithm • High Level Description: • 1st step: • Retrieve the set of transactions which support sensitive rule RS • For each sensitive rule RS find the number N1 of transaction in which, one item that supports the rule will be deleted
WSDA Algorithm • High Level Description: • 2nd step: • For each rule Ri in the Database with common items with RScompute a weight w that denotes how strong is Ri • For each transaction that supports RS compute a priority Pi, that denotes how many strong rules this transaction supports
WSDA Algorithm • High Level Description: • 3rd step: • Sort the N1 transactions in ascending order according to their priority value Pi • 4th step: • For the first N1 transactions hide an item that is contained in RS
WSDA Algorithm • High Level Description: • 5th step: • Update confidence and support values for other rules in the database
Rules Changed In the Database Itemsets Remained unaffected in the Database Experimental Results of WSDA algorithm
Average number of items per transaction: 13/50 Average number of items per transaction: 20/50 Experimental Results of WSDA algorithm
Outline • Introduction - Related Work • Distortion-based Techniques • Blocking-based Techniques • Comparison and Analysis • Conclusion
Quality of Data • Sometimes it is dangerous to delete some items from the database (etc. medical databases) because the false data may create undesirable effects. • So, we have to hide the rules in the database by adding uncertainty without distorting the database.
Blocking Algorithm Blocking-based Techniques Initial Database New Database Support and Confidence becomes marginal. In New Database: 60% ≤ conf(A → C) ≤ 100%
Modification of Association Rule Definition • A rule’s A→B confidence and support becomes marginal: sup(A→B)[minsup(A→B), maxsup(A→B)] conf(A→B) [minconf(A→B), maxconf(A→B)] • minsup(A→B)= • maxsup(A→B)=
Modification of Association Rule Definition • minconf(A→B)= • maxconf(A→B)=
Negative Border Rules Set (NBRS) Definition • When a rule R has either • sup(R)>MST AND conf(R)<MCT OR • sup(R)<MST AND conf(R)>MCT, then we say that R belongs to NBRS.
Side Effects Definition Modification in Blocking-based Techniques
Privacy Breaches Definitions • If an item i, some values of which, are hidden by ?’s, is contained in a sensitive rule, a privacy breach will occur if the adversary can assume that with c% confidence. • For a rule R with maxconf(R)>MCT, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive or a ghost rule. • For a blocked item iin a specific transaction T, a privacy breach occurs if the adversary can estimate with c%confidence that its original value is either 0 or 1.
Blocking-Based Techniques • Goals that an algorithm has to achieve: • To put a relatively small number of ?’s and reduce significantly the confidence of senstitive rules. • To minimize the undesirable side effects (rules and itemsets lost) by selecting the items in the appropriate transactions to change, and maximize the desirable side effects. • To modify the database in a way that an adversary cannot recover the original values of the database.
Our Proposal: Blocking Algorithm (BA) • High Level Description • 1st step: • For each sensitive rule RS (Rule RS has left itemset IL and right itemset IR) compute how many 0’s and 1’s you have to block, in order to reduce the confidence of RS. • 2nd step: • Find the set of transactions TR that support RS or the set of transactions TLpR’that support partially RS (support partially the left itemset and do not support the right itemset). • For each transaction in TRfind the rules Rcommonwith at least one common item with IRand for each transaction in TLpR’find the R’common∈NBRSwith at least one common item with IL. • Assign a weight wfor each Rcommonand a weight w’ for each R’common. • Assign a PTfor each transaction in T such as PTis large if transaction Ti has many Rcommon rules with large w, and a priority value PT’for each Ti’ such as PT’is small if transaction T has many Rcommon rules with large w’.
Blocking Algorithm • High Level Description • 3rd step: • Sort T∈TRstarting from them with lowest PTi. and sort T’∈TL’Rpstarting from them with highest PTi’. • 4th step: • For the first N1sorted T∈TRblock an item i∈IRand for the first N0sorted T∈TL’Rp block an item i∈ IL • 5th step: • Update values minconf(Ri),minsup(Ri), for all other rules that have been affected.
Blocking-Based Techniques • Main Problems of blocking technique: • The maximum confidence of a sensitive rule cannot be reduced. • An adversary can infer the hidden values if he applies a smart inference technique, if the blocking algorithm does not add much uncertainty in the database. • Both 0’s and 1’s must be hidden, because if only 1’s were hidden the adversary would simply replace all the ?’s with 1’s and would restore easily the initial database. • Many ?’s must be inserted, if we don’t want an adversary to infer hidden data.
Large Itemsets Remained after The hiding process Rules changed (%) after the process Experimental Results of Blocking Algorithm
Databases with average 13 items per transaction Databases with average 20 items per transaction Experimental Results of Blocking Algorithm (2)
Rules changed, when we Change the proportion 0:1 Decision Tree Experiments Misclassified Items (%) Experimental Results of Blocking Algorithm (3)
Outline • Introduction - Related Work • Distortion-based Techniques • Blocking-based Techniques • Comparison and Analysis • Conclusions
Outline • Introduction - Related Work • Distortion-based Techniques • Blocking-based Techniques • Comparison and Analysis • Conclusions
Conclusions • There are open research problems in Blocking Technique: • A) What techniques must be used in order to reduce the privacy breaches? • B) In what other ways can we prevent an adversary from inferring the association rules in the database? • C) Maybe applying a chi-square test to the final database reveal some correlations between the items
References • [Evfimienski et.al] Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke. Privacy Preserving Mining of Association Rules. SIGKDD 2002, Edmonton, Alberta Canada. • Murat Kantarcioglou and Chris Clifton, Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24–31. • Jaideep Vaidya and Chris Clifton, Privacy Preserving Association Rule Mining in Vertically Partitioned Data, In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639–644.
References • Stanley R. M. Oliveira and Osmar R. Zaïane. Algorithms for Balacing Privacy and Knowledge Discovery in Association Rule Mining. In Proc. of the Seventh International Database Engineering & Applications Symposium (IDEAS'03), pp. 54-63, Hong Kong, July 16-18, 2003. • Yucel Saygin, Vassilios Verykios, and Chris Clifton, Using Unknowns to Prevent Discovery of Association Rules, SIGMOD Record 30 (2001), no. 4, 45–54. • S. Verykios, Ahmed K. Elmagarmid, Bertino Elisa, Yucel Saygin, and Dasseni Elena, Association Rule Hiding, IEEE Transactions on Knowledge and Data Engineering (2003).