This paper proposes a novel approach for mining unexpected rules by taking user dynamics and user knowledge into account. It introduces a preference model, in which the user specifies the best knowledge rules for each tuple, and a violation model, which measures the unexpectedness of a rule by how strongly the tuples satisfying it violate their best knowledge rules. The algorithm has three phases: violation, rule generation, and final selection. Experimental results on a real dataset demonstrate the effectiveness of the approach.
Mining Unexpected Rules by Pushing User Dynamics • Ke Wang, Yuelong Jiang, Laks V.S. Lakshmanan
Unexpected Rules • Unexpectedness: the user finds the rules surprising • Existing approaches • Syntax distance (B. Liu, W. Hsu, AAAI96) • Logical contradiction (B. Padmanabhan, A. Tuzhilin, KDD98) • Both work by directly comparing discovered rules with the user's knowledge rules KDD03
Our approach: Data Violation • Knowledge rules Ui: • The data rule r: unexpected to a user who links “owning a house at BeverlyHill” to “movie stars” and “well paid” • Each tuple that satisfies r but violates Ui is evidence of the unexpectedness of r
Three Issues • Knowledge Dynamics • The user decides the best knowledge to apply in a given scenario (i.e., a tuple); addressed by modeling • Knowledge Push • Push user knowledge into the search right from the start; addressed in rule mining • Unexpectedness Dynamics • Adjust the unexpectedness of the remaining rules according to what has been presented so far; addressed in rule selection
Rule Representation • Knowledge rules and data rules: [figure: rule form, with the target attribute labeled] • Data rules use domain values; knowledge rules use fuzzy terms (such as “High” and “Low”). • The match degree measures how well a domain value (e.g., Primary) matches a fuzzy term (e.g., Low)
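To make the match degree concrete, here is a minimal Python sketch. The attributes, fuzzy terms, and membership values are hypothetical, and combining several conditions with min (a common fuzzy AND) is an assumption, not necessarily the operator the paper uses.

```python
from typing import Dict, Tuple

# Hypothetical membership tables: (attribute, fuzzy term) -> {domain value: degree in [0, 1]}.
# The values are illustrative only; in practice the user or a domain expert supplies them.
MEMBERSHIP: Dict[Tuple[str, str], Dict[str, float]] = {
    ("Education", "Low"): {"Primary": 1.0, "Secondary": 0.6, "College": 0.1},
    ("Education", "High"): {"Primary": 0.0, "Secondary": 0.3, "College": 1.0},
    ("Income", "High"): {"<50K": 0.0, "50K-100K": 0.5, ">100K": 1.0},
}


def match_degree(attr: str, value: str, term: str) -> float:
    """Degree to which a domain value of `attr` matches a fuzzy term (0 if unlisted)."""
    return MEMBERSHIP[(attr, term)].get(value, 0.0)


def conjunction_match(t: Dict[str, str], conditions: Dict[str, str]) -> float:
    """Match of tuple `t` against a conjunction of fuzzy conditions {attribute: term}.

    ASSUMPTION: min is used as the fuzzy AND; an empty conjunction matches fully.
    """
    return min(
        (match_degree(attr, t[attr], term) for attr, term in conditions.items()),
        default=1.0,
    )


t = {"Education": "Primary", "Income": "50K-100K"}
print(match_degree("Education", "Primary", "Low"))                   # 1.0
print(conjunction_match(t, {"Education": "Low", "Income": "High"}))  # 0.5
```

With min as the AND, a tuple matches a multi-condition body only as well as its weakest condition.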
Main Ideas • Preference model: the user specifies the “best” knowledge rules for each tuple • e.g., U1 and U2 for those owning a house at BeverlyHill • Violation model: we measure the unexpectedness of r by how strongly the tuples satisfying r violate their best knowledge rules.
The Preference Model • The user specifies covering knowledge for each tuple: • the d (covering depth) “best” knowledge rules that match the tuple • Ways to specify “best”: • Explicit enumeration (not scalable) • Ranking by preference: “max strength”, “best match”, “min violation”, etc.
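A minimal sketch of "rank by preference", assuming each knowledge rule carries a user-assigned strength and a body-match function bm(t,U). The class, its fields, and the two ranking functions are hypothetical illustrations of the “max strength” and “best match” criteria; “min violation” would additionally need the violation function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class KnowledgeRule:
    name: str
    strength: float                     # hypothetical user-assigned belief strength
    body_match: Callable[[Row], float]  # bm(t, U) in [0, 1]


def covering_knowledge(t: Row, rules: List[KnowledgeRule], d: int,
                       preference: Callable[[Row, KnowledgeRule], float]) -> List[KnowledgeRule]:
    """Return the d "best" knowledge rules that match t, ranked by `preference`."""
    matching = [u for u in rules if u.body_match(t) > 0]
    # sorted() is stable, so ties keep the order in which the rules were listed.
    return sorted(matching, key=lambda u: preference(t, u), reverse=True)[:d]


def max_strength(t: Row, u: KnowledgeRule) -> float:
    return u.strength


def best_match(t: Row, u: KnowledgeRule) -> float:
    return u.body_match(t)


def owns_beverly_hill_home(t: Row) -> float:
    return 1.0 if t.get("Home") == "BeverlyHill" else 0.0


rules = [
    KnowledgeRule("U1", 0.9, owns_beverly_hill_home),
    KnowledgeRule("U2", 0.7, owns_beverly_hill_home),
    KnowledgeRule("U3", 0.8, lambda t: 0.4),  # weakly matches every tuple
]
t = {"Home": "BeverlyHill"}
print([u.name for u in covering_knowledge(t, rules, 2, max_strength)])  # ['U1', 'U3']
print([u.name for u in covering_knowledge(t, rules, 2, best_match)])    # ['U1', 'U2']
```

The same tuple gets different covering knowledge under different preferences, which is why the preference is left to the user.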
The Violation Model • For a tuple t and a knowledge rule U: • Body match degree, bm(t,U), in [0,1] • Head match degree, hm(t,U), in [0,1] • Violation of U by t, v(t,U), a piecewise function of bm(t,U) and hm(t,U) • The violation of t, v(t), aggregates v(t,U) over the covering knowledge U of t.
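The exact formula for v(t,U) is not given here, so this Python sketch only shows the shape of the computation under a loudly labeled assumption: v(t,U) = bm(t,U) · (1 − hm(t,U)) when the body matches, and 0 otherwise, so t violates U to the extent that it matches U's body but not its head. Aggregating into v(t) with max is also an assumption; any aggregate can be passed in.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def violation(bm: float, hm: float) -> float:
    """v(t, U) for one tuple t and one knowledge rule U.

    ASSUMPTION (not the paper's formula): bm * (1 - hm) if the body matches, else 0.
    """
    return bm * (1.0 - hm) if bm > 0 else 0.0


def tuple_violation(matches: Sequence[Tuple[float, float]],
                    agg: Callable[[List[float]], float] = max) -> float:
    """v(t): aggregate v(t, U) over the covering knowledge of t.

    `matches` holds one (bm(t, U), hm(t, U)) pair per covering rule U.
    ASSUMPTION: the aggregate defaults to max; a tuple with no covering rule gets 0.
    """
    values = [violation(bm, hm) for bm, hm in matches]
    return agg(values) if values else 0.0


covering = [(1.0, 0.2), (0.5, 0.0)]         # (bm, hm) for two covering rules of one tuple
print(tuple_violation(covering))            # approximately 0.8
print(tuple_violation(covering, agg=mean))  # approximately 0.65
```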
The Mining Problem • Unexpectedness support of r, Usup(r) • Unexpectedness confidence of r • Unexpectedness of r, Ustr(r) • Problem: find all data rules r whose Usup and Ustr are above user-specified thresholds.
The Mining Algorithm • Three Phases • Violation Phase • Rule Phase • Final Phase
Violation Phase • Compute and store v(t) for every tuple t in the database T, pruning all t with v(t) = 0, to obtain a new database T’ • This pruning removes the data consistent with the user knowledge and is very effective.
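The violation phase amounts to one scan that computes v(t), stores it, and drops zero-violation tuples. In this minimal sketch, `toy_v` and the `Prev` attribute are hypothetical stand-ins for the real violation function and data, loosely echoing the belief that donation behavior stays unchanged.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def violation_phase(db: List[Row], v: Callable[[Row], float]) -> List[Tuple[Row, float]]:
    """Compute v(t) once per tuple and keep only tuples with v(t) > 0 (the database T')."""
    t_prime = []
    for t in db:
        vt = v(t)
        if vt > 0:                    # v(t) = 0: t agrees with the user's knowledge, prune it
            t_prime.append((t, vt))   # store v(t) so later phases need not recompute it
    return t_prime


# HYPOTHETICAL stand-in for v(t): a donor violates the belief "donation behavior
# stays unchanged" when the 1997 level differs from an earlier level `Prev`.
def toy_v(t: Row) -> float:
    return 0.0 if t["Prev"] == t["NK97"] else 1.0


T = [
    {"Prev": "c1", "NK97": "c1"},
    {"Prev": "c1", "NK97": "c3"},
    {"Prev": "c0", "NK97": "c0"},
    {"Prev": "c2", "NK97": "c0"},
]
T_prime = violation_phase(T, toy_v)
print(len(T), "->", len(T_prime))  # 4 -> 2
```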
Rule Phase • Generate all rules r with Usup(r) above the threshold, using T’ • Usup(r) is anti-monotone • Usup(r) never increases as the body b(r) grows • This holds independently of the preference model and the violation function v(t) • Any frequent-itemset algorithm can be applied in this phase
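A minimal Apriori-style sketch of the rule phase. It assumes, for illustration only, that Usup(r) is the total v(t) of the tuples in T’ satisfying both the body and the head of r, divided by |T|. Because v(t) ≥ 0 and adding a condition can only shrink the set of satisfying tuples, this Usup cannot increase as the body grows, so level-wise pruning is safe. The data and thresholds are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Row = Dict[str, str]
Item = Tuple[str, str]               # (attribute, value)
Rule = Tuple[FrozenSet[Item], Item]  # (body b(r), head on the target attribute)


def usup(body: FrozenSet[Item], head: Item, t_prime: List[Tuple[Row, float]],
         n_total: int) -> float:
    """ASSUMED Usup(r): total v(t) of tuples satisfying r, divided by |T|."""
    attr, value = head
    total = sum(vt for t, vt in t_prime
                if t.get(attr) == value and all(t.get(a) == x for a, x in body))
    return total / n_total


def rule_phase(t_prime: List[Tuple[Row, float]], n_total: int, target: str,
               min_usup: float) -> Dict[Rule, float]:
    """Level-wise (Apriori-style) search for rules with Usup >= min_usup."""
    results: Dict[Rule, float] = {}
    for head_value in sorted({t[target] for t, _ in t_prime}):
        head = (target, head_value)
        # Level 1: every non-target (attribute, value) item seen with this head value.
        level = {frozenset([(a, x)]) for t, _ in t_prime if t[target] == head_value
                 for a, x in t.items() if a != target}
        while level:
            scored = {b: usup(b, head, t_prime, n_total) for b in level}
            frequent = {b for b, s in scored.items() if s >= min_usup}
            results.update({(b, head): scored[b] for b in frequent})
            # Join frequent bodies into bodies one item larger; by anti-monotonicity,
            # keep a candidate only if all of its one-smaller subsets are frequent.
            level = set()
            for b1, b2 in combinations(frequent, 2):
                cand = b1 | b2
                one_value_per_attr = len({a for a, _ in cand}) == len(cand)
                if (len(cand) == len(b1) + 1 and one_value_per_attr
                        and all(cand - {i} in frequent for i in cand)):
                    level.add(cand)
    return results


t_prime = [
    ({"Age": "young", "Gender": "F", "NK97": "c3"}, 1.0),
    ({"Age": "young", "Gender": "M", "NK97": "c3"}, 0.5),
    ({"Age": "old", "Gender": "F", "NK97": "c0"}, 1.0),
]
rules = rule_phase(t_prime, n_total=10, target="NK97", min_usup=0.1)
for (body, head), s in sorted(rules.items(), key=lambda kv: -kv[1]):
    print(sorted(body), "->", head, round(s, 3))
```

Only T’ is scanned here, which is where pruning zero-violation tuples pays off; a faster frequent-itemset miner could replace the naive loop.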
Final Phase • Compute sup(r) and sup(b(r)) for the rules produced in the rule phase • Output the rules r whose Ustr(r) is above the threshold.
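A minimal sketch of the final phase. Since the formula for Ustr(r) is not given here, the sketch takes it as a caller-supplied function of Usup(r), sup(r), and sup(b(r)). `example_ustr` (Usup(r) / sup(b(r))) is a purely hypothetical placeholder, and the Usup value in the example is hard-coded for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Row = Dict[str, str]
Item = Tuple[str, str]
Rule = Tuple[FrozenSet[Item], Item]


def satisfies(t: Row, items: Iterable[Item]) -> bool:
    return all(t.get(a) == x for a, x in items)


def final_phase(db: List[Row], candidates: Dict[Rule, float],
                ustr: Callable[[float, float, float], float],
                min_ustr: float) -> List[Tuple[Rule, float]]:
    """Compute sup(r) and sup(b(r)) on the full database T; keep rules with Ustr >= min_ustr.

    `candidates` maps each rule from the rule phase to its Usup(r). For clarity this
    scans T once per candidate; counting all candidates in one pass would be faster.
    """
    n = len(db)
    kept = []
    for (body, head), u in candidates.items():
        sup_body = sum(1 for t in db if satisfies(t, body)) / n
        sup_rule = sum(1 for t in db if satisfies(t, body) and satisfies(t, [head])) / n
        score = ustr(u, sup_rule, sup_body)
        if score >= min_ustr:
            kept.append(((body, head), score))
    return sorted(kept, key=lambda rs: rs[1], reverse=True)


def example_ustr(usup: float, sup_rule: float, sup_body: float) -> float:
    """HYPOTHETICAL placeholder for Ustr(r); not the paper's definition."""
    return usup / sup_body if sup_body > 0 else 0.0


T = [
    {"Age": "young", "NK97": "c3"},
    {"Age": "young", "NK97": "c1"},
    {"Age": "old", "NK97": "c0"},
    {"Age": "young", "NK97": "c3"},
]
candidates = {(frozenset({("Age", "young")}), ("NK97", "c3")): 0.15}
print(final_phase(T, candidates, example_ustr, min_ustr=0.1))
```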
The Selection Problem • Display a specified number k of rules to the user, in order of unexpectedness • See-and-Know Assumption • After seeing rules R, the user is interested only in rules that are unexpected with respect to
The Selection Algorithm • At each step: • greedily select the most unexpected rule (until k rules are selected or no rule is left to select) • add the selected rule to the user knowledge • for each matching tuple, update its violation value to reflect the new covering knowledge.
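A minimal sketch of the greedy loop under a simplifying assumption: a rule's current unexpectedness is the total current violation of the tuples it covers, and once a rule is shown, the tuples it covers count as fully explained (their violation drops to 0). The slides only say that violation values are updated to reflect the new covering knowledge, so this update rule, `rule_tuples`, and `v` are hypothetical stand-ins.

```python
from typing import Dict, FrozenSet, List


def current_unexpectedness(tuples: FrozenSet[int], v: Dict[int, float]) -> float:
    """Total current violation of the tuples a rule covers (simplified score)."""
    return sum(v[t] for t in tuples)


def select_rules(rule_tuples: Dict[str, FrozenSet[int]], v: Dict[int, float],
                 k: int) -> List[str]:
    """Greedily pick up to k rules, re-scoring the rest after each pick."""
    v = dict(v)                   # work on a copy; the caller's v(t) is untouched
    remaining = dict(rule_tuples)
    selected: List[str] = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda r: current_unexpectedness(remaining[r], v))
        if current_unexpectedness(remaining[best], v) <= 0:
            break                 # nothing left is unexpected
        selected.append(best)
        # The shown rule joins the user's knowledge. ASSUMPTION: the tuples it
        # covers are now fully explained, so their violation drops to 0.
        for t in remaining.pop(best):
            v[t] = 0.0
    return selected


rule_tuples = {"r1": frozenset({1, 2, 3}), "r2": frozenset({2, 3}), "r3": frozenset({4})}
v = {1: 0.5, 2: 1.0, 3: 1.0, 4: 0.8}
print(select_rules(rule_tuples, v, k=3))  # ['r1', 'r3']
```

Although r2 initially outscores r3, it covers only tuples that r1 already explained, so it is never shown; this is the behavior the See-and-Know assumption calls for.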
Experiment Dataset • KDD-CUP-98 Dataset • Target Attribute • NK97: donation amount in the 1997 campaign • five levels: c0, c1, c2, c3, c4, in increasing order • 23 non-target attributes • their meanings are easier to understand than those of the other attributes
User Knowledge • Observation: people tend not to change their donation behavior • Four knowledge rules:
Efficiency of Mining • Three Algorithms • UMINE(NULL), without user knowledge • UMINE-Unpruned, without tuple pruning • UMINE-Pruned, pruning the tuples with v(t) = 0
Interestingness of Rules • [figure: mined rules annotated with the knowledge rules they violate, e.g., “violate two rules”] • Ui(x,y): Ui covers x tuples with total violation y
Conclusion • A new approach to finding interesting rules by modeling user knowledge • Violation of covering knowledge by satisfying tuples • Model the human user as a dynamic entity in applying knowledge and interpreting the presented rules. • Push user knowledge into data preparation, mining, and rule selection, which benefits both the search and the quality of the results.