Mining Unexpected Rules by Pushing User Dynamics

Ke Wang, Yuelong Jiang, Laks V.S. Lakshmanan

Unexpected Rules
• Unexpectedness: the user finds the rules surprising
• Existing approaches
  • Syntax distance (B. Liu, W. Hsu, AAAI96)
  • Logical contradiction (B. Padmanabhan, A. Tuzhilin, KDD98)
  • Both work by direct comparison between rules
KDD03

Our approach: Data Violation
• Knowledge rules Ui: [rules not preserved in the transcript]
• The data rule r: [rule not preserved in the transcript]
  • r is unexpected to a user who links “owning house at BeverlyHill” to “movie stars” and “well paid”
• Each tuple that satisfies r but violates Ui is evidence for the unexpectedness of r

Three Issues
• Knowledge Dynamics
  • The user decides the best knowledge to apply given a scenario (i.e., a tuple) --- modeling
• Knowledge Push
  • Push user knowledge right from the start of the search --- rule mining
• Unexpectedness Dynamics
  • Adjust the unexpectedness of the remaining rules by what has been presented so far --- rule selection

Rule Representation
• Knowledge rules and data rules: [example rules not preserved in the transcript; slide label: Target attribute]
• Data rules use domain values; knowledge rules use fuzzy terms (such as “High”, “Low”).
• The match degree measures the match between a domain value (e.g., Primary) and a fuzzy term (e.g., Low)
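The match degree above can be illustrated with a small lookup table. This is only a sketch: the table `MEMBERSHIP`, the function `match_degree`, the domain values other than Primary, and all numeric degrees are hypothetical and do not come from the slides.

```python
# Hypothetical membership table: fuzzy term -> {domain value: match degree in [0, 1]}.
# All values and numbers are invented for illustration only.
MEMBERSHIP = {
    "Low": {"Primary": 1.0, "Secondary": 0.5, "University": 0.0},
    "High": {"Primary": 0.0, "Secondary": 0.5, "University": 1.0},
}


def match_degree(value: str, term: str) -> float:
    """Return how well a domain value matches a fuzzy term (0 = none, 1 = full)."""
    # Unknown terms or values are treated as a non-match in this sketch.
    return MEMBERSHIP.get(term, {}).get(value, 0.0)


if __name__ == "__main__":
    print(match_degree("Primary", "Low"))
    print(match_degree("Primary", "High"))
```

A table is used here only because it is the simplest way to show a degree in [0,1]; a numeric attribute would more likely use a membership function.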

Main Ideas
• Preference model: the user specifies the “best” knowledge rules for each tuple
  • e.g., U1 and U2 for those owning a house at BeverlyHill
• Violation model: we measure the unexpectedness of r by how strongly the tuples that satisfy r violate their best knowledge rules.

The Preference Model
• The user specifies the covering knowledge for each tuple:
  • the d (covering depth) “best” knowledge rules that match the tuple
• Ways to specify “best”:
  • Explicit enumeration (not scalable)
  • Rank by preference: “max strength”, “best match”, “min violation”, etc.
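Ranking by preference can be sketched as "filter the matching rules, sort by a preference score, keep the top d". The function `covering_knowledge`, the rule names, and the scores below are assumptions made for illustration; the slides do not define how each preference is computed.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def covering_knowledge(
    t: Row,
    rules: List[str],
    body_match: Callable[[Row, str], float],
    preference: Callable[[Row, str], float],
    d: int,
) -> List[str]:
    """Return up to d knowledge rules that match t, highest preference first."""
    # A rule "matches" here when its body match degree is positive (assumption).
    matching = [u for u in rules if body_match(t, u) > 0.0]
    matching.sort(key=lambda u: preference(t, u), reverse=True)
    return matching[:d]


# Hypothetical scores for three knowledge rules on one tuple.
BM = {"U1": 0.9, "U2": 0.6, "U3": 0.0}        # body match degrees
STRENGTH = {"U1": 0.5, "U2": 0.8, "U3": 0.9}  # rule strengths
t = {"House": "BeverlyHill"}
rules = ["U1", "U2", "U3"]

best_match = covering_knowledge(t, rules, lambda t, u: BM[u], lambda t, u: BM[u], d=1)
max_strength = covering_knowledge(t, rules, lambda t, u: BM[u], lambda t, u: STRENGTH[u], d=1)
```

In this toy data the two preferences pick different rules: "best match" keeps U1, while "max strength" keeps U2, because U3 does not match the tuple at all.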

The Violation Model
• For a tuple t and a knowledge rule U:
  • Body match degree, bm(t,U), in [0,1]
  • Head match degree, hm(t,U), in [0,1]
• Violation of U by t, v(t,U): defined piecewise, with one case conditioned on bm(t,U) and an “otherwise” case [formula not preserved in the transcript]
• Violation of t, v(t), is v(t,U) aggregated over the covering knowledge U of t.
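Because the piecewise formula is missing from the transcript, the sketch below uses an assumed form purely to show the shape of the computation: a tuple violates U when it matches U's body well but its head poorly. The product form, the threshold `min_body_match`, and the `max` aggregation are all assumptions, not the authors' definitions.

```python
from typing import Callable, Iterable, List, Tuple


def violation(bm: float, hm: float, min_body_match: float = 0.5) -> float:
    """ASSUMED form of v(t,U): high when the body matches and the head does not."""
    if bm >= min_body_match:
        return bm * (1.0 - hm)
    # "otherwise" case: a rule whose body barely matches is not violated.
    return 0.0


def tuple_violation(
    degrees: Iterable[Tuple[float, float]],
    aggregate: Callable[[List[float]], float] = max,
) -> float:
    """Aggregate v(t,U) over the covering knowledge of t.

    `degrees` holds one (bm, hm) pair per covering rule; the aggregate
    (max by default) is a placeholder choice.
    """
    values = [violation(bm, hm) for bm, hm in degrees]
    return aggregate(values) if values else 0.0


if __name__ == "__main__":
    # One rule half-violated, one rule fully satisfied.
    print(tuple_violation([(1.0, 0.5), (0.8, 1.0)]))
```

Swapping `aggregate` for `sum` changes only how multiple covering rules combine, which is why it is left as a parameter.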

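The Mining Problem
• Unexpectedness support of r, Usup(r) [formula not preserved in the transcript]
• Unexpectedness confidence of r [formula not preserved in the transcript]
• Unexpectedness of r [formula not preserved in the transcript]
• Problem: find all data rules r above the specified thresholds for Usup and Ustr.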

The Mining Algorithm
• Three Phases
  • Violation Phase
  • Rule Phase
  • Final Phase

Violation Phase
• Compute and store v(t) for all tuples t in the database T, pruning all t with v(t) = 0, to get a new database T’
• This prunes the data that is consistent with the user knowledge, and it is very effective.
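The violation phase amounts to a single scoring pass followed by a filter. In the sketch below, `violation_phase` and the toy scores are hypothetical; `v` stands for whatever tuple-level violation function the violation model supplies.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def violation_phase(
    database: Sequence[Hashable],
    v: Callable[[Hashable], float],
) -> List[Tuple[Hashable, float]]:
    """Return T': each tuple with v(t) > 0, paired with its stored v(t)."""
    scored = ((t, v(t)) for t in database)
    # Tuples with zero violation agree with the user knowledge and are dropped.
    return [(t, vt) for t, vt in scored if vt > 0.0]


# Hypothetical violation scores for four tuples, keyed by tuple id.
SCORES = {"t1": 0.0, "t2": 0.5, "t3": 0.0, "t4": 0.25}
pruned = violation_phase(list(SCORES), SCORES.get)
```

Storing v(t) next to each surviving tuple means later phases never need to consult the knowledge rules again.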

Rule Phase
• Generate all rules r with Usup(r) above the threshold, using T’
• Usup(r) is anti-monotone
  • Usup(r) decreases as the body b(r) grows
  • This holds independently of the preference model and the violation function v(t)
• Any frequent-itemset algorithm can be applied in this phase
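The anti-monotone property is what lets a level-wise (Apriori-style) search stop extending an itemset as soon as it falls below the threshold. The sketch below assumes that Usup is the sum of v(t) over the tuples of T’ containing every item of the rule; that definition, the function names, and the toy data are assumptions, since the slide's formula is not preserved.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
PrunedDB = List[Tuple[Itemset, float]]  # (items of tuple t, stored v(t))


def usup(items: Itemset, pruned_db: PrunedDB) -> float:
    """ASSUMED definition: total violation of the tuples containing all items."""
    return sum(vt for t, vt in pruned_db if items <= t)


def mine(pruned_db: PrunedDB, min_usup: float) -> Dict[Itemset, float]:
    """Level-wise search that only extends itemsets meeting the threshold."""
    singles = {frozenset([i]) for t, _ in pruned_db for i in t}
    level = {s for s in singles if usup(s, pruned_db) >= min_usup}
    result: Dict[Itemset, float] = {}
    while level:
        for s in level:
            result[s] = usup(s, pruned_db)
        # Join pairs of surviving itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = {c for c in candidates if usup(c, pruned_db) >= min_usup}
    return result


# Toy T' with hypothetical items and violation values.
T_PRIME: PrunedDB = [
    (frozenset({"a", "b", "c"}), 0.5),
    (frozenset({"a", "b"}), 0.25),
    (frozenset({"a"}), 0.25),
]
found = mine(T_PRIME, min_usup=0.75)
```

Adding an item can only shrink the set of matching tuples, and v(t) is never negative, so the summed value cannot grow; that is the property the pruning relies on.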

Final Phase
• Compute sup(r) and sup(b(r)) for the rules produced in the rule phase
• Output the rules r with Ustr(r) above the threshold.

The Selection Problem
• Display a specified number k of rules to the user, in order of unexpectedness
• See-and-Know Assumption
  • After seeing rules R, the user is interested only in rules that are unexpected with respect to [expression not preserved in the transcript]

The Selection Algorithm
• At each step:
  • greedily select the most unexpected rule (until k rules are selected or there is no rule left to select)
  • add the selected rule to the user knowledge
  • for each matching tuple, update the violation values to reflect the new covering knowledge.
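The greedy loop above can be sketched as follows. Several simplifications are assumed and are not taken from the slides: a rule's unexpectedness is scored as the total v(t) of its matching tuples, a tuple's violation drops to 0 once a selected rule covers it, and selection stops when no remaining rule has a positive score. The name `select_rules` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def select_rules(
    rules: Dict[str, Set[int]],
    violations: Dict[int, float],
    k: int,
) -> List[str]:
    """Greedily pick up to k rules, re-scoring after each pick.

    rules maps a rule name to the ids of the tuples that satisfy it;
    violations maps a tuple id to its current v(t).
    """
    v = dict(violations)      # work on a copy so the caller's data is untouched
    remaining = dict(rules)
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        score = {name: sum(v[t] for t in tids) for name, tids in remaining.items()}
        best = max(score, key=score.get)
        if score[best] <= 0.0:
            break             # nothing left that would still surprise the user
        chosen.append(best)
        for t in remaining.pop(best):
            v[t] = 0.0        # ASSUMPTION: a seen rule fully explains its tuples
    return chosen


# Toy input: r2 covers a subset of r1's tuples, r3 covers a separate tuple.
RULES = {"r1": {1, 2, 3}, "r2": {2, 3}, "r3": {4}}
V = {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.25}
picked = select_rules(RULES, V, k=3)
```

On this input r2 would rank second by its initial score, but once r1 has been shown its tuples are no longer surprising, so r3 is picked instead and r2 is never displayed.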

Experiment Dataset
• KDD-CUP-98 Dataset
• Target attribute
  • NK97: donation amount in the 1997 campaign
  • Five scales: c0, c1, c2, c3, c4, in increasing order.
• 23 non-target attributes
  • Their meanings are easier to understand than those of the other attributes

User Knowledge
• Observation: people tend to keep their donation behavior unchanged
• Four knowledge rules: [rules not preserved in the transcript]

Efficiency of Mining
• Three Algorithms
  • UMINE(NULL), without user knowledge
  • UMINE-Unpruned, without tuple pruning
  • UMINE-Pruned, pruning those tuples with v(t) = 0

Interestingness of Rules
• Ui(x,y): Ui covers x tuples with total violation y
• [Figure not preserved in the transcript; annotation: “Violate two rules”]

Effectiveness of Selection

Conclusion
• A new approach for finding interesting rules by modeling user knowledge
  • Unexpectedness is measured by the violation of covering knowledge by satisfying tuples
• Model the human user as a dynamic entity in applying knowledge and interpreting presented rules.
• Push user knowledge into data preparation, mining, and rule selection. This benefits both the search and the quality of the results.
