Mining Unexpected Rules by Pushing User Dynamics

This paper proposes a novel approach for mining unexpected rules by considering user dynamics and knowledge. It introduces a preference model where users specify the best knowledge rules for each tuple and a violation model to measure the unexpectedness of rules based on the violation of satisfying tuples. The algorithm includes three phases: violation, rule generation, and final selection. Experimental results on a real dataset demonstrate the effectiveness of the approach.

Presentation Transcript


  1. Mining Unexpected Rules by Pushing User Dynamics • Ke Wang, Yuelong Jiang, Laks V.S. Lakshmanan

  2. Unexpected Rules • Unexpectedness: user finds the rules surprising • Existing approaches • Syntax distance (B. Liu, W. Hsu, AAAI96) • Logical contradiction (B. Padmanabhan, A. Tuzhilin, KDD98) • Both by direct comparison between rules KDD03

  3. Our approach: Data Violation • Knowledge rules Ui: • The data rule r: unexpected to the user who links “owning house at BeverlyHill” to “movie stars” and “well paid” • Each tuple that satisfies r but violates Ui is evidence for the unexpectedness of r

  4. Three Issues • Knowledge Dynamics • User decides the best knowledge to apply given a scenario (i.e., a tuple) --- modeling • Knowledge Push • Push user knowledge right from the start of search --- rule mining • Unexpectedness Dynamics • Adjust the unexpectedness of remaining rules by what has been presented so far --- rule selection

  5. Rule Representation • Knowledge rules and data rules: • Data rules use domain values; knowledge rules use fuzzy terms (such as “High”, “Low”). • The match degree measures the match between a domain value (e.g., Primary) and a fuzzy term (e.g., Low)

  6. Main Ideas • Preference model: the user specifies the “best” knowledge rules for each tuple • e.g., U1 and U2 for those owning a house at BeverlyHill • Violation model: we measure the unexpectedness of r by how strongly the tuples satisfying r violate their best knowledge rules.

  7. The Preference Model • User specifies covering knowledge for each tuple: • the d “best” knowledge rules that match the tuple, where d is the covering depth • Ways to specify “best”: • Explicit enumeration (not scalable) • Rank by preference: “max strength”, “best match”, “min violation”, etc.
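
The "rank by preference" option can be illustrated with a minimal sketch. It assumes the "best match" criterion ranks knowledge rules by their body match degree and that a rule matches a tuple when that degree is positive; the slides do not spell out either detail. The names `KnowledgeRule`, `body_match`, and `covering_knowledge` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]  # a tuple: attribute name -> domain value


@dataclass
class KnowledgeRule:
    name: str
    # Hypothetical callback returning the body match degree bm(t, U) in [0, 1].
    body_match: Callable[[Row], float]


def covering_knowledge(t: Row, rules: List[KnowledgeRule], d: int) -> List[KnowledgeRule]:
    """Return up to d knowledge rules matching t, ranked by "best match".

    Assumption: a rule matches t when its body match degree is positive.
    """
    scored = [(rule.body_match(t), rule) for rule in rules]
    matching = [(bm, rule) for bm, rule in scored if bm > 0]
    # Highest body match degree first; ties keep their input order.
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for _, rule in matching[:d]]
```

Swapping the sort key would give the other rankings named on the slide ("max strength", "min violation"), provided the rule carries the corresponding score.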

  8. The Violation Model • For a tuple t and a knowledge rule U: • Body match degree, bm(t,U), in [0,1] • Head match degree, hm(t,U), in [0,1] • Violation of U by t, v(t,U): piecewise definition (formula not recoverable; only the case labels “if bm(t, U) …” and “otherwise” survive) • Violation of t, v(t), is v(t,U) aggregated over the covering knowledge U of t.

  9. The Mining Problem • Unexpectedness Support of r • Unexpectedness Confidence of r • Unexpectedness of r • Problem: Find all data rules r above specified thresholds for Usup and Ustr.

  10. The Mining Algorithm • Three Phases • Violation Phase • Rule Phase • Final Phase

  11. Violation Phase • Compute and store v(t) for all tuples t in the database T, pruning all t with v(t) = 0; get new database T’ • This prunes the data consistent with the user knowledge and is very effective.
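
A minimal sketch of this phase follows. It treats the violation function v(t) as a caller-supplied callback, since the slides do not preserve its formula; `violation_phase` and the `violation` parameter are hypothetical names.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def violation_phase(database: List[T], violation: Callable[[T], float]) -> List[Tuple[T, float]]:
    """Compute v(t) once per tuple and keep only tuples with v(t) > 0.

    The result plays the role of the pruned database T', with each
    surviving tuple stored next to its violation value.
    """
    pruned = []
    for t in database:
        v = violation(t)  # assumed to be non-negative
        if v > 0:
            pruned.append((t, v))
    return pruned
```

Storing v(t) alongside each tuple means later phases never need to re-evaluate the knowledge rules.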

  12. Rule Phase • Generate all rules r with Usup(r) above threshold using T’ • Usup(r) is anti-monotone • Usup(r) decreases as the body b(r) grows • independent of preference model and violation function v(t) • Any frequent-itemset algorithm can be applied in this phase
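
The anti-monotone property is what lets a level-wise (Apriori-style) search work here. The sketch below is an illustration under a stated assumption, not the authors' algorithm: it takes Usup of an itemset to be the sum of v(t) over the tuples in T’ containing it, divided by the size of the original database. The slides do not preserve the actual definition. The function name and parameters are hypothetical, and a rule is simplified to an itemset.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def weighted_frequent_itemsets(
    pruned: List[Tuple[FrozenSet[str], float]],  # T': (items of t, v(t)) with v(t) > 0
    total: int,                                  # |T|, size of the original database
    min_usup: float,
) -> Dict[FrozenSet[str], float]:
    """Level-wise search for itemsets whose assumed Usup is >= min_usup."""

    def usup(itemset: FrozenSet[str]) -> float:
        # Assumed definition: violation-weighted support.
        return sum(v for items, v in pruned if itemset <= items) / total

    singles = {item for items, _ in pruned for item in items}
    level = [frozenset([item]) for item in sorted(singles)]
    result: Dict[FrozenSet[str], float] = {}
    while level:
        kept = []
        for candidate in level:
            s = usup(candidate)
            if s >= min_usup:
                result[candidate] = s
                kept.append(candidate)
        # Join: merge pairs of surviving itemsets that differ in one item.
        size = len(level[0]) + 1
        joined = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
        # Prune: anti-monotonicity requires every subset to have survived.
        kept_set = set(kept)
        level = [
            c for c in joined
            if all(frozenset(sub) in kept_set for sub in combinations(c, size - 1))
        ]
    return result
```

Because v(t) is non-negative, adding an item can only shrink the set of contributing tuples, so the weighted support never increases as an itemset grows.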

  13. Final Phase • Compute sup(r) and sup(b(r)) for rules produced in the rule phase • Output rules r with Ustr(r) above threshold.

  14. The Selection Problem • Display a specified number k of rules to the user, in the order of unexpectedness • See-and-Know Assumption • After seeing rules R, user is interested only in rules that are unexpected with respect to

  15. The Selection Algorithm • At each step, • greedily select the most unexpected rule (until k rules are selected or there is no rule to select) • add the selected rule to user knowledge • for each matching tuple, update the violation values to reflect the new covering knowledge.
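
The greedy loop above can be sketched as follows. Two simplifications are assumptions of this sketch, not statements from the slides: a rule's unexpectedness is scored as the sum of the current violation values of the tuples it covers, and "adding the rule to user knowledge" is modeled by a caller-supplied `update` function applied to each covered tuple's violation. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Set


def greedy_select(
    rule_cover: Dict[str, Set[Hashable]],  # rule id -> ids of tuples it matches
    violation: Dict[Hashable, float],      # tuple id -> current v(t)
    k: int,
    update: Callable[[float], float],      # new v(t) once a covering rule is shown
) -> List[str]:
    """Select up to k rules, re-scoring the rest after every pick."""
    v = dict(violation)            # work on a copy of the violation values
    remaining = dict(rule_cover)
    selected: List[str] = []
    while remaining and len(selected) < k:
        # Assumed score: total current violation over covered tuples.
        score = {r: sum(v[t] for t in cover) for r, cover in remaining.items()}
        best = max(score, key=score.get)
        if score[best] <= 0:
            break                  # nothing left that is still unexpected
        selected.append(best)
        # The shown rule is now known to the user: update covered tuples.
        for t in remaining.pop(best):
            v[t] = update(v[t])
    return selected
```

With an `update` that drops the violation to zero, a rule overlapping heavily with one already shown loses most of its score, so it falls behind rules that cover fresh tuples. A static ranking would not capture this.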

  16. Experiment Dataset • KDD-CUP-98 Dataset • Target Attribute • NK97: donation amount in 1997 campaign • five scales: c0, c1, c2, c3, c4, in increasing order. • 23 non-target attributes • Their meanings are easier to understand than other attributes

  17. User Knowledge • Observation: People's donation behavior tends to remain unchanged • Four knowledge rules:

  18. Efficiency of Mining • Three Algorithms • UMINE(NULL), without user knowledge • UMINE-Unpruned, without tuple pruning • UMINE-Pruned, pruning those tuples with v(t) = 0

  19. Interestingness of Rules • Ui(x,y): Ui covers x tuples with total violation y • Figure annotation: “Violate two rules”

  20. Effectiveness of Selection

  21. Conclusion • A new approach for finding interesting rules by modeling user knowledge • Violation of covering knowledge by satisfying tuples • Models the human user as a dynamic entity in applying knowledge and interpreting presented rules. • Pushes user knowledge into data preparation, mining, and rule selection. This benefits both search and quality.
