
Pruning and Summarizing the Discovered Associations

This presentation discusses the process of pruning and summarizing discovered associations in association rule mining, focusing on direction setting (DS) rules. It addresses the challenge of handling the vast number of discovered association rules by identifying the essential relationships in the data.


Presentation Transcript


  1. Pruning and Summarizing the Discovered Associations Bing Liu Wynne Hsu Yiming Ma National University of Singapore Tushita Agrawal, Manju Navani April 14th, 2005

  2. Outline • Association Rule Mining • Related Work • Chi-Square Test • Direction Setting Rules • Pruning & Finding DS Rules • The Algorithm • Empirical Evaluation

  3. Association Rules • A class of important regularities in data. • I = {i1, …, in} : a set of items. D : a set of data cases. • An association rule is an implication of the form X -> Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. • Rule X -> Y holds in D with confidence c if c% of the data cases in D that support X also support Y. • The rule has support s in D if s% of the data cases in D contain X ∪ Y.
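
To make these definitions concrete, here is a minimal Python sketch (toy transactions and hypothetical item names) that computes the support and confidence of a rule X -> Y:

```python
# Toy transaction database D: each data case is a set of items
# (item names are hypothetical).
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, D):
    """Fraction of data cases in D that contain every item in itemset."""
    return sum(itemset <= case for case in D) / len(D)

def confidence(X, Y, D):
    """Of the cases that support X, the fraction that also support Y."""
    return support(X | Y, D) / support(X, D)

X, Y = {"bread"}, {"milk"}
print(support(X | Y, D))    # 0.5   -> rule support s = 50%
print(confidence(X, Y, D))  # 0.667 -> rule confidence c ~ 66.7%
```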

  4. Association Rule Mining • Generating all association rules that have support and confidence greater than the user-specified min support and min confidence. • Algorithm: a 2-step process: • Finds all large itemsets that meet the min support constraint. • Generates rules from all large itemsets that satisfy the min confidence constraint.
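
A compact sketch of the 2-step process, reusing the set-of-items representation from the previous sketch. The candidate generation is deliberately simplified (no subset-based pruning step), so it is only suitable for toy data:

```python
from itertools import combinations

def large_itemsets(D, minsup):
    """Step 1: level-wise search for all itemsets meeting minsup."""
    level = [frozenset([i]) for i in set().union(*D)]
    large = {}
    while level:
        # Count the support of every candidate in one pass over D.
        counts = {c: sum(c <= case for case in D) / len(D) for c in level}
        frequent = {c: s for c, s in counts.items() if s >= minsup}
        large.update(frequent)
        # Join pairs of frequent k-itemsets into (k+1)-candidates.
        keys = list(frequent)
        k = len(keys[0]) + 1 if keys else 0
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == k})
    return large

def gen_rules(large, minconf):
    """Step 2: from each large itemset, emit rules X -> Y meeting minconf."""
    out = []
    for itemset, sup in large.items():
        for r in range(1, len(itemset)):
            for X in map(frozenset, combinations(itemset, r)):
                conf = sup / large[X]          # sup(X u Y) / sup(X)
                if conf >= minconf:
                    out.append((set(X), set(itemset - X), sup, conf))
    return out

# e.g. gen_rules(large_itemsets(D, 0.5), 0.6) with D from the previous sketch
```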

  5. Strengths & Drawbacks • Efficiently discovers the complete set of associations that exist in the data. • These associations provide a complete picture of the underlying regularities in the domain. • The number of discovered associations can be huge. • Very difficult for a human user to analyze. • Even worse for data sets whose items are highly correlated.

  6. Solution??? • Give an arbitrary small subset of the rules to the user. • NOT SATISFACTORY!!! • A small subset can only give a partial picture of the domain. • “Can the completeness of association rule mining be preserved without overwhelming the user?”

  7. Solution – DS Rules • Prune insignificant rules and find a special subset of association rules that represent the underlying relationships in the data → direction setting (DS) rules. • Give a summary of the behavior of the discovered associations. • Represent the essential relationships or structure (the skeleton) of the domain.

  8. In this paper... • Association rule mining from a relational table • Item: attribute = value • Numerical values are discretized • Mining is targeted at a specific attribute • Rule expressed as X -> y, where y is a value of the target attribute and X is a set of items from the remaining attributes • Large rule: a rule that meets the min support. • Min conf is not used since it does not reflect the relationships of the domain. • Statistical correlation is the basis for rule finding.

  9. Proposed Technique • Two steps: • Pruning the association rules • Summarizing the unpruned rules • Flow: Discovered large rules → Pruning → Significant rules → Summarization → Direction setting rules + Non-direction setting rules

  10. Related Work • Template-Based Approach: • Most straightforward method for selecting interesting association rules • A template describes a set of rules in terms of the items occurring in the conditional and the consequent parts, e.g., Fruit+, Dairy_product* -> Meat. • User specifies what he/she wants to see using templates • The system then finds only those matching rules.
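
A minimal sketch of template matching under one possible interpretation of the '+'/'*' syntax ('+' = one or more items of the class, '*' = zero or more); the taxonomy, item names, and helper are hypothetical:

```python
# Hypothetical item taxonomy: class name -> concrete items in that class.
taxonomy = {
    "Fruit": {"apple", "banana"},
    "Dairy_product": {"milk", "butter"},
    "Meat": {"beef", "pork"},
}

def matches(rule, template):
    """rule: (condition item set, consequent item).
    template: ([(class_name, quantifier), ...], consequent_class),
    where '+' requires at least one item of the class and '*' allows
    zero or more.  Every condition item must be accounted for."""
    conditions, consequent = rule
    classes, cons_class = template
    if consequent not in taxonomy[cons_class]:
        return False
    leftover = set(conditions)
    for cls, quant in classes:
        hits = leftover & taxonomy[cls]
        if quant == "+" and not hits:
            return False
        leftover -= hits
    return not leftover   # no condition item falls outside the template

# "Fruit+, Dairy_product* -> Meat" matches {apple, milk} -> beef
rule = ({"apple", "milk"}, "beef")
template = ([("Fruit", "+"), ("Dairy_product", "*")], "Meat")
print(matches(rule, template))  # True
```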

  11. Related Work (cont...) • Subjective Interestingness: • Interactive and iterative approach for finding unexpected rules. • User specifies his/her existing knowledge of the domain. • The system then finds unexpected rules by comparing the user’s knowledge with the discovered rules. • If the unexpected rules identified by the system are not truly unexpected, they serve to remind the user what he/she might have forgotten.

  12. Related Work (cont...) • Association Rule Cover Based: • A cover is a subset of the discovered associations that can cover the database. • The number of rules in a cover can be quite small. • A greedy algorithm is proposed to find a good cover, and the remaining rules are pruned. • The problem with this method is that the advantage of association rules, their completeness, is lost.

  13. Related Work (cont...) • Constraint-Based Rule Mining: • Technique using minimum improvement. • Improvement: the difference between the conf of a rule and the conf of any proper sub-rule with the same consequent. • Prune rules that do not meet the minimum improvement in conf. • This paper uses a similar pruning method, but with the chi-square test as the basis of pruning.

  14. Related Work (cont...) • Online Generation of ARs: • Technique to remove two types of redundant rules – simple and strict redundancy. • A rule R1 is redundant w.r.t. another rule R2 if the sup and conf of R1 are always at least as large as the sup and conf of R2. • Simple: remove rules that are derived from the same itemset. Ex: AB => C is redundant w.r.t. A => BC • Strict: applies to two itemsets where one is a subset of the other. Ex: X => Y is redundant w.r.t. X => YZ

  15. Limitations • These approaches do not prune insignificant rules. • They do not provide a summary of the discovered rules.

  16. Chi-Square Test Statistics (χ²) • Tests independence and correlation. • The closer the observed frequency is to the expected frequency, the greater the weight of evidence in favor of independence. • E.g., an association rule for a bank: Job = yes -> Loan = approved [sup = 200/500, conf = 200/300]

  17. Chi-Square Test Statistics (χ²) (cont...) • “Is loan approval dependent on whether one has a job or not?” • Hypothesis: the two attributes are independent. • Of 500 people, 300 (60%) had a job and 200 (40%) had no job. • Under independence, the 280 approved and 220 not-approved cases should divide in the same ratio. • χ² tests the significance of the deviation from the expected values.

  18. Chi-Square Test Statistics (χ²) (cont...) • A χ² value of 0 implies the attributes are statistically independent. • If it is higher than a certain threshold, we reject the independence assumption. • In our case χ² = 34.63, which is much larger than the threshold (3.84 at 95% significance). • Conclusion: loan approval is correlated with whether one has a job. • How are they correlated?
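
The χ² value above can be verified directly. A short Python sketch for the bank example (the 2 × 2 counts follow from the figures on the previous slide):

```python
# Observed 2 x 2 counts for the bank example (N = 500):
#                 approved   not-approved
# Job = yes          200          100
# Job = no            80          120
obs = [[200, 100], [80, 120]]
N = sum(map(sum, obs))                    # 500

row = [sum(r) for r in obs]               # [300, 200]
col = [sum(c) for c in zip(*obs)]         # [280, 220]

chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row[i] * col[j] / N           # expected count under independence
        chi2 += (obs[i][j] - e) ** 2 / e

print(round(chi2, 2))  # 34.63 > 3.84 (95% threshold, 1 d.o.f.) -> correlated
```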

  19. Chi-Square Test Statistics (χ²) (cont...) s : minimum support; c : significance level. A rule X -> y is said to be (s, c)- • Correlated if • the rule’s support exceeds s, and • the χ² value for the rule with respect to the whole data exceeds the χ² threshold at significance level c. • Uncorrelated (independent) if • the rule’s support exceeds s, and • the χ² value for the rule with respect to the whole data does not exceed the χ² threshold at significance level c.

  20. Chi-Square Test Statistics (χ²) (cont...) Types of correlation/direction (f0 : observed frequency; f : expected frequency) • Positive correlation: if X and y of rule r, X -> y, are correlated and f0 / f > 1, then the direction of r is 1. • Negative correlation: if X and y of rule r, X -> y, are correlated and f0 / f < 1, then the direction of r is -1. • Independence: if X and y of rule r, X -> y, are independent, then the direction of r is 0. The rule in the previous example represents positive correlation: f0 = 200; f = 300 × 280 / 500 = 168.

  21. Chi-Square Test Statistics (χ²) (cont...) A generic 2 × 2 contingency table for a rule r: X -> y (rows: X present / X absent; columns: y / not y; the expected frequency of each cell is its row total × column total / N). To compute the correlation type for rule r: X -> y, compare the rule with the whole population, i.e., the whole data set.

  22. Direction Setting (DS) Rules The set of expected directions of a rule r: • If r is a 1-condition rule, the set of expected directions is {0} (the condition and the consequent are expected to be independent). • If r is a k-condition rule (k > 1) of the form r: a1, a2, ..., ak -> y: • view r as a combination of 2 rules, a 1-condition rule r1: ai -> y and a (k-1)-condition rule rrest over the remaining conditions -> y. • The expected direction for this combination is determined by the directions of r1 and rrest (see the case table on slide 28).

  23. Direction Setting (DS) Rules (cont..) A rule r is a DS rule if it satisfies: • it has a direction of 1 (positive correlation), and • its direction is not an element of the set of expected directions. A non-direction setting (non-DS) rule is a positively correlated rule that is not a DS rule.

  24. Direction Setting (DS) Rules (cont..) All possible combinations of r1, rrest and r; notation: r1.dir, rrest.dir := r.dir (or, equivalently, rrest.dir, r1.dir := r.dir). • C(2), D(2), E(2) and F(2) have a direction of 1 for r, but the expected direction is 0 or unknown. • Thus r sets a new direction → a potential DS rule. • r is a DS rule if, for all combinations of r1 and rrest, the direction of r is different from the expected directions.

  25. Pruning Association Rules • Why? • The number of discovered associations can be huge. • Many mined rules are spurious and insignificant. • Their existence may simply be due to chance. • Ex: P: Job=yes -> Loan=approved [sup=60%, conf=90%] r: Job=yes, Credit_history=good -> Loan=approved [sup=40%, conf=91%] • r is insignificant with respect to P (P is more general). • r's slightly higher confidence is more likely due to chance.

  26. Pruning Association Rules (Cont...) • So r can be pruned w.r.t. P. • Instead of using the whole data set, the correlation of r is tested w.r.t. P, since r only covers a subset of the data cases that are covered by P. • Pruning rule: given a rule r: X -> y, prune r using each ancestor rule P: X' -> y (X' ⊂ X), which has the same consequent as r but fewer (or zero) conditions. • How? Perform a χ² test on r with respect to P: positive correlation => keep r; otherwise prune r. A sketch of this test appears below.
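
A sketch of ancestor-based pruning, assuming a hypothetical data representation in which each case is a (set of condition items, class label) pair; the χ² test is run only over the cases covered by the ancestor P:

```python
CHI2_95 = 3.84   # chi-square threshold at 95% significance, 1 d.o.f.

def chi2_2x2(n11, n10, n01, n00):
    """Chi-square statistic for a 2 x 2 contingency table of counts."""
    N = n11 + n10 + n01 + n00
    obs = [[n11, n10], [n01, n00]]
    rows = [n11 + n10, n01 + n00]
    cols = [n11 + n01, n10 + n00]
    return sum((obs[i][j] - rows[i] * cols[j] / N) ** 2
               / (rows[i] * cols[j] / N)
               for i in range(2) for j in range(2))

def keep_rule(r_conds, y, p_conds, D):
    """Keep r: X -> y only if it is positively correlated with y within
    the subpopulation covered by its ancestor P (p_conds is a subset of
    r_conds).  Assumes the resulting table has no empty row or column."""
    covered = [(c, lab) for c, lab in D if p_conds <= c]
    extra = r_conds - p_conds                 # conditions r adds over P
    n11 = sum(extra <= c and lab == y for c, lab in covered)
    n10 = sum(extra <= c and lab != y for c, lab in covered)
    n01 = sum(not extra <= c and lab == y for c, lab in covered)
    n00 = sum(not extra <= c and lab != y for c, lab in covered)
    expected = (n11 + n10) * (n11 + n01) / len(covered)
    return chi2_2x2(n11, n10, n01, n00) > CHI2_95 and n11 > expected
```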

  27. Finding DS Rules • Evaluate each 1-condition rule to determine its direction status (i.e. -1/0/1). • Proceed level-by-level to analyze each rule and check whether it follows the direction set by previous-level rules or sets a new direction: • At level k (k > 1), for each k-condition rule r, determine its direction. • Examine each combination of a 1-condition rule r1 and a (k-1)-condition rule rrest of r to determine whether r follows the expected direction set by r1 and rrest. • If r follows the direction set by at least one such combination, r is NOT a DS rule. If not, then r sets a new direction, and it is a DS rule.

  28. Finding DS Rules (cont...) All cases of r1.dir, rrest.dir := r.dir:
A (1) 1, 1 := 1 (2) 1, 1 := -1 (3) 1, 1 := 0
B (1) 1, 0 := 1 (2) 1, 0 := -1 (3) 1, 0 := 0
C (1) 0, 0 := 0 (2) 0, 0 := 1 (3) 0, 0 := -1
D (1) -1, -1 := -1 (2) -1, -1 := 1 (3) -1, -1 := 0
E (1) -1, 0 := -1 (2) -1, 0 := 1 (3) -1, 0 := 0
(Group F, the mixed -1/1 combination referenced on slide 24, has an unknown expected direction; see the slide-31 example -1, 1 := 1.)
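
The DS check these cases imply can be written down directly. A minimal sketch, assuming directions are encoded as -1/0/1 and that the expected-direction set contains 1 only for the combinations in groups A and B (1,1 and 1,0 in either order):

```python
def expects_positive(d1, drest):
    """True if the expected-direction set for the pair (d1, drest)
    contains 1.  Per the cases above, only groups A (1,1) and
    B (1,0 in either order) expect a positive direction."""
    return (d1 == 1 and drest >= 0) or (drest == 1 and d1 >= 0)

def is_ds_rule(r_dir, combos):
    """r is a DS rule iff its direction is 1 and no (r1, rrest)
    combination expects direction 1."""
    return r_dir == 1 and not any(expects_positive(d1, dr)
                                  for d1, dr in combos)

# Slide 31: R5 has direction 1; its level-1 combination has directions
# (-1, 1) -> unexpected, so R5 is DS.
print(is_ds_rule(1, [(-1, 1)]))  # True
# Slide 32: R3 follows 1, 1 := 1 -> expected, so it is non-DS.
print(is_ds_rule(1, [(1, 1)]))   # False
```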

  29. Some Important Points • Every DS rule r is unexpected w.r.t. all r1 and rrest combinations because r does not follow their directions. • After seeing the DS rules, the directions of the non-DS rules are no longer surprising, as they are just combinations of DS rules and independence rules. • DS rules can guide the user to the related non-DS rules, if he/she is interested. The non-DS rules can provide further details with regard to the DS rules.

  30. Interactive Exploration • User interface to interactively • focus on the essential aspects of the domain. • selectively view the relevant details. • explore DS and non-DS rules. • Features • DS rules can be viewed according to their levels. Ex:
Level 1 rule: R2: (1) (DS) Age=young -> Loan=not-approved
Level 2 rule: R5: (1) (DS) Age=young, OH=yes -> Loan=approved

  31. Features (cont…) • View the reason for classifying a rule as DS. Ex:
Level 1 rules:
R1: (-1) Age=young -> Loan=approved
R2: (1) (DS) Age=young -> Loan=not-approved
R3: (1) (DS) OH=yes -> Loan=approved
R4: (-1) OH=yes -> Loan=not-approved
Level 2 rule:
R5: (1) (DS) Age=young, OH=yes -> Loan=approved    -1, 1 := 1 {unexpected}
The system displays R1 & R3 together with their directions.

  32. View relevant non-DS rules. Ex:
Level 1 rules:
R1: (1) (DS) Job=yes -> Loan=approved
R2: (1) (DS) OH=yes -> Loan=approved
Level 2 rule:
R3: (1) (non) Job=yes, OH=yes -> Loan=approved    1, 1 := 1 {expected}
The system displays R3 when the user clicks to view the relevant non-DS rules that follow the direction of R1 and R2.

  33. P-DS Algorithm P-DS prunes the discovered large rules and finds the DS rules. Input parameters: • F: the set of discovered large rules. • T: the χ² value at a particular significance level. • Process the discovered rules level-by-level from 1 to n. • For each rule r at a particular level: • call compDir to compute the type of correlation/direction of r w.r.t. “-> y”. • If r is a level-1 rule and its direction is 1, then r is a DS rule; • else r’s direction is not 1, so record r as pruned by “-> y”.

  34. P-DS Algorithm (cont..) • For all other levels, process r using all pairs of r1 and rrest. • If r is pruned and cannot be a DS rule, exit. • Evaluate r by calling evalPrune(). • If r is still a potential DS rule, analyze r by considering the four direction-setting cases to determine whether it can be a DS rule. • If, from the previous analyses, r can be a DS rule, check whether r is pruned. If pruned, set the direction of r as undefined; else add r to the set of DS rules.

  35. P-DS Algorithm (cont…) Procedure compDir(r, R, T) – computes the direction of rule r: If the χ² value of rule r exceeds the threshold value: if observed frequency > expected frequency then direction = 1, else direction = -1. Else direction = 0. Procedure evalPrune(r, rrest) – tries to prune r using rrest: • If rrest is pruned, get the rule that prunes rrest. • If the direction of r is not 1, then prune r and record rrest as its pruning rule. • If the direction of r is 1, then compare it with rrest using the χ² test. If the resulting direction is not 1, prune r.
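
A direct Python translation of compDir as described above (threshold 3.84 = χ² at 95% significance, 1 degree of freedom; f0 observed, f expected, as on slide 20):

```python
CHI2_95 = 3.84   # chi-square at 95% significance, 1 degree of freedom

def comp_dir(f0, f, chi2, threshold=CHI2_95):
    """Direction of a rule per compDir: 1 or -1 when the chi-square
    value is significant (f0 = observed, f = expected frequency),
    0 (independence) otherwise."""
    if chi2 > threshold:
        return 1 if f0 > f else -1
    return 0

# Bank example from slide 20: f0 = 200, f = 168, chi2 = 34.63.
print(comp_dir(200, 168, 34.63))   # 1 -> positive correlation
```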

  36. Empirical Evaluation • Data sets with huge numbers of associations: • 25 data sets from the UCI Machine Learning Repository • the remaining 5 are real-life application data sets • Large rules processed in memory (combinatorial explosion) – limit 80,000 • Attributes are discretized into intervals using the target attribute • Experiments were repeated with • significance levels for the χ² test: 95% and 90% • minimum support (minsup) values: 2% and 1%

  37. Empirical Evaluation (cont…) [Results table not reproduced in the transcript.]

  38. Empirical Evaluation (cont…) • As minimum support increases, fewer DS rules are produced. • When the significance level is lowered to 90%, more DS rules are produced, because pruning is less stringent.

  39. Conclusion • P-DS Algorithm • Prunes rules that contain less information than their ancestors. • Identifies the direction setting rules, giving a global picture of the underlying relationships in the domain. • The number of DS rules generated is small enough to be manually inspected by a human user. • However, • the technique does not use minimum confidence, and • it does not report rules with negative correlation.
