
Identifying Non-Actionable Association Rules

This presentation covers how to identify non-actionable association rules so that users can focus on the rules that are useful for action. The techniques covered are the chi-square test, rule pruning, and the identification of non-actionable rules, followed by an empirical evaluation.



Presentation Transcript


  1. Identifying Non-Actionable Association Rules • Authors: Bing Liu, Wynne Hsu, Yiming Ma • Graduate: Yu-Wei Su • Advisor: Dr. Hsu • IDSL, Intelligent Database System Lab

  2. Outline • Motivation • Objective • Introduction • Related work • Chi-square test • Rule pruning • Identifying non-actionable rules • Empirical evaluation • Conclusion • Opinion

  3. Motivation • Finding rules that are useful for action is a major problem • Mining algorithms often produce too many rules • A significant rule is not necessarily a rule that is useful for action

  4. Objective • To propose a technique that enables the user to focus on fewer rules and to be assured that the remaining rules are useful for action

  5. Introduction • Association rule mining efficiently finds all rules in the data that satisfy the user-specified support and confidence constraints • An association rule example: • Let I = {i1, …, in} be a set of items and T a set of transactions • Rule: X → Y [support, confidence] where:
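The support and confidence attached to a rule can be illustrated with a minimal Python sketch. It assumes the standard definitions (support is the fraction of transactions containing both X and Y; confidence is the fraction of transactions containing X that also contain Y), because the slide's own definition did not survive in the transcript. The function name `support_confidence` and the toy transactions are hypothetical.

```python
def support_confidence(transactions, x, y):
    """Return (support, confidence) of the rule X -> Y.

    transactions: list of sets of items; x, y: sets of items.
    Standard definitions are assumed here, not quoted from the slides.
    """
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)         # transactions containing X
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # transactions containing X and Y
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical toy data: four transactions.
toy = [
    {"BP=high", "Disease=yes"},
    {"BP=high", "Disease=no"},
    {"BP=normal", "Disease=no"},
    {"BP=high", "Disease=yes"},
]
sup, conf = support_confidence(toy, {"BP=high"}, {"Disease=yes"})
print(f"support={sup:.2f}, confidence={conf:.2f}")
```

A mining run would keep only the rules whose support and confidence meet the user-specified minimums.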

  6. Introduction (cont.) • An example to illustrate what a non-actionable rule is

  7. Introduction (cont.) • R1: the number of tuples with the disease and BP=high is 60 (6% × 1000), and the coverage of R1 is 100 (60 / 60%) • R2: the number of tuples with the disease and BP=high, Sex=male is 36 (3.6% × 1000), and the coverage of R2 is 40 (36 / 90%) • R3: the number of tuples with the disease and BP=high, glucoselevel=abnormal is 30 (3% × 1000), and the coverage of R3 is 30 (30 / 100%)
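The arithmetic on this slide follows one pattern: the number of tuples satisfying a rule is support × dataset size, and the coverage is that count divided by the confidence. A small Python sketch of that pattern, using the support and confidence values quoted above (the helper name `rule_counts` is hypothetical):

```python
N = 1000  # dataset size used in the example

# (support, confidence) per rule, as fractions, taken from the figures above
rules = {
    "R1": (0.06, 0.60),
    "R2": (0.036, 0.90),
    "R3": (0.03, 1.00),
}


def rule_counts(support, confidence, n):
    """Return (tuples satisfying the whole rule, tuples covered by its conditions)."""
    positives = round(support * n)            # conditions and consequent both hold
    coverage = round(positives / confidence)  # conditions hold
    return positives, coverage


counts = {name: rule_counts(s, c, N) for name, (s, c) in rules.items()}
for name, (positives, coverage) in counts.items():
    print(name, positives, coverage)
```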

  8. Introduction (cont.) • The coverage of a rule is the number of tuples covered by the rule • A rule covers a data tuple if the tuple satisfies the conditions of the rule • Suppose the number of tuples with the disease that are covered by R2 or R3 is 58 • The number of tuples covered by R2 or R3 is 62

  9. Introduction (cont.) • After removing the tuples covered by R2 or R3, the remaining tuples covered by R1 (R1′) are: 2 with disease=yes and 36 with disease=no • The confidence of R1′, 2/(2+36) = 5.3%, is far below the default confidence for disease=yes in the remaining data, (500−58)/(1000−62) = 47% • R1 is effectively accounted for by R2 and R3. Hence, R1 cannot be actionable
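The comparison on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the figure of 500 tuples with the disease is read off the default-confidence expression (500−58)/(1000−62), and the variable names are hypothetical.

```python
def confidence(positives, covered):
    """Fraction of covered tuples that satisfy the consequent."""
    return positives / covered


# Figures from the example: R1 covers 100 tuples (60 with the disease);
# R2 or R3 cover 62 tuples (58 with the disease); 500 of 1000 tuples
# have the disease (inferred from the default-confidence expression).
r1_cov, r1_pos = 100, 60
desc_cov, desc_pos = 62, 58
n, n_pos = 1000, 500

rest_cov = r1_cov - desc_cov   # tuples covered by R1 only
rest_pos = r1_pos - desc_pos   # of those, tuples with the disease
rest_conf = confidence(rest_pos, rest_cov)
default_conf = confidence(n_pos - desc_pos, n - desc_cov)

print(f"R1 on the remaining data: {rest_conf:.1%}, default: {default_conf:.1%}")
```

On the remaining data the confidence of R1 falls well below the default, which is why the rule adds nothing beyond R2 and R3.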

  10. Introduction (cont.) • Finding all non-actionable rules involves two phases (both use the chi-square test) • First, generate the rules and prune them according to a significance criterion (chi-square) • Then, analyze the remaining rules backward (from rules with more conditions to rules with fewer conditions)

  11. Related work • Pruning redundant or insignificant association rules has been studied by many researchers • A pruning technique using minimum improvement [Bayardo, 1999] • A chi-square-based test [Liu, 1999] • Generating non-redundant rules [Zaki, 2000]

  12. Related work (cont.) • Removing derivable redundant rules [Aggarwal, 1998] • Some of these methods are subjective and some are objective

  13. Chi-square test • BP=high → Disease=yes [sup = 120/1000 = 12%, conf = 120/300 = 40%] • The χ² critical value is 3.84 at the 95% significance level

  14. Chi-square test (cont.) • Definition 1 (correlated): Let Ds be a sub-dataset of the whole dataset D, and c a significance level. X and Y are said to be correlated with respect to Ds if the χ² value for Ds exceeds the χ² critical value at c • Definition 2 (uncorrelated or independent): Let Ds be a sub-dataset of the whole dataset D, and c a significance level. X and Y are said to be uncorrelated (independent) with respect to Ds if the χ² value for Ds does not exceed the χ² critical value at c
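The two definitions can be made concrete with a minimal Python sketch of the χ² test on a 2×2 contingency table (conditions satisfied or not, consequent satisfied or not). The function names are hypothetical, the critical value 3.84 is the one quoted in the slides, and the sketch assumes every row and column total is non-zero. The sample call uses rule R1 from the disease example (100 tuples covered, 60 with the disease, 500 of 1000 tuples with the disease overall).

```python
CHI2_CRITICAL_95 = 3.84  # critical value quoted in the slides


def chi_square_2x2(covered, covered_pos, total, total_pos):
    """Chi-square statistic for a rule X -> Y over a dataset Ds.

    covered:     tuples in Ds satisfying X
    covered_pos: of those, tuples also satisfying Y
    total:       size of Ds
    total_pos:   tuples in Ds satisfying Y
    Assumes no row or column total is zero.
    """
    observed = [
        [covered_pos, covered - covered_pos],
        [total_pos - covered_pos, (total - covered) - (total_pos - covered_pos)],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def is_correlated(chi2, critical=CHI2_CRITICAL_95):
    """Definition 1: correlated if the statistic exceeds the critical value."""
    return chi2 > critical


chi2_r1 = chi_square_2x2(covered=100, covered_pos=60, total=1000, total_pos=500)
print(round(chi2_r1, 2), is_correlated(chi2_r1))
```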

  15. Chi-square test (cont.) • To determine whether a rule is significant, we need to know its type of correlation • A rule is significant iff it is a positively correlated rule

  16. Chi-square test (cont.) • Definition 3 (types of correlation) • Positive correlation: if X and y of a rule r, X → y, are correlated and f0,1/f1 > 1, then r shows a positive correlation • Negative correlation: if X and y of a rule r, X → y, are correlated and f0,1/f1 < 1, then r shows a negative correlation • Independence: if X and y of a rule r, X → y, are independent, we say that r shows independence
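A hedged Python sketch of Definition 3 follows. The frequency ratio in the definition is garbled in the transcript; the sketch assumes it is the ratio of the observed to the expected number of tuples that satisfy both the conditions and the consequent. The function name `correlation_type` is hypothetical, and the χ² value is computed with the standard shortcut formula for a 2×2 table.

```python
def correlation_type(covered, covered_pos, total, total_pos, critical=3.84):
    """Classify a rule X -> y as 'positive', 'negative' or 'independence'.

    Arguments are counts over the dataset the rule is tested against.
    The observed/expected ratio is an assumed reading of the slide.
    """
    a = covered_pos                  # X and y
    b = covered - covered_pos        # X and not y
    c = total_pos - covered_pos      # not X and y
    d = total - covered - c          # not X and not y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                   # degenerate table: treat as independent
        return "independence"
    chi2 = total * (a * d - b * c) ** 2 / denom
    if chi2 <= critical:
        return "independence"
    expected_pos = covered * total_pos / total
    return "positive" if covered_pos / expected_pos > 1 else "negative"


# R1 on the whole dataset, R1 after removing tuples covered by R2 or R3,
# and a hypothetical rule whose confidence equals the default confidence.
print(correlation_type(100, 60, 1000, 500))
print(correlation_type(38, 2, 938, 442))
print(correlation_type(100, 50, 1000, 500))
```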

  17. Chi-square test (cont.) • In general, the type of correlation of an association rule r, X → y, is computed by comparing the rule with the whole population with respect to "y"

  18. Rule pruning • Non-significant rules are pruned before the non-actionable rules are identified • Main idea for pruning • Given a rule r, we try to prune r using each ancestor rule R of r. We perform a χ² test on r with respect to the data tuples covered by R. If the test shows a positive correlation, r is kept. Otherwise, r is pruned
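The pruning idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: rules are dictionaries of attribute-value conditions, the list of ancestors is passed in explicitly (the empty rule stands for the whole dataset), degenerate contingency tables count as non-significant, and the dataset is hypothetical, built to match the BP=high counts of the disease example.

```python
def positively_correlated(covered, covered_pos, total, total_pos, critical=3.84):
    """True if the 2x2 chi-square test shows a positive correlation."""
    a, b = covered_pos, covered - covered_pos
    c = total_pos - covered_pos
    d = total - covered - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # degenerate table: treated as not significant (assumption)
        return False
    chi2 = total * (a * d - b * c) ** 2 / denom
    return chi2 > critical and covered_pos > covered * total_pos / total


def covers(conditions, row):
    """A rule covers a tuple if the tuple satisfies all of its conditions."""
    return all(row[attr] == value for attr, value in conditions.items())


def keep_rule(rule, ancestors, data, target):
    """Keep r only if it is positively correlated within every ancestor's data."""
    attr, value = target
    for ancestor in ancestors:
        ds = [row for row in data if covers(ancestor, row)]
        cov = [row for row in ds if covers(rule, row)]
        if not positively_correlated(
            len(cov), sum(row[attr] == value for row in cov),
            len(ds), sum(row[attr] == value for row in ds),
        ):
            return False
    return True


def rows(n, bp, sex, disease):
    return [{"BP": bp, "Sex": sex, "Disease": disease} for _ in range(n)]


# Hypothetical dataset: 1000 tuples, 500 with the disease, 100 with BP=high.
data = (
    rows(36, "high", "male", "yes") + rows(4, "high", "male", "no")
    + rows(24, "high", "female", "yes") + rows(36, "high", "female", "no")
    + rows(220, "normal", "male", "yes") + rows(230, "normal", "male", "no")
    + rows(220, "normal", "female", "yes") + rows(230, "normal", "female", "no")
)
target = ("Disease", "yes")
ancestors = [{}, {"BP": "high"}]  # whole dataset, then R1

kept_male = keep_rule({"BP": "high", "Sex": "male"}, ancestors, data, target)
kept_female = keep_rule({"BP": "high", "Sex": "female"}, ancestors, data, target)
print(kept_male, kept_female)
```

With these counts the male rule stays positively correlated inside R1's tuples, while the female rule does worse than its ancestors and is pruned.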

  19. Rule pruning (cont.)

  20. Identifying non-actionable rules • Definition 4 (potentially actionable rule): a rule R is a potentially actionable (PA) rule if either • R does not have any descendant rules, or • R has some descendant PA rules and, after removing the data tuples covered by R's descendant PA rules, R is still significant with respect to "y" • Definition 5 (non-actionable rule): a rule R is a non-actionable rule if it is not a PA rule
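Definitions 4 and 5 can be sketched in Python as a backward pass over the rules, from more conditions to fewer. This is an illustration, not the authors' algorithm: it assumes all input rules have already passed the significance pruning, that a descendant is a rule whose conditions strictly extend R's, and that a rule with descendants but no PA descendants stays PA. The dataset is hypothetical, constructed to match the counts of the disease example (R1 covers 100 tuples, R2 covers 40, R3 covers 30, R2 or R3 cover 62).

```python
def positively_correlated(covered, covered_pos, total, total_pos, critical=3.84):
    """True if the 2x2 chi-square test shows a positive correlation."""
    a, b = covered_pos, covered - covered_pos
    c = total_pos - covered_pos
    d = total - covered - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # degenerate table: treated as not significant (assumption)
        return False
    chi2 = total * (a * d - b * c) ** 2 / denom
    return chi2 > critical and covered_pos > covered * total_pos / total


def covers(conditions, row):
    return all(row[attr] == value for attr, value in conditions.items())


def potentially_actionable(rules, data, target):
    """Return the PA rules, analysing rules with more conditions first."""
    attr, value = target
    pa = []
    for rule in sorted(rules, key=len, reverse=True):
        # Descendant PA rules: PA rules whose conditions strictly extend this rule's.
        desc = [r for r in pa if rule.items() < r.items()]
        rest = [row for row in data if not any(covers(r, row) for r in desc)]
        cov = [row for row in rest if covers(rule, row)]
        if not desc or positively_correlated(
            len(cov), sum(row[attr] == value for row in cov),
            len(rest), sum(row[attr] == value for row in rest),
        ):
            pa.append(rule)
    return pa


def rows(n, bp, sex, glucose, disease):
    return [{"BP": bp, "Sex": sex, "Glucose": glucose, "Disease": disease}
            for _ in range(n)]


# Hypothetical dataset consistent with the example's counts.
data = (
    rows(8, "high", "male", "abnormal", "yes")
    + rows(28, "high", "male", "normal", "yes")
    + rows(4, "high", "male", "normal", "no")
    + rows(22, "high", "female", "abnormal", "yes")
    + rows(2, "high", "female", "normal", "yes")
    + rows(36, "high", "female", "normal", "no")
    + rows(440, "normal", "female", "normal", "yes")
    + rows(460, "normal", "female", "normal", "no")
)
R1 = {"BP": "high"}
R2 = {"BP": "high", "Sex": "male"}
R3 = {"BP": "high", "Glucose": "abnormal"}

pa_rules = potentially_actionable([R1, R2, R3], data, ("Disease", "yes"))
non_actionable = [r for r in [R1, R2, R3] if r not in pa_rules]
print(pa_rules, non_actionable)
```

On this data R2 and R3 have no descendants and are PA, while R1 is no longer positively correlated once their tuples are removed, so it is reported as non-actionable.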

  21. Identifying non-actionable rules (cont.)

  22. Identifying non-actionable rules (cont.)

  23. Empirical evaluation • Used 30 datasets: 25 from the UCI ML Repository and 5 from real-life applications • Set the limit on large rules held in memory to 80,000 • Used a significance level of 95% for the χ² test • Set the minimum support to 1%

  24. Empirical evaluation (cont.)

  25. Conclusion • Presented an algorithm to identify all the non-actionable rules in a set of significant rules • Experimental results show that more than 34% of significant rules are not actionable

  26. Opinion • The work provides a point of view for evaluating association rules • The "practical usage point of view" may be an important issue
