This research focuses on identifying and pruning non-actionable association rules to enhance rule usefulness. Techniques involve chi-square test, rule pruning, and empirical evaluation of rule significance.
Identifying Non-Actionable Association Rules Authors: Bing Liu, Wynne Hsu, Yiming Ma Graduate: Yu-Wei Su Advisor: Dr. Hsu IDSL, Intelligent Database System Lab
Outline • Motivation • Objective • Introduction • Related work • Chi-square test • Rule pruning • Identifying non-actionable rules • Empirical evaluation • Conclusion • Opinion
Motivation • Finding rules that are useful for action is a major problem • Mining algorithms often produce too many rules • A significant rule is not necessarily a rule that is useful for action
Objective • To propose a technique that enables the user to focus on fewer rules and to be assured that the remaining rules are useful for action
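Introduction • Association rule mining efficiently finds all rules in the data that satisfy the user-specified support and confidence constraints • An association rule example: • Let I = {i1, …, in} be a set of items and T a set of transactions • Rule: X -> Y [support, confidence], where X ⊂ I, Y ⊂ I and X ∩ Y = ∅ • Support is the fraction of transactions in T that contain both X and Y; confidence is the fraction of transactions containing X that also contain Y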
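The support and confidence of a rule X -> Y can be computed directly from a list of transactions. The following is a minimal sketch; the function name `support_confidence` and the toy transactions are illustrative assumptions, not part of the original slides.

```python
def support_confidence(transactions, x, y):
    """Return (support, confidence) of the rule x -> y.

    transactions: list of item sets; x, y: item sets (antecedent, consequent).
    """
    x, y = frozenset(x), frozenset(y)
    # Transactions that contain every item of X
    n_x = sum(1 for t in transactions if x <= t)
    # Transactions that contain every item of X and of Y
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical toy data, for illustration only
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

sup, conf = support_confidence(transactions, {"bread"}, {"milk"})
print(f"support={sup:.2f}, confidence={conf:.2f}")
```

A mining algorithm would keep the rule only if both values reach the user-specified minimum support and minimum confidence.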
Introduction (cont.) • An example illustrates what a non-actionable rule is
Introduction (cont.) • R1: the number of tuples with disease and BP=high is 60 (6% × 1000), and the coverage of R1 is 100 (60 / 60%) • R2: the number of tuples with disease and BP=high, Sex=male is 36 (3.6% × 1000), and the coverage of R2 is 40 (36 / 90%) • R3: the number of tuples with disease and BP=high, glucoselevel=abnormal is 30 (3% × 1000), and the coverage of R3 is 30 (30 / 100%)
Introduction (cont.) • The coverage of a rule is the number of tuples covered by the rule • A rule covers a data tuple if the tuple satisfies the conditions of the rule • Suppose the number of tuples covered by R2 or R3 that have the disease is 58 • The number of tuples covered by R2 or R3 is 62
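Introduction (cont.) • After removing the tuples covered by R2 or R3, the remainder of R1, denoted R1', covers 2 tuples with disease=yes and 36 with disease=no • Its confidence is 2/(2+36) = 5.3%, compared with the default confidence (500-58)/(1000-62) = 47% for disease=yes • R1' is effectively R1 once its more specific rules R2 and R3 are taken into account. Hence, R1 cannot be actionable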
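The arithmetic of this example can be re-traced in a few lines. This is only a sketch that recomputes the figures quoted on the slides; the helper name `hits_and_coverage` is an assumption, and the total of 500 tuples with disease=yes is taken from the default-confidence formula above.

```python
N = 1000             # total number of tuples in the example
DISEASE_TOTAL = 500  # tuples with disease=yes, as implied by (500-58)/(1000-62)


def hits_and_coverage(support, confidence, n=N):
    """Return (tuples satisfying the whole rule, tuples covered by its conditions)."""
    hits = round(support * n)
    return hits, round(hits / confidence)


r1 = hits_and_coverage(0.06, 0.60)   # BP=high
r2 = hits_and_coverage(0.036, 0.90)  # BP=high, Sex=male
r3 = hits_and_coverage(0.03, 1.00)   # BP=high, glucoselevel=abnormal

covered_by_r2_or_r3 = 62  # given on the slide
disease_in_r2_or_r3 = 58  # given on the slide

# What is left of R1 after removing the tuples covered by R2 or R3 (R1')
rest_cov = r1[1] - covered_by_r2_or_r3
rest_yes = r1[0] - disease_in_r2_or_r3
rest_no = rest_cov - rest_yes

conf_r1_prime = rest_yes / rest_cov
default_conf = (DISEASE_TOTAL - disease_in_r2_or_r3) / (N - covered_by_r2_or_r3)

print(r1, r2, r3)
print(rest_yes, rest_no)
print(f"conf(R1') = {conf_r1_prime:.1%}, default = {default_conf:.0%}")
```

Because the confidence of R1' is far below the default confidence for disease=yes, the tuples that only R1 accounts for give no reason to act on BP=high alone.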
Introduction (cont.) • Finding all non-actionable rules takes two phases, both based on the chi-square test • First, generate the rules and prune them according to a significance criterion (chi-square) • Then, analyze the remaining rules backward (from rules with more conditions to rules with fewer conditions)
Related work • Pruning redundant or insignificant association rules has been studied by many researchers • Pruning based on minimum improvement [Bayardo, 1999] • Chi-square based test [Liu, 1999] • Generating non-redundant rules [Zaki, 2000]
Related work (cont.) • Removing derivable redundant rules [Aggarwal, 1998] • Existing methods are either subjective or objective
Chi-square test • Example rule: BP=high -> Disease=yes [sup=120/1000=12%, conf=120/300=40%] • The χ² value of the rule is compared with the critical value 3.84 (95% significance level, one degree of freedom)
Chi-square test (cont.) • Definition 1 (correlated): Let Ds be a sub-dataset of the whole dataset D, and c a significance level. X and Y of a rule X -> Y are said to be correlated with respect to Ds if the χ² value for Ds exceeds the χ² value at c • Definition 2 (uncorrelated or independent): Let Ds be a sub-dataset of the whole dataset D, and c a significance level. X and Y of a rule X -> Y are said to be uncorrelated (independent) with respect to Ds if the χ² value for Ds does not exceed the χ² value at c
Chi-square test (cont.) • To determine the significance of a rule, the type of correlation of the rule must be known • A rule is significant iff it is a positively correlated rule
Chi-square test (cont.) • Definition 3 (types of correlation) • Positive correlation: if x and y of a rule r, x -> y, are correlated and f0,1/f1 > 1, then r is a positive correlation • Negative correlation: if x and y of a rule r, x -> y, are correlated and f0,1/f1 < 1, then r is a negative correlation • Independence: if x and y of a rule r, x -> y, are independent, we say that r shows independence
Chi-square test (cont.) • In general, the type of correlation of an association rule r, x -> y, is computed by comparing r against the whole population with respect to "y"
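The three types of correlation can be illustrated with a small χ² computation on the 2×2 table of a rule x -> y. This is a sketch under stated assumptions: the function names are hypothetical, the ratio in Definition 3 is read as observed frequency over expected frequency of the rule, and the test data is rule R1 from the introduction (60 of the 100 BP=high tuples have the disease, 500 of 1000 tuples overall).

```python
CHI2_CRITICAL_95 = 3.84  # critical value at the 95% level, 1 degree of freedom


def chi_square_2x2(n_xy, n_x, n_y, n):
    """Pearson chi-square for the 2x2 table of a rule x -> y.

    n_xy: tuples satisfying x and y; n_x: tuples satisfying x;
    n_y: tuples satisfying y; n: all tuples. Margins are assumed non-degenerate
    (0 < n_x < n and 0 < n_y < n), otherwise an expected count would be zero.
    """
    observed = [
        [n_xy, n_x - n_xy],
        [n_y - n_xy, n - n_x - n_y + n_xy],
    ]
    row_totals = [n_x, n - n_x]
    col_totals = [n_y, n - n_y]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def correlation_type(n_xy, n_x, n_y, n, critical=CHI2_CRITICAL_95):
    """Classify a rule as 'positive', 'negative' or 'independent'."""
    if chi_square_2x2(n_xy, n_x, n_y, n) <= critical:
        return "independent"
    expected_xy = n_x * n_y / n  # expected frequency of x and y under independence
    return "positive" if n_xy / expected_xy > 1 else "negative"


# R1 from the introduction: BP=high -> disease=yes
chi2_r1 = chi_square_2x2(60, 100, 500, 1000)
print(round(chi2_r1, 2), correlation_type(60, 100, 500, 1000))
```

With these counts the χ² value is about 4.44, which exceeds 3.84, and the observed frequency (60) is above the expected one (50), so R1 counts as a positive correlation with respect to the whole data.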
Rule pruning • Non-significant rules are pruned before the non-actionable rules are identified • Main idea of pruning • Given a rule r, we try to prune r using each ancestor rule R of r. We perform a χ² test on r with respect to the data tuples covered by R. If the test shows a positive correlation, r is kept. Otherwise, r is pruned
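The pruning idea can be sketched as follows. This is not the authors' implementation: the rule representation (a frozenset of attribute/value conditions plus a target pair), the helper names, the choice to treat the whole dataset as the root scope, and the toy data are all assumptions made for illustration.

```python
CHI2_CRITICAL_95 = 3.84  # 95% level, 1 degree of freedom


def covers(conditions, row):
    """A rule covers a tuple if the tuple satisfies all of its conditions."""
    return all(row.get(attr) == value for attr, value in conditions)


def is_positive(conditions, target, rows, critical=CHI2_CRITICAL_95):
    """True if conditions -> target is positively correlated within `rows`."""
    attr, value = target
    n = len(rows)
    n_x = sum(1 for r in rows if covers(conditions, r))
    n_y = sum(1 for r in rows if r.get(attr) == value)
    n_xy = sum(1 for r in rows if covers(conditions, r) and r.get(attr) == value)
    if n == 0 or n_x in (0, n) or n_y in (0, n):
        return False  # degenerate table: no evidence of correlation
    observed = [[n_xy, n_x - n_xy], [n_y - n_xy, n - n_x - n_y + n_xy]]
    row_t, col_t = [n_x, n - n_x], [n_y, n - n_y]
    chi2 = sum(
        (observed[i][j] - row_t[i] * col_t[j] / n) ** 2 / (row_t[i] * col_t[j] / n)
        for i in range(2)
        for j in range(2)
    )
    return chi2 > critical and n_xy > n_x * n_y / n


def prune(rules, rows):
    """Keep a rule only if it is positive w.r.t. the whole data and every ancestor."""
    kept = []
    for conditions, target in rules:
        # Ancestors: same target, conditions form a proper subset
        ancestors = [c for c, t in rules if t == target and c < conditions]
        scopes = [rows] + [[r for r in rows if covers(a, r)] for a in ancestors]
        if all(is_positive(conditions, target, scope) for scope in scopes):
            kept.append((conditions, target))
    return kept


def make_rows(count, diseased, **attrs):
    """Hypothetical helper: `count` tuples sharing `attrs`, `diseased` with Disease=yes."""
    return [dict(attrs, Disease="yes" if i < diseased else "no") for i in range(count)]


# Hypothetical data: within BP=high, Sex matters but Smoker does not
rows = (
    make_rows(50, 48, BP="high", Sex="male", Smoker="yes")
    + make_rows(50, 47, BP="high", Sex="male", Smoker="no")
    + make_rows(50, 32, BP="high", Sex="female", Smoker="yes")
    + make_rows(50, 33, BP="high", Sex="female", Smoker="no")
    + make_rows(800, 240, BP="normal", Sex="female", Smoker="no")
)
y = ("Disease", "yes")
r1 = (frozenset({("BP", "high")}), y)
r2 = (frozenset({("BP", "high"), ("Sex", "male")}), y)
r3 = (frozenset({("BP", "high"), ("Smoker", "yes")}), y)

kept = prune([r1, r2, r3], rows)
print([sorted(c) for c, _ in kept])
```

In this toy data the smoker rule looks significant against the whole dataset, but inside the tuples covered by its ancestor BP=high the disease rate is the same for smokers and non-smokers, so the rule is pruned.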
Identifying non-actionable rules • Definition 4 (potentially actionable rule): a rule R is a potentially actionable (PA) rule if either • R does not have any descendant rules, or • (there exist some descendant PA rules for R) after removing the data tuples that are covered by R's descendant PA rules, R is still significant with respect to "y" • Definition 5 (non-actionable rule): a rule R is a non-actionable rule if it is not a PA rule
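Definitions 4 and 5 suggest a backward pass from rules with more conditions to rules with fewer. The sketch below is an illustration under assumptions, not the authors' algorithm: rule representation and helper names are hypothetical, the input rules are assumed to have already survived pruning, and the dataset is constructed to match the counts of the introduction example (R1, R2, R3).

```python
CHI2_CRITICAL_95 = 3.84  # 95% level, 1 degree of freedom


def covers(conditions, row):
    return all(row.get(attr) == value for attr, value in conditions)


def is_positive(conditions, target, rows, critical=CHI2_CRITICAL_95):
    """True if conditions -> target is positively correlated within `rows`."""
    attr, value = target
    n = len(rows)
    n_x = sum(1 for r in rows if covers(conditions, r))
    n_y = sum(1 for r in rows if r.get(attr) == value)
    n_xy = sum(1 for r in rows if covers(conditions, r) and r.get(attr) == value)
    if n == 0 or n_x in (0, n) or n_y in (0, n):
        return False  # degenerate table
    observed = [[n_xy, n_x - n_xy], [n_y - n_xy, n - n_x - n_y + n_xy]]
    row_t, col_t = [n_x, n - n_x], [n_y, n - n_y]
    chi2 = sum(
        (observed[i][j] - row_t[i] * col_t[j] / n) ** 2 / (row_t[i] * col_t[j] / n)
        for i in range(2)
        for j in range(2)
    )
    return chi2 > critical and n_xy > n_x * n_y / n


def potentially_actionable(rules, rows):
    """Return the PA rules, visiting rules with more conditions first."""
    pa = []
    for conditions, target in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        # Descendant PA rules: same target, conditions form a proper superset
        desc_pa = [c for c, t in pa if t == target and conditions < c]
        remaining = [r for r in rows if not any(covers(c, r) for c in desc_pa)]
        if not desc_pa or is_positive(conditions, target, remaining):
            pa.append((conditions, target))
    return pa


def make_rows(count, diseased, **attrs):
    """Hypothetical helper: `count` tuples sharing `attrs`, `diseased` with Disease=yes."""
    return [dict(attrs, Disease="yes" if i < diseased else "no") for i in range(count)]


# Dataset built to match the example: R2 or R3 covers 62 tuples, 58 with the disease
rows = (
    make_rows(8, 8, BP="high", Sex="male", Glucose="abnormal")
    + make_rows(32, 28, BP="high", Sex="male", Glucose="normal")
    + make_rows(22, 22, BP="high", Sex="female", Glucose="abnormal")
    + make_rows(38, 2, BP="high", Sex="female", Glucose="normal")
    + make_rows(900, 440, BP="normal", Sex="female", Glucose="normal")
)
y = ("Disease", "yes")
r1 = (frozenset({("BP", "high")}), y)
r2 = (frozenset({("BP", "high"), ("Sex", "male")}), y)
r3 = (frozenset({("BP", "high"), ("Glucose", "abnormal")}), y)

rules = [r1, r2, r3]
pa_rules = potentially_actionable(rules, rows)
non_actionable = [r for r in rules if r not in pa_rules]
print(len(pa_rules), len(non_actionable))
```

On this data R2 and R3 have no descendants and are therefore PA, while R1 loses its positive correlation once the tuples covered by R2 or R3 are removed and is reported as non-actionable.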
Empirical evaluation • 30 datasets were used: 25 from the UCI ML Repository and 5 from real-life applications • The number of large rules held in memory was limited to 80,000 • A significance level of 95% was used for the χ² test • The minimum support was set to 1%
Conclusion • The paper presents an algorithm to identify all the non-actionable rules in a set of significant rules • Experimental results show that more than 34% of the significant rules are non-actionable
Opinion • The paper provides a new point of view for evaluating association rules • The "practical usage point of view" may be an important issue