290 likes | 464 Views
Top Down FP-Growth for Association Rule Mining. By Ke Wang. Introduction. Classically, for rule A B : support: computed by count ( AB ) frequent --- if pass minimum support threshold confidence: computed by count ( AB ) / count( A )
E N D
Introduction • Classically, for rule A B : • support: computed by count( AB ) • frequent --- if pass minimum support threshold • confidence: computed by count( AB ) / count(A ) • confident – if pass minimum confidence threshold • How to mine association rules? • find all frequent patterns • generate rules from the frequent patterns
Introduction • Limitations of current research • use uniform minimum support threshold • only use support as pruning measure • Our contribution • improve efficiency • adopt multiple minimum supports • introduce confidence pruning
Related work -- Frequent pattern mining • Apriori algorithm • method: use anti-monotone property of support to do pruning, i.e. • if length k pattern is infrequent, its length k+1 super-pattern can never be frequent • FP-growth algorithm--better than Apriori • method: • build FP-tree to store database • mine FP-tree in bottom-up order
Related work -- Association rule mining • Fast algorithms trying to guarantee completeness of frequent patterns • Parallel algorithms & association rule based query languages • Various association rule mining problems • multi-level multi-dimension rule • constraints on specific item
TD-FP-Growth for frequent pattern mining • Similar tree structure as FP-growth • Compressed tree to store the database • nodes on each path of the tree are globally ordered • Different mining method VS.FP-growth • FP-growth: bottom-up tree mining • TD-FP-Growth : top-down tree mining
root b: 2 a: 3 e: 1 c: 1 b: 1 c: 1 e: 1 c: 1 e: 1 TD-FP-Growth for frequent pattern mining b, e a, b, c, e b, c, e a, c, d a minsup = 2 Construct a FP-tree:
root b: 2 a: 3 e: 1 c: 1 b: 1 c: 1 e: 1 c: 1 (b: 1) (b: 1, c: 1) (a: 1, b: 1, c: 1) e: 1 e’s conditional pattern base TD-FP-Growth for frequent pattern mining b, e a, b, c, e b, c, e a, c, d a minsup = 2 FP-growth: bottom-up mining Mining order: e, c, b, a
FP-growth: bottom-up mining root (b: 1) (b: 1, c: 1) (a: 1, b: 1, c: 1) b: 3 c: 2 • both e’s conditional pattern base and conditional FP-tree are stored in memory • mine e’s conditional FP-tree recursively • conditional pattern bases and FP-trees are built for all other items and their super-patterns TD-FP-Growth for frequent pattern mining drawback!
TD-FP-Growth for frequent pattern mining • TD-FP-Growth : adopt top-down mining strategy • motivation: avoid building extra databases and sub-trees as FP-growth does • method: process nodes on the upper level before those on the lower level • result: any modification happened on the upper level nodes would not affect the lower level nodes See example
root b: 2 a: 3 e: 1 c: 1 b: 1 c: 1 e: 1 c: 1 e: 1 b, e a, b, c, e b, c, e a, c, d a minsup = 2 TD-FP-Growth for frequent pattern mining Mining order: a, b, c, e CT-tree and header table H
b: 1 sub-header-table H_c root a: 2 b: 2 a: 3 e: 1 c: 1 b: 1 c: 1 e: 1 c: 1 CT-tree and header table H e: 1 b, e a, b, c, e b, c, e a, c, d a minsup = 2 CT-tree for frequent pattern mining
CT-tree for frequent pattern mining • Completeness • for entry i in H, we mine all the frequent patterns that end up with item i, no more and no less • Complete set of frequent patterns: {a } {b } {c }, {b, c }, {a, c } {e }, {b, e }, {c, e }, {b, c, e }
TD-FP-Growth for frequent pattern mining • Comparing to FP-growth, TD-FP-Growth is: • Space saving: • only one tree and a few header tables • no extra databases and sub-trees • Time saving: • does not build extra databases and sub-trees • walk up path only once to update count information for nodes on the tree and build sub-header-tables.
TD-FP-Growth for association rule mining • Assumptions: • There is a class-attribute in the database • Items in the class-attribute called class-items, others are non-class-items • Each transaction is associated a class-item • Only class-item appears in the right-hand of the rule example rule: a, b Ci
TD-FP-Growth for association rule mining--multi mini support • Why? • Use uniform minimum support, computation of count considers only number of appearance • Uniform minimum support is unfair to items that appears less but worth more. • Eg. responder vs. non-responder • How? • Use different support threshold for different class
TD-FP-Growth for association rule mining -- multi mini support • multiple VS. uniform • C1 : 4, C 2 : 2 • rules with relative minsup = 50% proportional to each class -- multiplier in performance • uniform minimum support: absolute minsup= 1; • 11 nodes tree, 23 rules • multiple minimum supports: absolute minsup1 = 2; absolute minsup2 = 1; • 7 nodes tree, 9 rules • more effective and space-saving • time-saving --- show in performance c, f, C1 b, e, C2 b, e, f, C1 a, c, f, C1 c, e, C2 b, c, d, C1
TD-FP-Growth for association rule mining--conf pruning • Motivation • make use of the other constraint of association rule: confidence, to speed up mining • Method • confidence is not anti-monotone • introduce: acting constraint of confidence, which is anti-monotone • push it inside the mining process
TD-FP-Growth for association rule mining--conf pruning conf(A B) = count(AB) / count(A) >= minconf count(AB) >= count(A) * minconf count(AB) >= minsup * minconf (anti-monotone & weaker) --- the acting constraint of confidence for the original confidence constraint of rule A B • support of rule is computed by: count(A) • count(AB): class-count of itemset A related to class B
root b: 2 … … e: 2 Header table H: count(i) = count(i, C1) + count(i, C2) … If no confidence pruning sub-header-table H_e TD-FP-Growth for association rule mining--conf pruning c, f, C1 b, e, C2 b, e, f, C1 a, c, f, C1 a, c, d, C2 minsup = 2 minconf= 60% count(e) >= minsup; However, both count(e, C1) &count(e, C2) < minsup * minconf; terminate mining for e!
Performance • Choose several data sets from UC_Irvine Machine Learning Database Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html.
Performance — mine rules with multiple minimum supports FP-growth is only for frequent pattern mining relative minsup, proportional to each class
Conclusions and future work • Conclusions of TD-FP-Growth algorithm • more efficient in finding both frequent patterns and association rules • more effective in mining rules by using multiple minimum supports • Introduce a new pruning method: confidence pruning, and push it inside the mining process; thus further speed up mining
Conclusions and future work • Future work • Explore other constraint-based association rule mining method • Mine association rules with item concept hierarchy • Apply TD-FP-Growth to applications based on association rule mining • Clustering • Classification
Reference • (1)R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proc. 1993 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD’93), pages 207-216, Washington, D.C., May 1993. • (2)U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996. • (3)H. Toivonen. Sampling large databases for association rules. Proc. 1996 Int. Conf. Very Large Data Bases (VLDB’96), pages 134-145, Bombay, India, September 1996. • (4)R. Agrawal and S. Srikant. Mining sequential patterns. Proc. 1995 Int. Conf. Data Engineering (ICDE’95), pages 3-14, Taipei, Taiwan, March 1995. • (5)J. Han, J. Pei and Y. Yin. Mining Frequent Patterns without Candidate Generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD’00), pages 1-12, Dallas, TX, May 2000. • (6) J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. Proc. 2001 ACM-SIGMOD Int. Conf., Santa Barbara, CA, May 2001. And more!