Rapid Association Rule Mining. Amitabha Das, Wee-Keong Ng, Yew-Kwong Woon. Proc. of the 10th ACM International Conference on Information and Knowledge Management (CIKM'01), 2001. Adviser: Jia-Ling Koh. Speaker: Yu-ting Kung.
Introduction
• Proposes an innovative algorithm to push the speed-up barrier: Rapid Association Rule Mining (RARM)
• RARM uses a tree structure: the Support-Ordered Trie Itemset (SOTrieIT)
• The SOTrieIT holds pre-processed transactional data, so large 1-itemsets and 2-itemsets can be discovered quickly without scanning the database and without generating candidate 2-itemsets
A Complete TrieIT
• Definition
• I (the set of items) = {a1, a2, …, aN}, lexicographically ordered
• A complete TrieIT is a set of tree nodes such that every tree node w is a 2-tuple <wi, wc>
• wi ∈ I is the label of the node
• wc is a support count
A Complete TrieIT (Cont.)
• Example
[Figure: complete TrieITs W1 (item A), W2 (item B), W3 (item C), and W4 (item D)]
※ Database D is stored as a set of complete TrieITs
A Complete TrieIT (Cont.)
• Insertion
[Figure: transaction database with N=4; the tree after transactions 100 to 300 have been inserted; the tree after transaction 400 has been inserted]
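The definition and insertion step above can be illustrated with a minimal Python sketch. It assumes that each itemset contained in a transaction corresponds to one root-to-node path and that inserting a transaction increments the count at the end of every such path. The names `TrieNode`, `insert_transaction`, and `support`, the `roots` dictionary, and the sample transactions in the usage note are hypothetical and are not taken from the slides.

```python
from itertools import combinations


class TrieNode:
    """One TrieIT node: the 2-tuple <label, count> plus links to children."""

    def __init__(self, label):
        self.label = label    # w_i: an item drawn from I
        self.count = 0        # w_c: support count of the itemset ending here
        self.children = {}    # child label -> TrieNode


def insert_transaction(roots, transaction):
    """Increment the count of every non-empty itemset in the transaction.

    `roots` maps an item to the root of its complete TrieIT
    (an assumed layout for this sketch).
    """
    items = sorted(set(transaction))  # lexicographic order, duplicates dropped
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            node = roots.setdefault(itemset[0], TrieNode(itemset[0]))
            for label in itemset[1:]:
                node = node.children.setdefault(label, TrieNode(label))
            node.count += 1


def support(roots, itemset):
    """Return the stored support count of an itemset, or 0 if it was never seen."""
    items = sorted(set(itemset))
    node = roots.get(items[0])
    for label in items[1:]:
        if node is None:
            return 0
        node = node.children.get(label)
    return node.count if node is not None else 0
```

For example, after inserting the hypothetical transactions {A, C}, {B, C}, {A, B, C}, and {A, B, C, D}, `support(roots, ["A", "C"])` returns 3. Storing every subset costs time exponential in the transaction length, which is one reason a depth-limited structure is attractive.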
Support-Ordered Trie Itemset
• Definition
• A SOTrieIT is a complete TrieIT with a depth of 1; i.e., it consists only of a root node wi and some child nodes
• All nodes in the forest of SOTrieITs are sorted according to their support counts in descending order from the left
SOTrieIT (Cont.)
• Example
[Figure: SOTrieITs for items A, B, and C]
SOTrieIT (Cont.)
• Insertion
[Figure: transaction database with N=4; the SOTrieIT after inserting TID 100, TID 200, TID 300, and TID 400; the resultant SOTrieIT]
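A minimal Python sketch of the SOTrieIT follows. It assumes that first-level nodes count 1-itemsets and that the children of an item count the 2-itemsets formed with lexicographically later items. The class layout, the method names, and the tie-breaking by label are assumptions made for illustration. The slides describe nodes that are kept in sorted order, whereas this sketch sorts on read for brevity.

```python
from itertools import combinations


class SOTrieIT:
    """Depth-1 tries: roots count 1-itemsets, their children count 2-itemsets."""

    def __init__(self):
        self.item_counts = {}   # item -> support count of {item}
        self.pair_counts = {}   # item -> {later item -> support count of the pair}

    def insert(self, transaction):
        """Record only the 1-itemsets and 2-itemsets of one transaction."""
        items = sorted(set(transaction))
        for item in items:
            self.item_counts[item] = self.item_counts.get(item, 0) + 1
        for first, second in combinations(items, 2):
            children = self.pair_counts.setdefault(first, {})
            children[second] = children.get(second, 0) + 1

    @staticmethod
    def _descending(counts):
        # Highest support first; ties broken by label (an assumption).
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    def roots(self):
        """First-level nodes as (item, count) pairs in support order."""
        return self._descending(self.item_counts)

    def children(self, item):
        """Child nodes of `item` as (item, count) pairs in support order."""
        return self._descending(self.pair_counts.get(item, {}))
```

With the hypothetical transactions {A, C}, {B, C}, {A, B, C}, and {A, B, C, D}, `roots()` lists C first because it occurs in all four transactions. Each insertion touches only the 1-itemsets and 2-itemsets of the transaction, so its cost is quadratic in the transaction length rather than exponential.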
Algorithm RARM
• Pre-processing
• Mining of large itemsets
Algorithm RARM (Cont.)
• Example (support threshold is 80%)
[Figure: the sequence in which the SOTrieIT is traversed]
• The total number of traversals is 3 and L1 = {{C}}
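The benefit of the support ordering during mining can be sketched in Python: because nodes are visited in descending support order, the traversal can stop at the first node that falls below the threshold. The function names, the flat `Counter` representation, and the way visited nodes are counted are assumptions for illustration, and the database in the usage note is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_counts(transactions):
    """Pre-processing: count every 1-itemset and 2-itemset once."""
    item_counts, pair_counts = Counter(), Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def _by_support(entry):
    # Sort key: highest count first, ties broken by label (an assumption).
    return (-entry[1], entry[0])


def mine_large_1_and_2(transactions, min_support):
    """Return (L1, L2, nodes_visited) for a fractional support threshold."""
    item_counts, pair_counts = build_counts(transactions)
    threshold = min_support * len(transactions)
    large1, large2, visited = [], [], 0
    for item, count in sorted(item_counts.items(), key=_by_support):
        visited += 1
        if count < threshold:
            break  # all remaining first-level nodes have equal or lower support
        large1.append((item,))
        children = [(p, c) for p, c in pair_counts.items() if p[0] == item]
        for pair, pair_count in sorted(children, key=_by_support):
            visited += 1
            if pair_count < threshold:
                break  # all remaining children have equal or lower support
            large2.append(pair)
    return large1, large2, visited
```

On the hypothetical database {A, C}, {B, C}, {A, B, C}, {A, B, C, D} with a threshold of 0.8, the sketch visits three nodes and finds only {C} to be large. No candidate 2-itemsets are generated; the pair counts are simply read back.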
Performance Evaluation
• Definition of Parameters
• Experiments use two databases
• D1: T25.I10.N1K.D10K
• D2: T25.I10.N10K.D100K
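The database names can be decoded with a small Python helper. The parameter table from the slide is not reproduced in this transcript, so the field meanings below are an assumption based on the usual naming convention for synthetic transaction databases (T: average transaction size, I: average size of maximal potentially large itemsets, N: number of items, D: number of transactions). The function name and dictionary keys are hypothetical.

```python
import re

# Assumed meanings of the letters in a name such as "T25.I10.N1K.D10K".
FIELD_NAMES = {
    "T": "avg_transaction_size",
    "I": "avg_max_potentially_large_itemset_size",
    "N": "num_items",
    "D": "num_transactions",
}


def parse_dataset_name(name):
    """Split a dataset name into its numeric parameters ("K" means thousand)."""
    params = {}
    for part in name.split("."):
        match = re.fullmatch(r"([TIND])(\d+)(K?)", part)
        if match is None:
            raise ValueError(f"unrecognised field: {part!r}")
        letter, digits, kilo = match.groups()
        params[FIELD_NAMES[letter]] = int(digits) * (1000 if kilo else 1)
    return params
```

Under this reading, D2 has ten times as many items and ten times as many transactions as D1, while the average transaction size is the same in both.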
Performance Evaluation (Cont.)
• Comparison of Apriori and RARM: execution time
• For D1:
Performance Evaluation (Cont.)
• For D2:
Performance Evaluation (Cont.)
• Why does RARM achieve a much greater speed-up in D2 than in D1?
Conclusion
• Experiments have shown that RARM outperforms Apriori at all support thresholds