CBW: An Efficient Algorithm for Frequent Itemset Mining
Ja-Hwung Su, Wen-Yang Lin
Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004
Outline
• Introduction
• CBW algorithm
• Experimental results
• Conclusion
Introduction
• Mining association rules from large databases of business data has been a hot topic.
• As the minimum support threshold decreases, the number of candidate itemsets increases exponentially.
• The paper proposes a new algorithm that maintains its performance even at relatively low supports.
Algorithm
• Cut-Both-Ways (CBW) employs a bi-directional search strategy and hybridizes various techniques for frequent itemset generation.
• Step 1. Choose an appropriate cutting level α that divides the search space into two parts.
• Step 2. After identifying all frequent itemsets at this level, perform a downward search to enumerate all frequent itemsets below the cutting level α and determine their support values.
• Step 3. Perform an upward search to enumerate all frequent itemsets with cardinalities larger than α.
Concept illustration of CBW
Input: The transaction database D and minimum support minsup;
Output: The set of frequent itemsets F;
1. scan D to generate all frequent 1-itemsets F1;
2. Trans(D, T, F1, F2, α);
3. Dwnsearch(D, DF, Fα, α, minsup);
4. Upsearch(T, UF, Fα, α, minsup);
5. return F = DF ∪ UF;
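The overall flow above can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the names cbw_sketch and support_counts are hypothetical, the cutting level alpha is passed in rather than computed, levels 1 through α are counted by brute-force subset enumeration as a stand-in for Trans and Dwnsearch, and the upward search uses plain tid-set intersections.

```python
from itertools import combinations


def support_counts(transactions, k):
    """Count every k-subset that occurs in at least one transaction."""
    counts = {}
    for t in transactions:
        for combo in combinations(sorted(t), k):
            key = frozenset(combo)
            counts[key] = counts.get(key, 0) + 1
    return counts


def cbw_sketch(transactions, min_count, alpha):
    """Return {itemset: support count} for all frequent itemsets (simplified)."""
    # Frequent 1-itemsets; infrequent items are dropped from every transaction.
    f1 = {next(iter(s)) for s, c in support_counts(transactions, 1).items()
          if c >= min_count}
    pruned = [frozenset(t) & f1 for t in transactions]

    # Levels 1..alpha by direct counting (stand-in for the cutting level
    # plus the downward search; the paper's procedure is more refined).
    frequent = {}
    for k in range(1, alpha + 1):
        for s, c in support_counts(pruned, k).items():
            if c >= min_count:
                frequent[s] = c

    # Upward search: join frequent k-itemsets and intersect their tid sets.
    tids = {s: {i for i, t in enumerate(pruned) if s <= t}
            for s in frequent if len(s) == alpha}
    k = alpha
    while tids:
        nxt = {}
        for a, b in combinations(list(tids), 2):
            cand = a | b
            if len(cand) == k + 1 and cand not in nxt:
                common = tids[a] & tids[b]  # transactions containing cand
                if len(common) >= min_count:
                    nxt[cand] = common
        frequent.update({s: len(t) for s, t in nxt.items()})
        tids, k = nxt, k + 1
    return frequent


# Hypothetical toy database, not the one from the slides.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C", "D"}]
result = cbw_sketch(toy, min_count=2, alpha=2)
```

On this toy database the sketch finds the three frequent items, the three frequent pairs, and {A, B, C}; item D is dropped because it occurs only once.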
Cutting level α
• Problem:
• If α is too low, many unnecessary intersections are performed during the upward search.
• If α is too high, the downward search spends much more time enumerating itemsets and counting their supports.
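Cutting level α (cont.)
• Solution: set α to the average number of frequent items per transaction, rounded to the nearest integer:
  α = INT[ (Σ over ti ∈ D of |ti ⊥ minsup|) / |D| ]
• INT[r]: the nearest integer to r, for r ≥ 1.
• ti ⊥ minsup: the set of items in transaction ti with support at least minsup.
• ti ⊥ minsup = {x | x ∈ ti and sup(x) ≥ minsup}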
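The definitions above can be turned into a short Python sketch that computes a cutting level as the rounded average number of frequent items per transaction. The function names cutting_level and nearest_int are hypothetical, the toy database is made up for illustration, and rounding halves upward is an assumption, since the slides do not say how ties are resolved.

```python
import math
from collections import Counter


def nearest_int(r):
    """INT[r]: nearest integer to r. Halves round up here (an assumption);
    Python's built-in round() would round halves to the even neighbour."""
    return math.floor(r + 0.5)


def cutting_level(transactions, minsup):
    """Average size of ti ⊥ minsup over the database, rounded by INT."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    # Items whose relative support reaches the threshold.
    frequent_items = {x for x, c in counts.items() if c / n >= minsup}
    # |ti ⊥ minsup| summed over all transactions.
    total = sum(len(set(t) & frequent_items) for t in transactions)
    return nearest_int(total / n)


# Hypothetical toy database, not the one from the slides.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C", "D"}]
alpha = cutting_level(toy, minsup=0.4)  # (3 + 2 + 2 + 2 + 3) / 5 = 2.4, so alpha is 2
```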
Example
• Assume that minsup = 40%.
• The frequent 1-itemsets are {A}, {B}, {C}, and {D}.
• The cutting level is α = INT[(3+2+1+4+3+2+3+3+3+3)/10] = INT[2.7] = 3.
Example (cont.)
• C2 = {{A,B}, {A,C}, {A,D}, {B,C}, {C,D}}
• Since item E is not frequent, there is no need to create a tidlist for E.
• The tids of t2, t3, and t6 are not included because the cardinalities of these transactions are less than 3.
• The resulting set of 3-itemsets is {{B, C, D}}.
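The two pruning rules in this example (no tidlist for an infrequent item, and no tid for a transaction that is too short to reach the cutting level) can be sketched as follows. The names build_tidlists and support_via_tidlists are hypothetical, the toy database is not the one from the slides, and measuring a transaction's cardinality after dropping infrequent items is an assumption.

```python
def build_tidlists(transactions, frequent_items, alpha):
    """Map each frequent item to the tids of the transactions that contain it,
    skipping transactions with fewer than alpha frequent items."""
    tidlists = {item: set() for item in frequent_items}
    for tid, t in enumerate(transactions, start=1):
        kept = set(t) & frequent_items  # infrequent items get no tidlist
        if len(kept) < alpha:
            continue  # too short to contain any itemset of size alpha or more
        for item in kept:
            tidlists[item].add(tid)
    return tidlists


def support_via_tidlists(itemset, tidlists):
    """Support count of an itemset as the size of its tidlist intersection.
    Only meaningful for itemsets of size >= alpha, because shorter
    transactions were left out when the tidlists were built."""
    return len(set.intersection(*(tidlists[i] for i in itemset)))


# Hypothetical toy database with tids 1..5.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C", "D"}]
tidlists = build_tidlists(toy, frequent_items={"A", "B", "C"}, alpha=3)
support_abc = support_via_tidlists({"A", "B", "C"}, tidlists)
```

With alpha = 3, only transactions 1 and 5 contribute tids, so each tidlist is {1, 5} and the intersection for {A, B, C} has size 2.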
Conclusion
• The paper makes a clever guess at the most promising itemset level (the cutting level) and generates all frequent itemsets located there.