A Parameterised Algorithm for Mining Association Rules
Nuansri Denwattana and Janusz R. Getta
Proceedings of the 12th Australasian Database Conference (ADC 2001), 29 Jan.–2 Feb. 2001, pp. 45-51.
Department of Information & Computer Education, NTNU
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
Outline • Introduction • Problem Definition • Finding Frequent Itemsets • Experimental Results • Conclusion
Introduction (1/2) • Most algorithms for finding frequent itemsets count only one category (size) of itemsets per database pass, e.g. the Apriori algorithm. • The quality of an association rule mining algorithm is determined by: • the number of passes through the input dataset • the number of candidate itemsets
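The level-wise behaviour described above can be sketched as follows. This is a minimal, simplified Apriori in Python (absolute support counts, no optimisations), not the paper's implementation; note that each iteration of the loop costs one full scan of the transactions, which is exactly the cost the (n, p) algorithm tries to reduce.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: each pass counts candidates of one size only."""
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # One full database scan per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            ts = set(t)
            for c in candidates:
                if c <= ts:
                    counts[c] += 1
        frequent = {c for c, cnt in counts.items() if cnt >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent
```

With n levels of frequent itemsets present, plain Apriori therefore needs about n database scans, one per level.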
Introduction (2/2) • One objective is to construct an algorithm that makes a good guess about which itemsets are frequent. • The parameterised (n, p) algorithm finds all frequent itemsets in a range of n levels of the itemset lattice using p passes (n >= p) through the input data set.
Problem Definition • Positive candidate itemset: assumed (guessed) to be frequent. • Negative candidate itemset: assumed (guessed) to be infrequent. • Remaining candidate itemset: a candidate whose frequency must be verified in an additional scan.
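The three categories can be sketched in code. The helper names below (`split_by_guess`, `remaining_after_scan`) are hypothetical and the logic is a simplification of the paper's scheme: candidates are split by the guess, both groups are counted in one scan, and wrongly guessed candidates are the ones that force further verification.

```python
def split_by_guess(candidates, assumed_frequent):
    """Split candidates into positive (guessed frequent) and negative
    (guessed infrequent) sets. Hypothetical helper, not the paper's code."""
    positive = {c for c in candidates if c in assumed_frequent}
    return positive, candidates - positive

def remaining_after_scan(positive, negative, support, min_support):
    """After one counting scan, wrongly guessed candidates give rise to
    'remaining' candidates that an additional scan must settle (sketch)."""
    wrong_pos = {c for c in positive if support.get(c, 0) < min_support}
    wrong_neg = {c for c in negative if support.get(c, 0) >= min_support}
    return wrong_pos | wrong_neg
```

If every guess is correct, `remaining_after_scan` returns the empty set and no extra pass is needed, which is where the savings in database scans come from.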
Finding Frequent Itemsets (Guessing Candidate Itemsets)
[Figure: an initial scan of the database builds the statistics table T.]
Guessing from the statistics table T (apriori_gen example): • Item frequency threshold = 80%; m-element transaction threshold = 5 • Number of lattice levels to traverse: n = 3; number of passes through the input data set: p = 2 • 3-element transactions (5 of them): threshold 5 × 80% = 4, guessed itemset {B} • 4-element transactions (2 of them): threshold 2 × 80% = 2 (rounded up), guessed itemset {ABC} • 5-element transactions (3 of them): threshold 3 × 80% = 3 (rounded up), guessed itemset {BCEF}
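The guessing arithmetic on this slide can be sketched as follows. The function `guess_from_statistics` is a hypothetical helper reflecting one reading of the example (within each group of m-element transactions, keep the items occurring in at least 80% of the group, rounding the cutoff up); it is not the paper's actual procedure.

```python
from collections import Counter, defaultdict
from math import ceil

def guess_from_statistics(transactions, item_freq_threshold=0.8):
    """Group transactions by length m; within each group, guess the items
    that occur in at least item_freq_threshold of the group's transactions.
    Sketch of the slide's statistics-table example, with cutoffs rounded up."""
    groups = defaultdict(list)
    for t in transactions:
        groups[len(t)].append(set(t))
    guesses = {}
    for m, ts in groups.items():
        cutoff = ceil(item_freq_threshold * len(ts))
        counts = Counter(item for t in ts for item in t)
        guesses[m] = frozenset(i for i, c in counts.items() if c >= cutoff)
    return guesses
```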
[Figure: apriori_gen generates candidates; pruning removes all subsets of a positive superset.]
Department of Information & Computer Education, NTNU scan DB (1) generate remaining candidate itemsets Finding Frequent Itemsets (Verification of Candidate Itemsets) Minimum support=20%
[Figure: apriori_gen and scan DB (2) verify the remaining candidate itemsets.]
Finding Frequent Itemsets
Experimental Results (1/6) • Parameters: • ntrans – number of transactions in the database • tl – average transaction length • np – number of patterns • sup – minimum support
Experimental Results (2/6) A comparison of the number of database scans between the Apriori and (n, p) algorithms
Experimental Results (3/6) Performance of the Apriori and (n, p) algorithms with tl = 10, np = 10, sup = 20%
Experimental Results (4/6) Performance of the Apriori and (n, p) algorithms with tl = 14, np = 10, sup = 20% Performance of the Apriori and (n, p) algorithms with tl = 20, np = 100, sup = 10%
Experimental Results (5/6) Performance of (n, 3) with an increasing ratio n/p
Experimental Results (6/6) Performance of (8, p) with an increasing parameter p
Conclusion • The main contribution is the reduction of the number of scans through the data set.