Elena Baralis, Silvia Chiusano, Paolo Garza. Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy. A Lazy Approach to Associative Classification. IEEE Transactions on Knowledge and Data Engineering, Feb. 2008.
Outline • Introduction • Compact Rule Set Representation • Classification model generation • L3Gen Mining Algorithm • Lazy Pruning (L3) • Experiments • Conclusion
Introduction • Excessive rule set size • A huge number of classification rules is generated • Over-pruning • Useful rules are discarded along with low-quality ones • The goal • To minimize the size of the set of high-quality rules
Compact Rule Set Representation • Compact form of a classification rule set • A concise and complete representation • The full rule set can be regenerated from the compact form • (Figure labels: class, generator, closed itemset)
Compact Rule Set Representation • Macroitem: a set of correlated items • The items of a macroitem are contained in the same transactions
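The grouping of items into macroitems can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `find_macroitems` and the toy transactions are assumptions, and "the same transactions" is read here as "exactly the same set of transaction ids".

```python
from collections import defaultdict

def find_macroitems(transactions):
    """Group items that occur in exactly the same transactions."""
    # Map each item to the set of transaction ids that contain it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    # Items sharing an identical tid-set form one macroitem.
    groups = defaultdict(set)
    for item, tids in tidsets.items():
        groups[frozenset(tids)].add(item)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical toy data: a and b always appear together.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
]
macroitems = find_macroitems(transactions)
```

On this toy data, a and b form one macroitem, while c and d each stand alone.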
Compact Rule Set Representation • Closed itemsets • A closed itemset is the maximal set of items common to a set of transactions • X is closed when γ(X) = X, where γ(X) is the closure of X • (Figure: example itemset {a, b, c, d})
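The closure test γ(X) = X can be illustrated with a short sketch. The names `closure` and `is_closed` and the toy transactions are assumptions made for this example; the treatment of itemsets that occur in no transaction is a convention chosen here, not something the slides specify.

```python
def closure(itemset, transactions):
    """gamma(X): the items common to every transaction that contains X."""
    itemset = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        # Assumed convention: an unsupported itemset is returned unchanged.
        return itemset
    return frozenset.intersection(*covering)

def is_closed(itemset, transactions):
    """X is closed when gamma(X) = X."""
    return closure(itemset, transactions) == frozenset(itemset)

# Hypothetical toy data.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
]
closure_of_a = closure({"a"}, transactions)
```

Here {a} is not closed, because every transaction containing a also contains b, so its closure is {a, b}.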
Compact Rule Set Representation • Generator itemsets • G is a generator itemset of a closed itemset X when γ(G) = X • A closed itemset may have several generators G1, G2, …, Gn • (Figure: lattice of the itemsets over a, b, c, d, from the single items up to {a,b,c,d}, with generators and closed itemsets marked; thresholds: support ≥ 16.67%, confidence ≥ 50%)
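A brute-force sketch of finding the generators of a closed itemset is given below. It assumes the usual additional requirement that a generator is minimal (no proper subset has the same closure), which the slide does not state explicitly. The function names and the toy data are hypothetical, and the exhaustive enumeration is only practical for tiny examples.

```python
from itertools import combinations

def closure(itemset, transactions):
    """gamma(X): the items common to every transaction that contains X."""
    itemset = frozenset(itemset)
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        return itemset  # assumed convention for unsupported itemsets
    return frozenset.intersection(*covering)

def generators(closed, transactions):
    """Minimal subsets G of a closed itemset X with gamma(G) = X."""
    closed = frozenset(closed)
    found = []
    # Visit candidates by increasing size so minimal ones are found first.
    for size in range(len(closed) + 1):
        for cand in combinations(sorted(closed), size):
            cand = frozenset(cand)
            if any(g <= cand for g in found):
                continue  # a smaller generator is contained in cand
            if closure(cand, transactions) == closed:
                found.append(cand)
    return found

# Hypothetical toy data.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
]
gens = generators({"a", "b"}, transactions)
```

In this example the closed itemset {a, b} has two generators, {a} and {b}, which shows why one closed itemset can stand for several rules.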
Classification model generation • Classification rule extraction • The compact representation is used to extract rules at low support thresholds • Classification rule pruning • Provides high-quality rules for classification
L3Gen Mining Algorithm • Recursive projection of the macrodata set • The macroitem with minimum support is considered first • […]: the set of macroitems
L3Gen Mining Algorithm • Set updating • The macroitems that are included in all transactions are removed from […] and added to […]
L3Gen Mining Algorithm • Compact rule mining • Compact rules must satisfy the support and confidence thresholds • Each rule is labeled with a class label
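The support and confidence check applied to a rule can be sketched on plain (non-compact) rules. Everything named here is an assumption for illustration: `rule_stats`, the toy records, and the threshold values. Support is taken as the fraction of records that contain the rule body and carry the rule's class, and confidence as that count divided by the number of records containing the body.

```python
def rule_stats(body, label, data):
    """Return (support, confidence) of the rule body -> label.

    data is a list of (items, class_label) pairs.
    """
    body = frozenset(body)
    matched = [cls for items, cls in data if body <= items]
    hits = sum(1 for cls in matched if cls == label)
    support = hits / len(data)
    confidence = hits / len(matched) if matched else 0.0
    return support, confidence

# Hypothetical labeled records.
data = [
    ({"a", "b"}, "yes"),
    ({"a", "b", "c"}, "yes"),
    ({"a", "c"}, "no"),
    ({"d"}, "no"),
]
candidates = [({"a"}, "yes"), ({"a", "b"}, "yes"), ({"d"}, "yes")]

# Hypothetical thresholds.
MIN_SUPPORT, MIN_CONFIDENCE = 0.25, 0.7
kept = []
for body, label in candidates:
    support, confidence = rule_stats(body, label, data)
    if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
        kept.append((body, label))
```

With these thresholds only {a, b} → yes survives: {a} → yes has enough support but a confidence of 2/3, and {d} → yes never classifies a record correctly.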
L3Gen Mining Algorithm • Data set projection • The used macroitem is removed from […] and added to […]
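The building blocks named on the L3Gen slides (ordering macroitems by support, separating macroitems that occur in every transaction, and projecting the data set on a macroitem) can be sketched as below. This is not the L3Gen algorithm itself: the helper names, the representation of macroitems as plain string ids, and the tie-breaking by name are all assumptions.

```python
from collections import Counter

def order_by_support(transactions):
    """Macroitems sorted by increasing support (ties broken by name)."""
    counts = Counter(m for t in transactions for m in t)
    return sorted(counts, key=lambda m: (counts[m], m))

def split_universal(transactions):
    """Separate macroitems present in all transactions from the rest."""
    if not transactions:
        return set(), set()
    universal = set.intersection(*map(set, transactions))
    remaining = set.union(*map(set, transactions)) - universal
    return universal, remaining

def project(transactions, macroitem):
    """Keep the transactions containing macroitem, with it removed."""
    return [set(t) - {macroitem} for t in transactions if macroitem in t]

# Hypothetical macrodata set: each transaction is a set of macroitem ids.
data = [{"m1", "m2", "m3"}, {"m1", "m2"}, {"m1", "m3"}]
order = order_by_support(data)
universal, remaining = split_universal(data)
projected = project(data, order[0])
```

A recursive miner would repeat these steps on each projected data set; that recursion and the rule generation are left out of this sketch.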
Lazy Pruning • The three subsets of rules • Used rules • Have correctly classified at least one training data instance • Spare rules • Have not been used during the training phase • May become useful to classify unseen data • Harmful rules • Only wrongly classify training data • These are the pruning target
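One simple way to split a rule list into used, spare, and harmful rules is sketched below. It rests on assumptions not spelled out on the slide: the rules are already sorted by rank, a used rule removes every training instance it covers, and instances covered only by a harmful rule stay available for later rules. The function name `lazy_prune` and the toy data are hypothetical.

```python
def lazy_prune(ranked_rules, training):
    """Split ranked (body, label) rules into used, spare and harmful."""
    remaining = list(training)
    used, spare, harmful = [], [], []
    for body, label in ranked_rules:
        body = frozenset(body)
        covered = [(items, cls) for items, cls in remaining if body <= items]
        if not covered:
            # Never applied during training: kept as a spare rule.
            spare.append((body, label))
        elif any(cls == label for _, cls in covered):
            # At least one correct classification: a used rule.
            used.append((body, label))
            remaining = [rec for rec in remaining if not body <= rec[0]]
        else:
            # Only wrong classifications: the rule is harmful.
            harmful.append((body, label))
    return used, spare, harmful

# Hypothetical training data and ranked rules.
training = [
    ({"a", "b"}, "yes"),
    ({"a", "c"}, "yes"),
    ({"c", "d"}, "no"),
]
ranked_rules = [
    ({"d"}, "yes"),
    ({"a"}, "yes"),
    ({"b"}, "yes"),
    ({"c"}, "no"),
]
used, spare, harmful = lazy_prune(ranked_rules, training)
```

Only the harmful list would be discarded; the used and spare rules are both kept for classification.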
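Lazy Pruning • Rule rank • Rule ri ranks above rj when one of the following holds, each criterion breaking ties on the previous ones • confidence(ri) > confidence(rj) • support(ri) > support(rj) • length(ri) > length(rj) • Length is the number of items in the rule • lex(ri) > lex(rj) • lex(r) is the position of r in the lexicographic order on items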
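The ranking can be expressed as a sort key. In this sketch the `Rule` class, its fields, and the sample values are assumptions; the direction of the final lexicographic tie-break is also assumed (an alphabetically earlier body is placed first), since the slide does not make it explicit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    body: tuple      # items in the rule body
    label: str       # predicted class
    support: float
    confidence: float

def rank_key(rule):
    """Sort key: higher confidence, then higher support, then longer body."""
    return (
        -rule.confidence,
        -rule.support,
        -len(rule.body),
        tuple(sorted(rule.body)),  # assumed direction for the lex tie-break
    )

# Hypothetical rules.
r1 = Rule(("a",), "yes", 0.5, 0.9)
r2 = Rule(("a", "b"), "yes", 0.5, 0.9)
r3 = Rule(("c",), "no", 0.6, 0.9)
r4 = Rule(("d",), "no", 0.9, 0.8)

ranked = sorted([r1, r2, r3, r4], key=rank_key)
```

Here r3 comes first on support, r2 precedes r1 because it is longer, and r4 is last despite its high support because confidence is compared first.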
Lazy Pruning (L3) • (Figure, shown step by step over several slides: the rules, taken from the closed itemsets of the compact rules, are applied to the training data set, and the harmful rules are marked)
Lazy Pruning (L3) • The rules are organized in two levels • Level-1: used rules • Level-2: spare rules • Level-1 + Level-2: compact rule itemsets
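Classification with the two levels can be sketched as a simple fallback: Level-1 (used rules) is tried first, and Level-2 (spare rules) is consulted only when no Level-1 rule matches. The function name `classify`, the first-match policy within a level, and the toy rules are assumptions for illustration.

```python
def classify(items, level1, level2, default=None):
    """Return the label of the first matching rule, Level-1 before Level-2."""
    items = frozenset(items)
    for rules in (level1, level2):
        for body, label in rules:  # rules assumed to be in rank order
            if frozenset(body) <= items:
                return label
    return default  # no rule in either level matches

# Hypothetical rule levels.
level1 = [({"a"}, "yes"), ({"c"}, "no")]
level2 = [({"e"}, "no")]

from_level1 = classify({"a", "d"}, level1, level2)
from_level2 = classify({"e", "f"}, level1, level2)
unclassified = classify({"z"}, level1, level2)
```

The second call shows the purpose of keeping spare rules: the instance matches nothing in Level-1 but is still classified by Level-2.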
Experiments • L3 versus the other classifiers • Results are reported as L3 being higher than / equal to / lower than each of the other classifiers
Conclusion • The compact form • It allows very large rule sets to be represented • The lazy pruning technique discards only harmful rules • Two-level classification • Level-1 includes a few high-quality rules • Level-2 provides rules for the data that Level-1 did not classify