Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
Motivation
• Previous work
  • Association rule mining:
    • Run time used to compare mining algorithms
    • Lack of accuracy-based comparisons
  • Associative classification:
    • Focus on accurate classifiers
    • Side effect: comparison of a standard associative classifier to standard techniques
• Idea: think backwards
  • Use the resulting classifiers as the basis for comparing confidence-based rule miners
Overview
• Motivation
• Basics:
  • Definitions
  • Associative classification
  • (Class) association rule mining
    • Apriori vs. predictive Apriori (by Scheffer)
  • Pruning
  • Classification
• Quality measures and experiments
• Results
• Conclusions
Goal: evaluate the sort order of rules using properties of associative classifiers
Basics: Definitions
• A table over n attributes (an item is an attribute-value pair)
• Class association rule: an implication $X \Rightarrow Y$, where $Y$ is the class attribute; $X$ is the body of the rule, $Y$ the head of the rule
• Confidence of a (class) association rule: $c(X \Rightarrow Y) = \frac{s(X \cup Y)}{s(X)}$ (support $s(X)$: the number of database records that satisfy $X$)
  • Relative frequency of a correct prediction in the (training) table of instances
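As a small illustration of these definitions, the sketch below computes support and confidence on a made-up table. It assumes the standard definition of confidence as s(X ∪ Y) / s(X); the toy records and the function names `support` and `confidence` are hypothetical and not taken from the slides.

```python
# Toy training table: each record is a set of (attribute, value) items.
# The data is invented purely for illustration.
records = [
    {("outlook", "sunny"), ("windy", "no"), ("play", "yes")},
    {("outlook", "sunny"), ("windy", "yes"), ("play", "no")},
    {("outlook", "rainy"), ("windy", "no"), ("play", "yes")},
    {("outlook", "sunny"), ("windy", "no"), ("play", "yes")},
]


def support(itemset, table):
    """s(X): number of records that contain every item of X."""
    return sum(1 for record in table if itemset <= record)


def confidence(body, head, table):
    """c(X => Y) = s(X u Y) / s(X); 0.0 if the body never matches."""
    body_support = support(body, table)
    if body_support == 0:
        return 0.0
    return support(body | head, table) / body_support


body = frozenset({("outlook", "sunny")})
head = frozenset({("play", "yes")})
print(support(body, records), confidence(body, head, records))
```

Here the body matches three records and the full rule matches two of them, so the confidence is the fraction of body-matching records that also carry the predicted class.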
Basics: Mining - Apriori
Rules sorted according to: confidence
Generates all (class) association rules with support and confidence larger than predefined values.
• Mine all item sets above minimum support (frequent item sets)
• Divide each frequent item set into rule body and head; check whether the confidence of the rule is above minimum confidence
Adaptations to mine class association rules as described by Liu et al. (CBA):
• Divide the training set into subsets, one for each class
• Mine frequent item sets separately in each subset
• Take each frequent item set as body and the class label as head
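The per-class adaptation can be sketched roughly as follows. This is a minimal illustration, not the CBA or WEKA implementation: it enumerates candidate bodies by brute force up to a hypothetical `max_len` instead of using Apriori's level-wise candidate generation, treats `min_support` as an absolute count, and accepts rules that meet the thresholds with `>=`. All names and the toy data are assumptions.

```python
from itertools import combinations


def mine_class_rules(rows, labels, min_support, min_confidence, max_len=2):
    """rows: list of frozensets of items; labels: parallel list of class labels.

    Returns (body, class, confidence, support) tuples sorted by confidence,
    then support, both descending.
    """
    rules = []
    for cls in sorted(set(labels)):
        # Split the training set: only the records of this class.
        subset = [r for r, y in zip(rows, labels) if y == cls]
        items = sorted(set().union(*subset))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                body = frozenset(combo)
                # Support of "body and class" is counted inside the subset.
                s_rule = sum(1 for r in subset if body <= r)
                if s_rule < min_support:
                    continue
                # Support of the body alone is counted on all records.
                s_body = sum(1 for r in rows if body <= r)
                conf = s_rule / s_body
                if conf >= min_confidence:
                    rules.append((body, cls, conf, s_rule))
    rules.sort(key=lambda rule: (-rule[2], -rule[3]))
    return rules


rows = [
    frozenset({"a=1", "b=1"}),
    frozenset({"a=1", "b=0"}),
    frozenset({"a=0", "b=1"}),
    frozenset({"a=1", "b=1"}),
]
labels = ["yes", "yes", "no", "yes"]
for rule in mine_class_rules(rows, labels, min_support=2, min_confidence=0.8):
    print(rule)
```

Sorting by confidence mirrors the sort order named on the slide; using support as a tie-breaker is an extra choice made here for determinism.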
Basics: Mining - predictive Apriori
Rules sorted according to: expected predictive accuracy
• Predictive accuracy of a rule r: support-based correction of the confidence value
• Inherent pruning strategy: output the best n rules according to:
  • Expected pred. accuracy among the n best
  • Rule not subsumed by a rule with at least the same expected pred. accuracy (prefers more general rules)
• Adaptations to mine class association rules:
  • Generate frequent item sets from all data (class attribute deleted) as rule bodies
  • Generate a rule for each class label
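The subsumption criterion can be pictured with the simplified filter below. It is only a sketch: the `score` values stand in for expected predictive accuracy, whose computation is not shown here, and the filter runs once over a finished rule list rather than being interleaved with mining as in predictive Apriori. Treating a rule as subsumed only by a rule with the same class label and a strictly smaller body is an assumption, as are all names.

```python
def remove_subsumed(rules):
    """Drop every rule that has a more general rule (proper subset body,
    same class) with at least the same score."""
    kept = []
    for body, head, score in rules:
        subsumed = any(
            other_head == head and other_body < body and other_score >= score
            for other_body, other_head, other_score in rules
        )
        if not subsumed:
            kept.append((body, head, score))
    return kept


def best_n(rules, n):
    """Return the n highest-scoring rules that survive the subsumption filter."""
    survivors = remove_subsumed(rules)
    return sorted(survivors, key=lambda rule: -rule[2])[:n]


# Placeholder scores, invented for illustration.
rules = [
    (frozenset({"a"}), "yes", 0.90),
    (frozenset({"a", "b"}), "yes", 0.85),  # more specific and not better
    (frozenset({"a", "c"}), "yes", 0.95),  # more specific but better
    (frozenset({"b"}), "no", 0.70),
]
top = best_n(rules, 2)
print(top)
```

The rule with body {a, b} is dropped because the more general rule with body {a} scores at least as well, which is the sense in which the strategy prefers general rules.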
Basics: Pruning
• Number of rules too big for direct use in a classifier
• Simple strategy:
  • Bound the number of rules
  • Sort order of the mining algorithm remains
• CBA: optional pruning step: pessimistic error-rate-based pruning:
  • A rule is pruned if removing a single item from the rule results in a reduction of the pessimistic error rate
• CBA: obligatory pruning: database coverage method:
  • A rule that classifies at least one instance correctly (is highest ranked) belongs to the intermediate classifier
  • Delete all covered instances
  • Take the intermediate classifier with the lowest number of errors
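The database coverage steps can be sketched as below. This is a simplified reading of the three bullets, not CBA's exact procedure: counting errors as the mistakes of the selected rules plus the mistakes of a majority-class default on the uncovered instances is an assumption, as are the function name and the toy data.

```python
from collections import Counter


def database_coverage(sorted_rules, rows, labels):
    """sorted_rules: list of (body, cls), best first.

    Returns (selected_rules, default_class) for the intermediate classifier
    with the fewest training errors.
    """
    remaining = list(zip(rows, labels))
    selected, rule_errors = [], 0
    default = Counter(labels).most_common(1)[0][0]
    best_errors = sum(1 for y in labels if y != default)
    best_classifier = ([], default)
    for body, cls in sorted_rules:
        covered = [(r, y) for r, y in remaining if body <= r]
        if not any(y == cls for _, y in covered):
            continue  # classifies no remaining instance correctly
        selected.append((body, cls))
        rule_errors += sum(1 for _, y in covered if y != cls)
        # Delete all instances covered by the newly selected rule.
        remaining = [(r, y) for r, y in remaining if not body <= r]
        if remaining:
            default = Counter(y for _, y in remaining).most_common(1)[0][0]
        default_errors = sum(1 for _, y in remaining if y != default)
        total = rule_errors + default_errors
        if total < best_errors:
            best_errors = total
            best_classifier = (list(selected), default)
    return best_classifier


rows = [frozenset("a"), frozenset("a"), frozenset("b"), frozenset("b"), frozenset("ab")]
labels = ["yes", "yes", "no", "no", "yes"]
sorted_rules = [
    (frozenset("a"), "yes"),
    (frozenset("b"), "no"),
    (frozenset("ab"), "no"),
]
pruned, default_class = database_coverage(sorted_rules, rows, labels)
print(pruned, default_class)
```

On this toy data the first rule plus the default class already makes no training errors, so later rules do not improve the classifier and are left out.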
Overview
• Motivation
• Basics:
  • Definitions
  • Associative classification
  • (Class) association rule mining
    • Apriori vs. predictive Apriori (by Scheffer)
  • Pruning
  • Classification
• Quality measures
• Results
• Conclusions
Think backwards: use the properties of different classifiers to obtain accuracy-based measures for a set of (class) association rules
Classification
• Input:
  • Pruned, sorted list of class association rules
• Two different approaches:
  • Weighted vote algorithm
    • Majority vote
    • Inversely weighted
  • Decision list classifier, e.g. CBA
    • Use the first rule that covers the test instance for classification
Think backwards: a mining algorithm is preferable if the resulting classifier is more accurate, compact, and built in an efficient way
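The two approaches might look like this in code. The sketch assumes that "inversely weighted" means a covering rule at rank k contributes a vote of 1/k, and that every covering rule votes; neither detail is spelled out on the slide. Function names and the toy rules are hypothetical.

```python
from collections import defaultdict


def decision_list(rules, instance, default=None):
    """Classify with the first (highest ranked) rule that covers the instance."""
    for body, cls in rules:
        if body <= instance:
            return cls
    return default


def weighted_vote(rules, instance, inverse=False, default=None):
    """Let every covering rule vote for its class.

    inverse=False: majority vote, each covering rule counts 1.
    inverse=True:  a rule at rank k counts 1/k (assumed weighting).
    """
    votes = defaultdict(float)
    for rank, (body, cls) in enumerate(rules, start=1):
        if body <= instance:
            votes[cls] += 1.0 / rank if inverse else 1.0
    if not votes:
        return default
    return max(votes, key=votes.get)


# Rules are listed in the sort order produced by the mining algorithm.
rules = [
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "no"),
    (frozenset({"c"}), "no"),
]
instance = frozenset({"a", "b", "c"})
print(decision_list(rules, instance))
print(weighted_vote(rules, instance))
print(weighted_vote(rules, instance, inverse=True))
```

The example is chosen so that the schemes disagree: majority vote follows the two lower-ranked rules, while the decision list and the inverse weighting follow the top-ranked rule, which is why the latter two are sensitive to the miner's sort order.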
Quality measures and Experiments
Measures to evaluate confidence-based mining algorithms:
• Accuracy on a test set (2 slides)
• Average rank of the first rule that covers and correctly predicts a test instance
• Number of mined rules and number of rules after pruning
• Time required for mining and for pruning
Comparative study for Apriori and pred. Apriori:
• 12 UCI datasets: balance, breast-w, ecoli, glass, heart-h, iris, labor, led7, lenses, pima, tic-tac-toe, wine
• One 10-fold cross-validation
• Discretisation
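The average-rank measure can be computed along these lines. The slide does not say how test instances without any covering, correct rule are treated; this sketch simply leaves them out of the average, which is an assumption. The function name and toy data are hypothetical.

```python
def average_first_correct_rank(rules, rows, labels):
    """Mean 1-based rank of the first rule that covers an instance and
    predicts its class; instances with no such rule are skipped."""
    ranks = []
    for row, label in zip(rows, labels):
        for rank, (body, cls) in enumerate(rules, start=1):
            if body <= row and cls == label:
                ranks.append(rank)
                break
    return sum(ranks) / len(ranks) if ranks else None


rules = [
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "no"),
    (frozenset({"a"}), "no"),
]
test_rows = [frozenset({"a"}), frozenset({"b"}), frozenset({"a"}), frozenset({"c"})]
test_labels = ["yes", "no", "no", "yes"]
print(average_first_correct_rank(rules, test_rows, test_labels))
```

A lower value means correct rules appear earlier in the miner's sort order, so the measure rewards a good ranking independently of any particular classifier.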
1a. Accuracy and Ranking
• Inversely weighted:
  • Emphasises top-ranked rules
  • Shows importance of good rule ranking
A mining algorithm is preferable if the resulting classifier is more accurate.
1b. How many rules are necessary to be accurate?
• Majority vote classifier
• Similar results for CBA
A mining algorithm is preferable if the resulting classifier is more compact.
Comparison: CBA to standard techniques
Conclusions
• Use classification to evaluate the quality of confidence-based association rule miners
• Test evaluation:
  • Pred. Apriori mines a higher-quality set of rules
  • Pred. Apriori needs fewer rules
  • But: pred. Apriori is slower than Apriori
• Comparison of a standard associative classifier (CBA) to standard ML techniques:
  • CBA has comparable accuracy to standard techniques
  • CBA mines more rules and is slower
• All algorithms are implemented in WEKA or an add-on to WEKA available from http://www.cs.waikato.ac.nz/~ml
The End
Thank you for your attention. Questions...
Contact:
• stefan_mutter@directbox.com
• mhall@cs.waikato.ac.nz
• eibe@cs.waikato.ac.nz