Classifying Categorical Data. Risi Thonangi M.S. Thesis Presentation Advisor: Dr. Vikram Pudi. Presentation Outline. Introduction Related Work Contributions ACME Classifier Handling Non-Closed Itemsets Improving the Execution time ACME and Naïve Bayes
Classifying Categorical Data Risi Thonangi M.S. Thesis Presentation Advisor: Dr. Vikram Pudi
Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work
Presentation Outline • Introduction • The classification problem • Preliminaries • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work
The Classification Problem classes C = {c1,c2} attributes I = {i1 , i2 , i3 , i4}
The Classification Problem classes C = {c1,c2} attributes I = {i1 , i2 , i3 , i4} query
Formal Problem Statement • Given a dataset D • Learn from this dataset to classify a potentially unseen record q (the query) to its correct class. • Each record ri is described by boolean attributes I = { i1 , i2 , …, i|I| } and is labeled with one of the classes C = { c1 , c2 , …, c|C| } • I = { i1 , i2 , …, i|I| } can also be viewed as a set of items.
Preliminaries • itemset: A set of items, e.g. { i1 , i2 , i3 } • P(.): Probability distribution • frq-itemset: An itemset whose frequency is above a given threshold σ • σ: Support threshold • τ: Confidence threshold • { i1 , i2 } → { i3 }: An Association Rule ( AR ) • { i1 , i2 } → c1: A Classification Association Rule ( CAR )
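The definitions above can be made concrete with a small sketch. The record layout (a set of items plus a class label) and the function names `support` and `car_confidence` are illustrative assumptions, not notation from the slides.

```python
from typing import FrozenSet, List, Tuple

# Assumed layout: (items present in the record, class label)
Record = Tuple[FrozenSet[str], str]


def support(itemset: FrozenSet[str], records: List[Record]) -> float:
    """Fraction of records that contain every item of `itemset`."""
    hits = sum(1 for items, _ in records if itemset <= items)
    return hits / len(records)


def car_confidence(antecedent: FrozenSet[str], label: str,
                   records: List[Record]) -> float:
    """Confidence of the CAR antecedent -> label, i.e. P(label | antecedent)."""
    covered = [cls for items, cls in records if antecedent <= items]
    if not covered:
        return 0.0
    return covered.count(label) / len(covered)


# Toy dataset, invented for illustration only.
records: List[Record] = [
    (frozenset({"i1", "i2"}), "c1"),
    (frozenset({"i1", "i2", "i3"}), "c1"),
    (frozenset({"i2", "i3"}), "c2"),
    (frozenset({"i1", "i4"}), "c2"),
]

sup = support(frozenset({"i1", "i2"}), records)        # 2 of 4 records
conf = car_confidence(frozenset({"i1"}), "c1", records)  # 2 of the 3 records with i1
```

An itemset is a frq-itemset when `support(...)` reaches σ, and a CAR is kept when `car_confidence(...)` reaches τ.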
Presentation Outline • Introduction • Related Work • Classification based on Associations (CBA) • Classification based on Multiple Association Rules (CMAR) • Large Bayes (LB) Classifier • Contributions • ACME Classifier • Handling Non-Closed Itemsets • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work
Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence
Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence • Disadvantages: • Single rule based classification – Not Robust • Cannot handle Fully Confident Associations
Disadvantages with CBA: Single Rule based classification • Let the classifier have 3 rules: • i1 → c1 support: 0.3, confidence: 0.8 • i2 , i3 → c2 support: 0.7, confidence: 0.7 • i2 , i4 → c2 support: 0.8, confidence: 0.7 • CBA classifies the query { i1 , i2 , i3 , i4 } to class c1, which might be incorrect: two rules with higher support both point to c2. • CBA, being a single-rule classifier, cannot combine the evidence of multiple rules.
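A minimal sketch of the single-rule decision on this example follows. It keeps only the "highest-confidence matching rule" step; the `Rule` class and `cba_classify` function are hypothetical names, and details of the full CBA algorithm (tie-breaking, default class, rule pruning) are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    label: str
    support: float
    confidence: float


def cba_classify(rules: List[Rule], query: FrozenSet[str]) -> Optional[str]:
    """Return the label of the highest-confidence rule whose antecedent
    is contained in the query, or None if no rule matches."""
    matching = [r for r in rules if r.antecedent <= query]
    if not matching:
        return None
    return max(matching, key=lambda r: r.confidence).label


# The three rules from the slide.
rules = [
    Rule(frozenset({"i1"}), "c1", support=0.3, confidence=0.8),
    Rule(frozenset({"i2", "i3"}), "c2", support=0.7, confidence=0.7),
    Rule(frozenset({"i2", "i4"}), "c2", support=0.8, confidence=0.7),
]

query = frozenset({"i1", "i2", "i3", "i4"})
prediction = cba_classify(rules, query)
```

All three rules match the query, but only the first one decides the outcome, so `prediction` is `"c1"` regardless of how many rules favour c2.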
Fully Confident Associations • An association { i1 , i2 } → { i3 } is fully confident if its confidence is 100%. This means P( ~i3 , i1 , i2 ) = 0. • If CBA includes the CAR { i1 , i2 , i3 } → c1 , it will also include { i1 , i2 } → c1 • If the query { i1 , i2 , ~i3 } arrives for classification, it is classified to c1 using { i1 , i2 } → c1 • But P( ~i3 , i1 , i2 ) = 0, so no training record supports this decision • CBA does not check for all statistical relationships.
Classification based on Multiple ARs (CMAR) • [Wenmin Li – ICDM01] • Uses multiple CARs in the classification step • Steps in CMAR: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Find all CARs which satisfy the given query • Group them based on their class label • Classify the query to the class whose group of CARs has the maximum weight
CMAR contd. • R: the rules satisfying the query q • Output the class with the highest weight
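The grouping step can be sketched as follows. The group weight used here, the sum of rule confidences, is only a stand-in chosen for illustration: CMAR itself uses a weighted chi-square measure, which is not reproduced here. The function name `group_vote_classify` and the tuple layout of the rules are assumptions as well.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumed layout: (antecedent, class label, confidence)
RuleTuple = Tuple[FrozenSet[str], str, float]


def group_vote_classify(
    rules: List[RuleTuple], query: FrozenSet[str]
) -> Tuple[Optional[str], Dict[str, float]]:
    """Group the rules that match the query by class label and return the
    label of the heaviest group together with all group weights.

    Stand-in weight: sum of confidences (NOT CMAR's weighted chi-square).
    """
    weights: Dict[str, float] = defaultdict(float)
    for antecedent, label, confidence in rules:
        if antecedent <= query:
            weights[label] += confidence
    if not weights:
        return None, {}
    return max(weights, key=weights.get), dict(weights)


# The three rules from the single-rule example of the CBA discussion.
rules: List[RuleTuple] = [
    (frozenset({"i1"}), "c1", 0.8),
    (frozenset({"i2", "i3"}), "c2", 0.7),
    (frozenset({"i2", "i4"}), "c2", 0.7),
]

label, weights = group_vote_classify(rules, frozenset({"i1", "i2", "i3", "i4"}))
```

With this stand-in weight the two c2 rules together outweigh the single c1 rule, so the query goes to c2, which is the behaviour a multi-rule classifier is meant to provide.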
CMAR Disadvantages • No proper statistical explanation given for the mathematical formulae that were employed • Cannot handle Fully Confident Associations
Large Bayes (LB) Classifier • [Meretakis-KDD99] • Build P( q | ci ) using frequent itemsets in ci which are subsets of q • Steps in LB: • Mine for frequent itemsets • Prune the frequent itemsets • Calculate P( q ) using a product approximation • Classify to the class with the highest probability
LB: Pruning FRQ-Itemsets • The immediate subset of an itemset s obtained by removing the item i is denoted s\i • Ex: if s = { i1 , i2 } then s\i1 denotes the set { i2 } • The symbol I stands for interestingness • Pj,k ( s ) denotes the estimate of the frequency of s calculated from the frequencies of s\j and s\k
LB Pruner: Disadvantages • Assumes an itemset’s interestingness does not depend on items not occurring in it. • Ex: For s = { i1 , i2 }, I( s ) depends only on { i1 } and { i2 } but not on { i3 } • Assumes an itemset’s interestingness can be calculated from pairs of immediate subsets • Uses a global information measure for all classes, although itemsets can be informative in one class but not in another.
LB: Calculating P(q) • Approximately calculates P( q ) using frequencies of frequent itemsets. • Ex: P( i1 , i2 , i3 , i4 , i5 ) ≈ P( i2 , i5 ) · P( i3 | i5 ) · P( i1 , i4 | i2 ) • The following itemsets should be available: { i2 , i5 }, { i3 , i5 }, { i5 }, { i1 , i4 , i2 }, { i2 } • Iteratively select itemsets until all the items in q are covered. • There could be many product approximations. • Heuristic: select the itemset s iteratively such that • the number of new items in s is the smallest • if there are contenders, pick the s with the highest I( s )
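Expanding the conditional terms of the example shows why exactly those five itemsets are needed. The second line below is only the definition of conditional probability applied to each factor:

```latex
\begin{align*}
P(i_1, i_2, i_3, i_4, i_5)
  &\approx P(i_2, i_5)\cdot P(i_3 \mid i_5)\cdot P(i_1, i_4 \mid i_2) \\
  &= P(i_2, i_5)\cdot \frac{P(i_3, i_5)}{P(i_5)}\cdot \frac{P(i_1, i_2, i_4)}{P(i_2)}
\end{align*}
```

Compared with the exact chain rule, P( i2 , i5 ) · P( i3 | i2 , i5 ) · P( i1 , i4 | i2 , i3 , i5 ), this product drops i2 from the condition of the second factor and { i3 , i5 } from the condition of the third, which is where the independence assumptions enter.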
Estimating P(q): Disadvantages • LB calculates the probability of q as if it were an itemset. • Uses an approximation of P(q) and hence assumes independences between items • There could be a better product approximation. • Cannot handle Fully Confident Associations • Suppose there exists a rule { i1 , i2 } → i3 , i.e. P( i1 , i2 , i3 ) = P( i1 , i2 ) • The product approximation for q = { i1 , i2 , i4 } is built as P( i1 , i2 ) · P( i4 | i1 , i2 ), which ignores that q does not contain i3
Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work
Contributions • Frequent Itemsets + Maximum Entropy = very accurate, robust and theoretically appealing classifier • Fixed the existing Maximum Entropy model to work with frequent itemsets • Made the approach scalable to large databases • Proved that Naïve Bayes is a specialization of ACME
Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Philosophy • Steps involved • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work
ACME Classifier • Main philosophy of the ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns.
ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns. • ACME does not explicitly assume any independence unless it is represented in the dataset • ACME has three steps: • Mining step • Learning step • Classifying step
Mining Informative Patterns • The dataset D is split by class label: records labeled as c1 form D1, records labeled as c2 form D2
Mining Informative Patterns • Di → Apriori → Constraints of class ci → Confidence-based Pruner → Non-Redundant Constraints of class ci
Mining constraints • Let Si denote the set of itemsets which are frequent in ci , i.e.: • Let S denote the set of itemsets which are frequent in at least one class, i.e.: • Constraints of ci , denoted by Ci , are:
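One possible reading of these three definitions is sketched below. It assumes that a constraint pairs an itemset s from S with the frequency of s among the records of class ci (the form suggested by examples such as ( i1 , 0.5 )), and it uses a brute-force level-wise search as a stand-in for Apriori. The function names and the toy dataset are hypothetical, and the confidence-based pruner is not included.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(records: List[Itemset], sigma: float) -> Dict[Itemset, float]:
    """Brute-force stand-in for Apriori: itemsets with frequency >= sigma."""
    items = sorted(set().union(*records)) if records else []
    result: Dict[Itemset, float] = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = frozenset(combo)
            freq = sum(1 for r in records if s <= r) / len(records)
            if freq >= sigma:
                result[s] = freq
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result


def mine_constraints(
    dataset: List[Tuple[Itemset, str]], sigma: float
) -> Dict[str, Dict[Itemset, float]]:
    """Return, per class, a mapping itemset -> frequency within that class."""
    by_class: Dict[str, List[Itemset]] = {}
    for items, label in dataset:
        by_class.setdefault(label, []).append(items)
    # S_i: itemsets frequent in class c_i
    s_i = {c: frequent_itemsets(recs, sigma) for c, recs in by_class.items()}
    # S: itemsets frequent in at least one class
    s_all = set().union(*(fi.keys() for fi in s_i.values()))
    # C_i (assumed form): every s in S paired with its frequency in class c_i
    return {
        c: {s: sum(1 for r in recs if s <= r) / len(recs) for s in s_all}
        for c, recs in by_class.items()
    }


# Toy dataset, invented for illustration only.
dataset = [
    (frozenset({"i1", "i2"}), "c1"), (frozenset({"i1"}), "c1"),
    (frozenset({"i1", "i3"}), "c1"), (frozenset({"i2"}), "c1"),
    (frozenset({"i3"}), "c2"), (frozenset({"i2", "i3"}), "c2"),
    (frozenset({"i1", "i3"}), "c2"), (frozenset({"i3"}), "c2"),
]
constraints = mine_constraints(dataset, sigma=0.5)
```

Note that under this reading an itemset that is frequent only in c2 still yields a constraint for c1, carrying its (low) frequency in c1.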
Pruning constraints • Constraints are pruned based on how well they differentiate between classes. • Ex: s={ i1 , i2 } an itemset in S, is pruned if • a case when it is not pruned
Learning from constraints • Constraints of class ci ( Ci ): available from the mining step • Statistical distribution of class ci ( Pi ): will be used in the classification step • How do we get from Ci to Pi ? • Characteristic of Pi : it should satisfy every constraint in Ci
An Example • Output of the Mining Step for c1: C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) } • Among the distributions that satisfy C1, choose the one with the highest entropy.
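To see why the constraints alone do not pin down a distribution, the sketch below enumerates a few joint distributions over ( i1 , i2 ) that all satisfy P( i1 ) = 0.5 and P( i2 ) = 0.5, and compares their entropies. The parameterisation by `t` and the chosen sample values are illustrative assumptions.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def candidate(t: float) -> List[float]:
    """Joint distribution over (i1, i2) in the order 00, 01, 10, 11.

    For any t in [0, 0.5]:
      P(i1) = P(10) + P(11) = (0.5 - t) + t = 0.5
      P(i2) = P(01) + P(11) = (0.5 - t) + t = 0.5
    so every candidate satisfies both constraints of C1.
    """
    return [t, 0.5 - t, 0.5 - t, t]


candidates = {t: candidate(t) for t in (0.0, 0.1, 0.25, 0.4, 0.5)}
best_t = max(candidates, key=lambda t: entropy(candidates[t]))
```

Among these candidates the entropy peaks at `t = 0.25`, the uniform distribution, where i1 and i2 are independent. Maximum entropy therefore adds no dependence between the items beyond what the constraints require.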
Learning step • The final solution distribution P for class ci produced by the Learning Step should: • Satisfy the constraints of the class • Have the highest entropy possible • [Good-AMS63] • P can be modeled as a log-linear model.
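A common way to write a log-linear model for frequency constraints of this kind is shown below. The symbols are assumptions introduced for illustration: x is a record, s_j is the itemset of the j-th constraint in C_i, f_j is its indicator function, μ_j is the parameter attached to it, and μ_0 is the normalising constant.

```latex
\[
P(x) \;=\; \mu_0 \prod_{j=1}^{|C_i|} \mu_j^{\,f_j(x)},
\qquad
f_j(x) =
\begin{cases}
1 & \text{if } s_j \subseteq x,\\
0 & \text{otherwise.}
\end{cases}
\]
```

Taking logarithms gives log P( x ) = log μ_0 + Σ_j f_j( x ) · log μ_j, which is linear in the indicator features. Learning then amounts to finding the μ values for which P reproduces the frequency stated in every constraint.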