Classifying Categorical Data

This thesis presentation discusses the ACME classifier, handling non-closed itemsets, and improving execution time in machine learning models. It explores the use of Association Rules and Naïve Bayes for classification, along with experimental results and future directions. The presentation outlines the classification problem, attributes, formal problem statement, and key concepts like itemsets, support threshold, and confidence threshold. It also compares Classification Based on Associations (CBA) and Classification Based on Multiple Association Rules (CMAR) methods, highlighting their advantages and disadvantages in handling various data scenarios and associations.

Presentation Transcript


  1. Classifying Categorical Data • Risi Thonangi • M.S. Thesis Presentation • Advisor: Dr. Vikram Pudi

  2. Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  3. Presentation Outline • Introduction • The classification problem • Preliminaries • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  6. The Classification Problem • classes C = { c1 , c2 } • attributes I = { i1 , i2 , i3 , i4 } • query: the record to be classified

  7. Formal Problem Statement • Given a dataset D, learn from it to classify a potentially unseen record 'q' (the query) to its correct class. • Each record ri is described using boolean attributes I = { i1 , i2 , …, i|I| } and is labeled with one of the classes C = { c1 , c2 , …, c|C| }. • I = { i1 , i2 , …, i|I| } can also be viewed as a set of items.

  8. Preliminaries • itemset: a set of items, e.g. { i1 , i2 , i3 } • P(.): probability distribution • frq-itemset: an itemset whose frequency is above a given threshold σ • σ: support threshold • τ: confidence threshold • { i1 , i2 } → { i3 }: an Association Rule ( AR ) • { i1 , i2 } → c1: a Classification Association Rule ( CAR )
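
To make the support and confidence of a CAR concrete, here is a minimal sketch in Python. The toy dataset, the record layout (item set plus class label) and the function names are assumptions made for illustration; they do not come from the presentation.

```python
from typing import FrozenSet, List, Tuple

# A record is (items present in the record, class label).
Record = Tuple[FrozenSet[str], str]

# Hypothetical toy dataset, not taken from the presentation.
DATASET: List[Record] = [
    (frozenset({"i1", "i2", "i3"}), "c1"),
    (frozenset({"i1", "i2"}), "c1"),
    (frozenset({"i2", "i3"}), "c2"),
    (frozenset({"i1", "i4"}), "c2"),
]


def support(itemset: FrozenSet[str], dataset: List[Record]) -> float:
    """Fraction of records that contain every item of `itemset`."""
    hits = sum(1 for items, _ in dataset if itemset <= items)
    return hits / len(dataset)


def car_support(itemset: FrozenSet[str], label: str, dataset: List[Record]) -> float:
    """Fraction of records that contain `itemset` and carry class `label`."""
    hits = sum(1 for items, c in dataset if itemset <= items and c == label)
    return hits / len(dataset)


def car_confidence(itemset: FrozenSet[str], label: str, dataset: List[Record]) -> float:
    """Among records containing `itemset`, the fraction labeled `label`."""
    covered = [c for items, c in dataset if itemset <= items]
    if not covered:
        return 0.0  # the rule never fires on this dataset
    return covered.count(label) / len(covered)
```

A rule such as { i1 , i2 } → c1 would be kept as a CAR only if `car_support` reaches σ and `car_confidence` reaches τ.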

  9. Presentation Outline • Introduction • Related Work • Classification based on Associations (CBA) • Classification based on Multiple Association Rules (CMAR) • Large Bayes (LB) Classifier • Contributions • ACME Classifier • Handling Non-Closed Itemsets • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  10. Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence

  11. Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence • Disadvantages: • Single rule based classification – Not Robust • Cannot handle Fully Confident Associations

  13. Disadvantages with CBA: Single Rule based classification • Let the classifier have 3 rules: • i1 → c1 (support: 0.3, confidence: 0.8) • i2 , i3 → c2 (support: 0.7, confidence: 0.7) • i2 , i4 → c2 (support: 0.8, confidence: 0.7) • Query { i1 , i2 , i3 , i4 } satisfies all three rules, but CBA classifies it to c1 because that rule has the highest confidence. This might be incorrect, since the two rules with higher support both predict c2. • CBA, being a single-rule classifier, cannot consider the effects of multiple parameters.
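
The single-rule behaviour described on this slide can be sketched as follows. The three rules are the ones listed on the slide; the `CAR` tuple, the function name and the tie-break on support are assumptions made for this illustration, not details given in the presentation.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class CAR(NamedTuple):
    antecedent: FrozenSet[str]
    label: str
    support: float
    confidence: float


# The three rules from the slide.
RULES: List[CAR] = [
    CAR(frozenset({"i1"}), "c1", 0.3, 0.8),
    CAR(frozenset({"i2", "i3"}), "c2", 0.7, 0.7),
    CAR(frozenset({"i2", "i4"}), "c2", 0.8, 0.7),
]


def cba_classify(query: FrozenSet[str], rules: List[CAR]) -> Optional[str]:
    """Return the label of the most confident rule whose antecedent is in the query.

    Ties on confidence are broken by support here; that tie-break is an
    assumption of this sketch.
    """
    ranked = sorted(rules, key=lambda r: (r.confidence, r.support), reverse=True)
    for rule in ranked:
        if rule.antecedent <= query:
            return rule.label
    return None  # no rule matches the query


if __name__ == "__main__":
    print(cba_classify(frozenset({"i1", "i2", "i3", "i4"}), RULES))
```

Only the first matching rule decides the outcome, so the two c2 rules never get a say for the query { i1 , i2 , i3 , i4 }.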

  15. Fully Confident Associations • An association { i1 , i2 } → { i3 } is fully confident if its confidence is 100%. This means P( ~i3 , i1 , i2 ) = 0. • If CBA includes the CAR { i1 , i2 , i3 } → c1 , it will also include { i1 , i2 } → c1 . • If the query { i1 , i2 , ~i3 } arrives for classification, it is classified to c1 using { i1 , i2 } → c1 . • But P( ~i3 , i1 , i2 ) = 0, so no training record supports this combination. • CBA does not check for all statistical relationships.
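
A small check of what "fully confident" means in terms of counts, as a sketch. The four records and both helper names are hypothetical and chosen only so that { i1 , i2 } → { i3 } holds with 100% confidence.

```python
from typing import List, Set

# Hypothetical records (class labels omitted, they are not needed here).
RECORDS: List[Set[str]] = [
    {"i1", "i2", "i3"},
    {"i1", "i2", "i3"},
    {"i1", "i3"},
    {"i2"},
]


def confidence(antecedent: Set[str], consequent: Set[str], records: List[Set[str]]) -> float:
    """Among records containing `antecedent`, the fraction that also contain `consequent`."""
    covered = [r for r in records if antecedent <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if consequent <= r) / len(covered)


def prob_present_absent(present: Set[str], absent: Set[str], records: List[Set[str]]) -> float:
    """Fraction of records containing all of `present` and none of `absent`."""
    hits = sum(1 for r in records if present <= r and not (absent & r))
    return hits / len(records)
```

With these records the confidence of { i1 , i2 } → { i3 } is 1.0 and the estimate of P( ~i3 , i1 , i2 ) is 0, which is the situation a rule-based classifier silently ignores when it fires { i1 , i2 } → c1 on a query without i3.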

  16. Classification based on Multiple ARs (CMAR) • [Wenmin Li – ICDM01] • Uses multiple CARs in the classification step • Steps in CMAR: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Find all CARs which satisfy the given query • Group them based on their class label • Classify the query to the class whose group of CARs has the maximum weight

  18. CMAR contd. • R: the rules satisfying query 'q' • Output the class with the highest weight
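
The group-and-weigh step can be sketched as below. The transcript does not preserve CMAR's weight formula, so this sketch swaps in the sum of rule confidences as a placeholder weight; that choice, the rule tuples and the function names are assumptions. The rules reuse the three-rule example from the CBA slide.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

# (antecedent, class label, confidence); values reused from the CBA example.
Rule = Tuple[FrozenSet[str], str, float]

RULES: List[Rule] = [
    (frozenset({"i1"}), "c1", 0.8),
    (frozenset({"i2", "i3"}), "c2", 0.7),
    (frozenset({"i2", "i4"}), "c2", 0.7),
]


def group_weights(query: FrozenSet[str], rules: List[Rule]) -> Dict[str, float]:
    """Group the rules that match the query by class and add up their weights.

    The weight is the plain sum of confidences, a stand-in for the real
    CMAR weight (assumption of this sketch).
    """
    weights: Dict[str, float] = defaultdict(float)
    for antecedent, label, conf in rules:
        if antecedent <= query:
            weights[label] += conf
    return dict(weights)


def classify_by_group_weight(query: FrozenSet[str], rules: List[Rule]) -> Optional[str]:
    """Return the class whose group of matching rules has the largest weight."""
    weights = group_weights(query, rules)
    if not weights:
        return None
    return max(weights, key=weights.get)
```

For the query { i1 , i2 , i3 , i4 } the two c2 rules together outweigh the single c1 rule under this placeholder weight, so the multi-rule scheme answers c2 where the single-rule scheme answers c1.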

  19. CMAR Disadvantages • No proper statistical explanation given for the mathematical formulae that were employed • Cannot handle Fully Confident Associations

  20. Large Bayes (LB) Classifier • [Meretakis – KDD99] • Build P( q | ci ) using frequent itemsets in ci which are subsets of 'q' • Steps in LB: • Mine for frequent itemsets • Prune the frequent itemsets • Calculate P( q ) using a product approximation • Classify to the class with the highest probability

  22. LB: Pruning FRQ-Itemsets • The immediate subset of 's' obtained by removing the item 'i' is denoted 's\i' • Ex: if s = { i1 , i2 } then s\i1 denotes the set { i2 } • The symbol I stands for interestingness • Pj,k ( s ) denotes the estimate of the frequency of 's' calculated from the frequencies of 's\j' and 's\k'

  24. LB Pruner: Disadvantages • Assumes an itemset’s interestingness does not depend on items not occurring in it. • Ex: For s = { i1 , i2 }, I( s ) depends only on { i1 } and { i2 } but not on { i3 } • Assumes an itemset’s interestingness can be calculated from pairs of immediate subsets. • Uses a global information measure for all classes, although itemsets can be informative in one class but not in another.

  28. LB: Calculating P(q) • Approximately calculates P( q ) using frequencies of frequent itemsets. • Ex: P( i1 , i2 , i3 , i4 , i5 ) = P( i2 , i5 ) · P( i3 | i5 ) · P( i1 , i4 | i2 ) • The frequencies of the following itemsets should be available: { i2 , i5 }, { i3 , i5 }, { i5 }, { i1 , i4 , i2 }, { i2 } • Iteratively select itemsets until all the items in 'q' are covered. • There could be many product approximations. • Heuristic: iteratively select the itemset 's' such that • the number of new items in 's' is the least. • If there are contenders, pick the 's' with the highest I( s )
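
The product approximation in the example can be evaluated directly from itemset frequencies, since each conditional factor P( a | b ) is the ratio P( a , b ) / P( b ). The sketch below only evaluates a given factorization; it does not implement the selection heuristic. The frequency values and the function name are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
# A factor P(new | given) is stored as (new items, conditioning items).
Factor = Tuple[Itemset, Itemset]


def product_approximation(freq: Dict[Itemset, float], factors: List[Factor]) -> float:
    """Multiply the factors P(new | given) = freq[new ∪ given] / freq[given].

    An empty `given` yields the unconditional factor freq[new]. Every
    itemset looked up must be present in `freq` with non-zero frequency.
    """
    estimate = 1.0
    for new_items, given in factors:
        joint = freq[new_items | given]
        estimate *= joint if not given else joint / freq[given]
    return estimate


# Hypothetical frequencies for the five itemsets named on the slide.
FREQ: Dict[Itemset, float] = {
    frozenset({"i2", "i5"}): 0.4,
    frozenset({"i3", "i5"}): 0.3,
    frozenset({"i5"}): 0.5,
    frozenset({"i1", "i2", "i4"}): 0.2,
    frozenset({"i2"}): 0.6,
}

# P(i2, i5) · P(i3 | i5) · P(i1, i4 | i2), as in the slide's example.
FACTORS: List[Factor] = [
    (frozenset({"i2", "i5"}), frozenset()),
    (frozenset({"i3"}), frozenset({"i5"})),
    (frozenset({"i1", "i4"}), frozenset({"i2"})),
]
```

With these made-up frequencies the estimate is 0.4 · (0.3 / 0.5) · (0.2 / 0.6). A different choice of factors over the same query would generally give a different number, which is the ambiguity the slide points out.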

  29. Estimating P(q): Disadvantages • LB calculates the probability of q as if it were an itemset. • Uses an approximation of P(q) and hence assumes independences between items • There could be a better product approximation. • Cannot handle Fully Confident Associations • Suppose there exists a rule { i1 , i2 } → i3 , i.e. P( i1 , i2 , i3 ) = P( i1 , i2 ) • The product approximation for q = { i1 , i2 , i4 } (i3 absent) is built as P( i1 , i2 ) · P( i4 | i1 , i2 ), which can be non-zero although i1 and i2 never occur without i3.

  30. Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  31. Contributions • Frequent Itemsets + Maximum Entropy = very accurate, robust and theoretically appealing classifier • Fixed the existing Maximum Entropy model to work with frequent itemsets • Made the approach scalable to large databases • Proved that Naïve Bayes is a specialization of ACME

  32. Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Philosophy • Steps involved • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  35. ACME Classifier • Main philosophy of ACME: use data mining principles to mine informative patterns from a dataset and build a statistical model using these patterns. • ACME does not explicitly assume any independence that is not represented in the dataset. • ACME has three steps: • Mining step • Learning step • Classifying step

  37. Mining Informative Patterns • The dataset D is split by class label: records labeled as c1 form D1 , records labeled as c2 form D2

  38. Mining Informative Patterns • Di → Apriori → constraints of class ci → confidence-based pruner → non-redundant constraints of class ci

  39. Mining constraints • Let Si denote the set of itemsets which are frequent in ci , i.e.: Si = { s ⊆ I : P( s | ci ) ≥ σ } • Let S denote the set of itemsets which are frequent in at least one class, i.e.: S = S1 ∪ S2 ∪ … ∪ S|C| • Constraints of ci , denoted by Ci , are: Ci = { ( s , P( s | ci ) ) : s ∈ S }
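
A minimal sketch of the mining step, under stated assumptions: it enumerates itemsets by brute force instead of running Apriori, treats "frequent" as frequency ≥ σ within the class, and represents the constraints of a class as each itemset of S paired with its frequency in that class (the form suggested by the later example C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) }). The toy dataset and all names are hypothetical, and the confidence-based pruner is not included.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Record = Tuple[Itemset, str]  # (items present, class label)


def frequency(itemset: Itemset, records: List[Itemset]) -> float:
    """Fraction of `records` that contain every item of `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records: List[Itemset], sigma: float, max_size: int = 3) -> Set[Itemset]:
    """Brute-force stand-in for Apriori: test every itemset up to `max_size`."""
    items = sorted(set().union(*records))
    found: Set[Itemset] = set()
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if frequency(candidate, records) >= sigma:
                found.add(candidate)
    return found


def mine_constraints(dataset: List[Record], sigma: float) -> Dict[str, Dict[Itemset, float]]:
    """Return, per class, each itemset of S with its frequency in that class."""
    by_class: Dict[str, List[Itemset]] = {}
    for items, label in dataset:
        by_class.setdefault(label, []).append(items)
    # S: itemsets frequent in at least one class.
    s_all: Set[Itemset] = set()
    for records in by_class.values():
        s_all |= frequent_itemsets(records, sigma)
    return {
        label: {s: frequency(s, records) for s in s_all}
        for label, records in by_class.items()
    }


# Hypothetical toy dataset.
DATASET: List[Record] = [
    (frozenset({"i1", "i2"}), "c1"),
    (frozenset({"i1"}), "c1"),
    (frozenset({"i2"}), "c2"),
    (frozenset({"i2", "i3"}), "c2"),
]
```

Note that an itemset frequent in only one class still appears in the other class's constraints, with its (possibly zero) frequency there.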

  40. Pruning constraints • Constraints are pruned based on how well they differentiate between classes. • Ex: s={ i1 , i2 } an itemset in S, is pruned if • a case when it is not pruned

  45. Learning from constraints • Constraints of class ci ( Ci ): available from the mining step • Statistical distribution of class ci ( Pi ): will be used in the classification step • How do we get from Ci to Pi ? • Characteristic of Pi : it should satisfy every constraint in Ci

  48. An Example • Output of the mining step for c1 : C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) } • Among the distributions that satisfy these constraints, choose the one with the highest entropy.

  50. Learning step • The solution distribution P for class ci produced by the learning step should: • Satisfy the constraints of the class • Have the highest entropy possible • [Good – AMS63] P can be modeled as a log-linear model.
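
The two requirements of the learning step (satisfy the constraints, maximize entropy) can be illustrated on a toy scale. The sketch below uses iterative proportional fitting over the full table of item combinations, starting from the uniform distribution. This is a swapped-in textbook procedure chosen for brevity, not necessarily the algorithm used in the thesis; it is exponential in the number of items and assumes every target lies strictly between 0 and 1. All names are hypothetical.

```python
from itertools import product
from math import log
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def max_entropy_distribution(
    items: List[str], constraints: Dict[Itemset, float], sweeps: int = 200
) -> Dict[Itemset, float]:
    """Fit a distribution over all item combinations to the given constraints.

    Each constraint (s, t) demands that the total probability of the
    states containing itemset s equals t.
    """
    states = [
        frozenset(item for item, bit in zip(items, bits) if bit)
        for bits in product([0, 1], repeat=len(items))
    ]
    prob = {state: 1.0 / len(states) for state in states}  # uniform start
    for _ in range(sweeps):
        for itemset, target in constraints.items():
            current = sum(p for state, p in prob.items() if itemset <= state)
            for state in prob:
                # Rescale both sides so the constraint holds exactly.
                if itemset <= state:
                    prob[state] *= target / current
                else:
                    prob[state] *= (1.0 - target) / (1.0 - current)
    return prob


def entropy(prob: Dict[Itemset, float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * log(p) for p in prob.values() if p > 0)
```

For the constraints C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) } from the example slide, the fitted distribution puts 0.25 on each of the four combinations of i1 and i2, i.e. it treats the two items as independent because nothing in the constraints says otherwise.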
