Classification in Complex Systems
Why we should look at the paper:
CAEP: Classification by Aggregating Emerging Patterns
G. Dong, X. Zhang, L. Wong, and J. Li
What are Common Problems in Classification?
• Many variables
• Graphs that relate tuples
  • Protein-protein interactions (KDD-cup 02)
  • Citations (KDD-cup 03)
• Anything that violates standard table format
Many Variables
Solution:
• Naïve Bayes way of multiplying probabilities
• Other additive models
Problems:
• Many factors
• May be correlated
• Noise
… but it gets worse
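The "multiplying probabilities" bullet can be sketched as follows. This is a minimal illustration, not taken from the slides: the function name `naive_bayes_scores` and all probabilities are hypothetical toy values. The product is computed as a sum of logarithms, which is also why Naïve Bayes can be seen as an additive model.

```python
import math

def naive_bayes_scores(priors, likelihoods, instance):
    """Return log P(C) + sum_i log P(x_i | C) for every class C."""
    scores = {}
    for label, prior in priors.items():
        log_score = math.log(prior)
        for attribute, value in instance.items():
            # Independence assumption: one factor per attribute.
            log_score += math.log(likelihoods[label][attribute][value])
        scores[label] = log_score
    return scores

# Hypothetical toy model with two binary attributes "a" and "b".
priors = {"P": 0.5, "N": 0.5}
likelihoods = {
    "P": {"a": {1: 0.8, 0: 0.2}, "b": {1: 0.6, 0: 0.4}},
    "N": {"a": {1: 0.3, 0: 0.7}, "b": {1: 0.5, 0: 0.5}},
}

scores = naive_bayes_scores(priors, likelihoods, {"a": 1, "b": 1})
predicted = max(scores, key=scores.get)
```

With many correlated or noisy attributes, every factor still enters the product, which is the problem the slide points at.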
Graphs
• 2 kinds of attributes
  • Attributes within nodes
  • Attributes of neighbor and more distant nodes
• How do neighbor attributes count?
  • Take disjunction?
    • “At least one neighbor that has a particular property”
  • Probably preferable:
    • Use links or, more generally, paths as basis
• Integration into classification???
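The disjunction option ("at least one neighbor that has a particular property") could be turned into a derived Boolean attribute roughly like this. The adjacency-list layout, the helper name `has_neighbor_with`, and the protein names are assumptions made only for illustration.

```python
def has_neighbor_with(graph, attributes, node, prop):
    """True if at least one direct neighbor of `node` carries property `prop`."""
    return any(prop in attributes[neighbor] for neighbor in graph.get(node, ()))

# Hypothetical interaction graph: node -> list of neighbors.
graph = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"], "p4": []}
# Hypothetical node attributes: node -> set of properties.
attributes = {"p1": set(), "p2": {"kinase"}, "p3": set(), "p4": {"kinase"}}

# Derived attribute that a standard table-based classifier could consume.
derived = {node: has_neighbor_with(graph, attributes, node, "kinase") for node in graph}
```

The derived column discards how many neighbors match and which link led to them, which is one reason the slide prefers links or paths as the basis.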
Idea
• Get away from strict set of n attributes
• If an attribute or combination of attributes is “interesting”, use it
• Combining rules?
  • I would have guessed as in Naïve Bayes
  • CAEP adds probabilities!?
What is “interesting”?
• CAEP paper claims “growth rate”
  • Support of a rule increases significantly from one class label to another
  • Note: Only increase, not decrease!
• What does that mean?
  • For pattern e and classes P and N
  • $\mathrm{growth\_rate}_{PN}(e) = \mathrm{supp}_N(e) / \mathrm{supp}_P(e)$
2 Things Worth Investigating
• Is the “interestingness” measure related to information gain?
  • Under certain assumptions: Yes
• Can the “score” be justified?
  • Sum of P(C)!?
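To make the "sum of P(C)" remark concrete, here is a sketch of an aggregate score in which every emerging pattern contained in the instance contributes growth_rate / (growth_rate + 1), a probability-like weight, multiplied by its support in the target class. That weighting is stated here as an assumption about how CAEP scores an instance; the names `strength` and `aggregate_score` and the toy patterns are hypothetical.

```python
import math

def strength(growth_rate):
    """Probability-like weight growth_rate / (growth_rate + 1); tends to 1 for infinite rates."""
    if math.isinf(growth_rate):
        return 1.0
    return growth_rate / (growth_rate + 1.0)

def aggregate_score(instance, emerging_patterns):
    """Sum the contributions of all emerging patterns contained in `instance`.

    `emerging_patterns` is a list of (pattern, growth_rate, support_in_class).
    """
    return sum(
        strength(rate) * supp
        for pattern, rate, supp in emerging_patterns
        if pattern <= instance
    )

# Hypothetical emerging patterns of class N.
eps_n = [
    (frozenset({"a", "b"}), 3.0, 0.75),
    (frozenset({"d"}), float("inf"), 0.5),
]

score_abc = aggregate_score({"a", "b", "c"}, eps_n)  # only {a, b} matches
score_abd = aggregate_score({"a", "b", "d"}, eps_n)  # both patterns match
```

The terms are added rather than multiplied, so the result is not itself a probability, which is the justification question the slide raises.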
Other Issues
• Normalization
  • Emerging patterns only consider increase in support => different number of relevant patterns
• How to mine for EPs
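Because classes can have very different numbers of emerging patterns, raw scores are not directly comparable. One possible normalization, sketched below, divides each raw score by a per-class base score. Taking the median of the training scores as the base is an assumption made for this example, and all numbers are invented.

```python
from statistics import median

def base_score(training_scores):
    """Per-class reference score; the median is an assumed choice here."""
    return median(training_scores)

def normalized_scores(raw_scores, base_scores):
    """Divide each class's raw score by that class's base score (assumed positive)."""
    return {label: raw_scores[label] / base_scores[label] for label in raw_scores}

# Hypothetical raw scores of training instances, grouped by their own class.
training = {"P": [2.0, 4.0, 6.0], "N": [0.5, 1.0, 1.5]}
bases = {label: base_score(values) for label, values in training.items()}

# Hypothetical raw scores of one test instance against both classes.
raw = {"P": 3.0, "N": 1.2}
normalized = normalized_scores(raw, bases)
predicted = max(normalized, key=normalized.get)
```

In this toy case the raw scores favor P, while the normalized scores favor N, because class P has many more patterns inflating its sums.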
Conclusions
• Idea very valuable
  • Classification split into ARM-step and rule combination
• Justification of details?
  • Not great
  • Should be possible to do it right – with poorer accuracy ;-)