Anomalous Association Rules. Máster Oficial en Soft Computing y Sistemas Inteligentes, Universidad de Granada.
Introduction. Association rule: X ⇒ Y. Frequent: Supp(X ⇒ Y) ≡ Supp(X ∪ Y) ≥ ε (5%). Confident: Conf(X ⇒ Y) = Supp(X ∪ Y) / Supp(X) ≥ θ (80%). Goal: find all the frequent and confident associations. Applications: market basket analysis, CRM, etc.
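The support and confidence conditions on this slide can be illustrated with a minimal Python sketch. The function names and the toy transactions are assumptions made up for illustration; only the default thresholds (5% and 80%) come from the slide.

```python
from typing import Iterable, List, Set


def count(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """Conf(X => Y) = Supp(X u Y) / Supp(X), computed from raw counts."""
    n_x = count(antecedent, transactions)
    if n_x == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return count(set(antecedent) | set(consequent), transactions) / n_x


def is_frequent_and_confident(antecedent: Iterable[str],
                              consequent: Iterable[str],
                              transactions: List[Set[str]],
                              min_supp: float = 0.05,
                              min_conf: float = 0.80) -> bool:
    """Check both thresholds (epsilon and theta on the slide)."""
    joint = set(antecedent) | set(consequent)
    return (support(joint, transactions) >= min_supp
            and confidence(antecedent, consequent, transactions) >= min_conf)


# Toy market-basket data (hypothetical, for illustration only).
TRANSACTIONS: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "milk"},
]
```

With this toy data, bread ⇒ milk passes both thresholds, while milk ⇒ eggs is frequent but not confident.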
Introduction. Problem: thousands of rules are found, which is unmanageable for any user, and too many of the associations are spurious. Possible solutions: • Subjective measures • Objective measures. The main problem is the type of knowledge an association rule represents.
Introduction. The crucial problem is to determine which kind of events we are interested in, so that we can characterize them appropriately. Surprising non-frequent events are often more interesting than frequent ones. Which events are interesting depends on the application.
Introduction. • Infrequent itemsets in intrusion detection systems • Exceptions to associations for the detection of conflicting medicine therapies • Unusual short sequences of nucleotides in genome sequencing • Etc.
Introduction. Our objective: to introduce the concept of an anomalous association rule as a confident rule representing homogeneous deviations from common behavior.
Related Work. Suzuki, Hussain & Suzuki: “Exception Rules”. X ⇒ Y is an association rule; X ∧ I ⇒ ¬Y is the exception rule; I is the “interacting” itemset; X I is the reference rule. Too many exceptions.
Our Definition. X usually implies Y (dominant rule): X ⇒ Y is frequent and confident. When X does not imply Y, it usually implies A (the anomaly): X ∧ ¬Y ⇒ A is confident (the anomalous association rule), and X ∧ Y ⇒ ¬A is confident.
Our Definition. X ⇒ Y is the dominant rule.
Our Definition. X ⇒ A when ¬Y is the anomalous rule.
Our Definition. Some overlapping cases may appear.
Our Definition. Dominant rule: if symptoms-X then disease-Y. Anomalous rule: if symptoms-X then disease-A when not disease-Y. Disease-A does not occur together with symptoms-X and disease-Y.
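The three conditions of the definition (X ⇒ Y frequent and confident, X ∧ ¬Y ⇒ A confident, X ∧ Y ⇒ ¬A confident) can be checked directly on a set of transactions. The sketch below is only an illustration: the function name `is_anomalous_rule`, the patient records, and the reuse of the 5% and 80% thresholds for all three conditions are assumptions, not part of the slides.

```python
from typing import FrozenSet, Iterable, List


def is_anomalous_rule(x: Iterable[str], y: Iterable[str], a: Iterable[str],
                      transactions: List[FrozenSet[str]],
                      min_supp: float = 0.05,
                      min_conf: float = 0.80) -> bool:
    """Check whether 'X => A when not Y' holds on the given transactions."""
    x, y, a = frozenset(x), frozenset(y), frozenset(a)
    with_x = [t for t in transactions if x <= t]
    x_and_y = [t for t in with_x if y <= t]        # X holds and Y holds
    x_not_y = [t for t in with_x if not y <= t]    # X holds but Y does not
    if not x_and_y or not x_not_y:
        return False  # no dominant behaviour, or no deviations to explain

    # 1. Dominant rule X => Y must be frequent and confident.
    dominant = (len(x_and_y) / len(transactions) >= min_supp
                and len(x_and_y) / len(with_x) >= min_conf)

    # 2. X and not Y => A must be confident.
    conf_anomaly = sum(1 for t in x_not_y if a <= t) / len(x_not_y)

    # 3. X and Y => not A must be confident.
    conf_exclusion = sum(1 for t in x_and_y if not a <= t) / len(x_and_y)

    return dominant and conf_anomaly >= min_conf and conf_exclusion >= min_conf


# Hypothetical patient records, made up to mirror the slide's example.
PATIENTS: List[FrozenSet[str]] = (
    [frozenset({"symptoms-X", "disease-Y"})] * 8
    + [frozenset({"symptoms-X", "disease-A"})] * 2
    + [frozenset({"disease-Y"})] * 5
)
```

On these made-up records, disease-A qualifies as the anomaly of "symptoms-X ⇒ disease-Y", because it explains every case where disease-Y is absent and never co-occurs with disease-Y.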
Algorithm. Based on TBAR (“Tree based association rules”): Berzal, Cubero, Marín, Serrano, Data & Knowledge Engineering (2001).
Algorithm (assoc. rules). Possible items: A, B, C, D, E, F. [Figure: itemset tree built level by level. L1 holds the frequent items A #7, B #9, C #7, D #8 (e.g. 7 instances with A). L2 holds frequent pairs such as AB #6 (6 instances with AB), AD #5 and BC #6. L3 holds triples such as ABD #5.]
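The level-by-level counting shown in the tree (L1 items, then L2 pairs, then L3 triples) can be sketched as follows. This is a plain Apriori-style loop over Python dictionaries, not the TBAR tree structure itself; the function name and the toy transactions are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that occurs in at least `min_count` transactions."""
    # Level L1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)

    k = 2
    while level:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        # and keep only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for s1, s2 in combinations(list(level), 2):
            union = s1 | s2
            if len(union) == k and all(
                    frozenset(sub) in level
                    for sub in combinations(union, k - 1)):
                candidates.add(union)
        # One scan over the data to count the surviving candidates.
        level = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_count:
                level[cand] = n
        result.update(level)
        k += 1
    return result


# Toy data (hypothetical; it does not reproduce the counts on the slide).
DATA: List[FrozenSet[str]] = [
    frozenset("ABD"), frozenset("ABD"), frozenset("AB"),
    frozenset("BC"), frozenset("BCD"), frozenset("AC"),
]
FREQUENT = frequent_itemsets(DATA, min_count=2)
```

A tree such as the one in TBAR stores the same counts more compactly by sharing prefixes; the dictionary version trades that efficiency for brevity.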
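Algorithm (anomalous rules). Possible items: A, B, C, D, E, F. [Figure: itemset tree after the first scan (A #7, B #9, C #7, D #8) and the second scan, where A is extended to AB #6, AC #4, AD #5, AE #3, AF #3. The figure also shows a node A* and the label "Non frequent".]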
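Algorithm (anomalous rules). Possible items: A, B, C, D, E, F. [Figure: itemset tree with first-level nodes A #7, B #9, C #7, D #8, each with a companion node (A*, B*, C*, D*), and second-level nodes B #6, D #5, C #6, D #7, D #5. Stages shown: first scan, second scan, candidate generation.]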
Algorithm (anomalous rules). Rule generation: immediate from the frequent items.
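To make the rule-generation step concrete, the sketch below enumerates anomalous rules by brute force, restricted to single-item X, Y and A. It is deliberately naive and is not the tree-based algorithm of the slides; the function name, the data, and the use of the 5% and 80% thresholds for every condition are assumptions.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple


def mine_anomalous_rules(transactions: List[FrozenSet[str]],
                         min_supp: float = 0.05,
                         min_conf: float = 0.80) -> List[Tuple[str, str, str]]:
    """Return (x, y, a) triples meaning 'x => a when not y'."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules: List[Tuple[str, str, str]] = []
    for x, y in permutations(items, 2):
        with_x = [t for t in transactions if x in t]
        x_and_y = [t for t in with_x if y in t]
        x_not_y = [t for t in with_x if y not in t]
        if not x_and_y or not x_not_y:
            continue  # no dominant behaviour, or nothing deviates from it
        # The dominant rule x => y must be frequent and confident.
        if len(x_and_y) / n < min_supp:
            continue
        if len(x_and_y) / len(with_x) < min_conf:
            continue
        for a in items:
            if a in (x, y):
                continue
            # x and not y => a must be confident.
            conf_anomaly = sum(a in t for t in x_not_y) / len(x_not_y)
            # x and y => not a must be confident.
            conf_exclusion = sum(a not in t for t in x_and_y) / len(x_and_y)
            if conf_anomaly >= min_conf and conf_exclusion >= min_conf:
                rules.append((x, y, a))
    return rules


# Hypothetical data: X usually comes with Y; when it does not, it comes with A.
RECORDS: List[FrozenSet[str]] = (
    [frozenset({"X", "Y"})] * 8
    + [frozenset({"X", "A"})] * 2
    + [frozenset({"Y"})] * 5
)
RULES = mine_anomalous_rules(RECORDS)
```

Note that the anomaly A is itself infrequent in this data, which is why an algorithm for anomalous rules cannot simply discard non-frequent items after the first scan.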
Experimentation. The “core” of X ⇒ Y|A is Y|A.
Experimentation. Dominant rule: X ⇒ Y, where Y is the usual consequent. Anomalous rule: if X then A when not Y, i.e. X ∧ ¬Y ⇒ A, where A is the “anomaly”.
Experimentation. Nursery: if NURSERY:very_crit and HEALTH:priority then CLASS:priority (9 out of 9) when not CLASS:spec_prior. Here CLASS:priority is the “anomaly” and CLASS:spec_prior is the usual consequent.
Experimentation. Census: if WORKCLASS: Local-gov then CAPGAIN: [99999.0, 99999.0] (7 out of 7) when not CAPGAIN: [0.0, 20051.0]. Here CAPGAIN: [99999.0, 99999.0] is the “anomaly” and CAPGAIN: [0.0, 20051.0] is the usual consequent.
Conclusions. We have introduced an alternative type of interesting knowledge: anomalous association rules. We have given an efficient algorithm to detect all the anomalies.
Conclusions. Future work: • To complete experimentation • To filter the anomalies, eliminating redundant rules • To introduce measures of interest for the anomalies, allowing their ordering