A Belief-Driven Method for Discovering Unexpected Patterns • Introduction: Zhang Yi • Algorithms: Lee Wai Choy, Julian • Application and Conclusions: Wee Jee Jeng
Outline of presentation (Part 1) • Background information • Structure of rule or belief • Assumption of belief • Unexpectedness of rule • Terms for the algorithm http://web.singnet.com.sg/~waichoy/cs6203/
Background • Most research in the KDD (Knowledge Discovery in Databases) field focuses on the validity of the discovered patterns. • Drawback: many existing tools generate a large number of valid but obvious or irrelevant patterns. • To address this issue, some researchers have studied the discovery of novel and useful patterns.
Background (cont.) • This paper focuses on discovering patterns that are unexpected relative to a belief system. • A belief (a logical statement) is prior knowledge that encodes a set of expectations about the problem domain. • Beliefs can be obtained by eliciting them from a domain expert, by learning them from data, or by refining existing beliefs with newly discovered patterns. (How beliefs are obtained is not the concern of this paper.)
Overview of unexpectedness • 1. Defined in probabilistic terms (Silberschatz and Tuzhilin 1996a): a rule is considered “interesting” if it affects the degrees of belief. • 2. Based on a syntactic comparison between a rule and a belief (Liu and Hsu 1996): a rule and a belief are “different” when their consequents are “similar” but their antecedents are “far apart”, or vice versa.
Overview of unexpectedness (cont.) 3. Logical contradiction • This paper uses a new definition of unexpectedness in terms of a logical contradiction of a rule and a belief • The method uses these beliefs to seed the search for patterns in data that contradict the beliefs.
Structure of a rule or belief • Rules and beliefs are of the form: body -> head • Body: a conjunction of conditions (attribute op value) • Head: a single condition (attribute op value) • op ∈ {>=, <=, =}
Structure of a rule or belief (cont.) • Sample rule: Education level >= Degree and Work experience >= 1 year -> Salary >= 3000
Assumption • If we expect a belief Y -> B to hold on a dataset D, • then we also expect it to hold on any “statistically large” subset of D.
Definition of unexpectedness used in this paper • The rule A -> B is unexpected with respect to the belief X -> Y on the dataset D if: • (1) B and Y = False, i.e. B and Y logically contradict each other • (2) A and X holds on a statistically large subset of the tuples in D • (3) The rule A, X -> B holds
Figure 1: Dataset D and the belief X -> Y, showing the subset containing A (body of the rule), the subset containing X (body of the belief), the subset containing both A and X, and the subset containing A, X and B (¬Y).
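The three conditions of the definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's procedure: tuples are plain dictionaries, conditions are predicate functions, "statistically large" is approximated by a hypothetical `min_support` threshold, "the rule holds" by a hypothetical `min_conf` threshold, and the logical contradiction is checked over a caller-supplied list of possible head values (`head_values`).

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Cond = Callable[[Row], bool]


def is_unexpected(data: List[Row], A: Cond, B: Cond, X: Cond, Y: Cond,
                  head_values: List[Row],
                  min_support: float = 0.1, min_conf: float = 0.6) -> bool:
    """Check the rule A -> B against the belief X -> Y on `data`."""
    # (1) B and Y must contradict: no possible head value satisfies both.
    if any(B(v) and Y(v) for v in head_values):
        return False
    # (2) A and X must hold together on a large enough share of the tuples.
    both = [row for row in data if A(row) and X(row)]
    if not both or len(both) / len(data) < min_support:
        return False
    # (3) The rule A, X -> B must hold, measured here by its confidence.
    conf = sum(1 for row in both if B(row)) / len(both)
    return conf >= min_conf


# Toy data (made up): professionals mostly shop on weekends, except in December.
data = (
    [{"prof": True, "month": 12, "day": "weekday"}] * 4
    + [{"prof": True, "month": 12, "day": "weekend"}]
    + [{"prof": True, "month": 6, "day": "weekend"}] * 5
)
in_december = lambda r: r["month"] == 12
is_prof = lambda r: r["prof"]
weekday = lambda r: r["day"] == "weekday"
weekend = lambda r: r["day"] == "weekend"
head_values = [{"day": "weekday"}, {"day": "weekend"}]

result = is_unexpected(data, in_december, weekday, is_prof, weekend, head_values)
```

On this toy data the rule "December -> weekday" is flagged as unexpected with respect to the belief "professional -> weekend", because the two heads cannot both be true and the combined rule has confidence 4/5.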
Term “CONTR(Y)” (contradicting conditions) • 1) If the head of the belief is of the form “a >= val”: • a) Any condition of the form “a <= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val
Term “CONTR(Y)” (cont.) • b) Any condition of the form “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val • Example: Month >= 10 is contradicted by Month <= x, x ∈ {1, 2, ..., 9}, and by Month = x, x ∈ {1, ..., 9}
Term “CONTR(Y)” (cont.) • 2) If the head of the belief is of the form “a <= val”: • “a >= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val • “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val
Term “CONTR(Y)” (cont.) • 3) If the head of the belief is of the form “a = val”: • If a is an ordered attribute, “a >= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp > val • “a <= vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp < val • “a = vp” ∈ CONTR(Y) if vp ∈ {v1, v2, ..., vk} and vp != val
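The three cases of CONTR(Y) can be written as one small function. The sketch below is an illustration only: it assumes an ordered attribute with integer values, represents a condition as a hypothetical `(attribute, op, value)` tuple, and takes the value set {v1, ..., vk} as the `domain` argument.

```python
from typing import Iterable, List, Tuple

Condition = Tuple[str, str, int]  # (attribute, op, value), op in {">=", "<=", "="}


def contr(head: Condition, domain: Iterable[int]) -> List[Condition]:
    """Return the conditions that contradict `head` over the given values."""
    attr, op, val = head
    out: List[Condition] = []
    for vp in domain:
        if op == ">=" and vp < val:        # case 1: head is a >= val
            out += [(attr, "<=", vp), (attr, "=", vp)]
        elif op == "<=" and vp > val:      # case 2: head is a <= val
            out += [(attr, ">=", vp), (attr, "=", vp)]
        elif op == "=":                    # case 3: head is a = val
            if vp > val:
                out.append((attr, ">=", vp))
            if vp < val:
                out.append((attr, "<=", vp))
            if vp != val:
                out.append((attr, "=", vp))
    return out


# The Month example: Month >= 10 over the months 1..12.
month_contr = contr(("month", ">=", 10), range(1, 13))
```

For the Month example this produces Month <= x and Month = x for every x from 1 to 9, and nothing for the months 10 to 12.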
Confidence of the rule • Confidence of the rule X, P -> C = support(X, P, C) / support(X, P) • Two sets of candidate itemsets are used for support determination: Ck and Ck’
Confidence of the rule (cont.) • Each itemset in Ck has the form {X, P, C}: • (i) X, the body of the belief, • (ii) C, a condition that contradicts the head of the belief (C ∈ CONTR(Y)), and • (iii) P, a conjunction of k other conditions
Confidence of the rule (cont.) • Each itemset in Ck’ has the form {X, P} • It is generated from an itemset in Ck by dropping the contradicting condition C
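As a small worked example of the confidence formula, the sketch below counts the tuples that satisfy {X, P, C} and {X, P} on a made-up dataset. The tuple encoding of conditions, the helper names `count`, `support` and `confidence`, and the data are all assumptions of this illustration; `noOfMag <= 2` stands in for `noOfMag < 3` on integer counts.

```python
import operator
from typing import Dict, FrozenSet, List, Tuple

Condition = Tuple[str, str, int]  # (attribute, op, value)
OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def count(itemset: FrozenSet[Condition], data: List[Dict[str, int]]) -> int:
    """Number of tuples that satisfy every condition in the itemset."""
    return sum(all(OPS[op](row[attr], val) for attr, op, val in itemset)
               for row in data)


def support(itemset: FrozenSet[Condition], data: List[Dict[str, int]]) -> float:
    """Fraction of tuples that satisfy the itemset."""
    return count(itemset, data) / len(data)


def confidence(body: FrozenSet[Condition], head: Condition,
               data: List[Dict[str, int]]) -> float:
    """support(body and head) / support(body), computed from raw counts."""
    body_count = count(body, data)
    if body_count == 0:
        return 0.0
    return count(body | {head}, data) / body_count


data = [
    {"sal": 6000, "noOfMag": 1, "payment": 1},
    {"sal": 6000, "noOfMag": 2, "payment": 1},
    {"sal": 7000, "noOfMag": 5, "payment": 1},
    {"sal": 5500, "noOfMag": 4, "payment": 2},
    {"sal": 3000, "noOfMag": 1, "payment": 1},
    {"sal": 8000, "noOfMag": 2, "payment": 1},
]
X_P = frozenset({("sal", ">=", 5000), ("payment", "=", 1)})  # an itemset of Ck'
C = ("noOfMag", "<=", 2)                                     # contradicting condition
conf = confidence(X_P, C, data)
```

Here four tuples satisfy {X, P} and three of them also satisfy C, so the confidence of X, P -> C is 3/4.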
ZoomUR • ZoominUR discovers unexpected rules that are “refinements” of the beliefs, i.e. rules under which the beliefs are contradicted • ZoomoutUR discovers more general rules that satisfy the conditions of unexpectedness • In general, ZoomUR discovers all non-trivial unexpected rules with respect to a belief X -> Y
ZoominUR Overview • Generates a set of unexpected rules from the set of initial beliefs. • Example beliefs: • Subscribers with a monthly income of more than $5000 tend to subscribe to more than 3 magazines. • Senior subscribers tend to subscribe to health-related magazines.
ZoominUR Algorithm
Inputs: a set of beliefs (Bel_Set), the dataset, the expected min_support, the expected min_conf
1.  forall beliefs B ∈ Bel_Set
2.  { C0 = { {x, body(B)} | x ∈ CONTR(head(B)) }; C0’ = { {body(B)} }; k = 0;
3.    while ( Ck != ∅ ) do
4.    { forall candidates c ∈ Ck ∪ Ck’, compute support(c);
5.      Lk = { x | x ∈ Ck ∪ Ck’, support(x) >= min_support };
6.      k++;
7.      Ck = generate_new_candidates(Lk-1, B);
8.      Ck’ = generate_bodies(Ck, B);
9.    }
10.   Let X = { x | x ∈ ∪ Li, a ∈ x, a ∈ CONTR(head(B)) }
11.   Items_In_UnexpRuleB = ∅
12.   forall (x ∈ X)
13.   { forall (a ∈ x ∩ CONTR(head(B)))
14.     { rule_conf = support(x) / support(x - a)
15.       if (rule “x - a -> a” is not trivial) and (rule_conf > min_conf)
16.       { Items_In_UnexpRuleB = Items_In_UnexpRuleB ∪ {x};
17.         Output Rule “x - a -> a”;
18.       }
19.     }
20.   }
21. }
ZoominUR Algorithm (cont.) • Step 2: Convert each belief into the form x -> y, e.g. “subscribers with salary >= $5000 tend to subscribe to more than 3 magazines” becomes sal >= 5000 -> noOfMag >= 3. Then C0 = {{ sal >= 5000, noOfMag < 3 }} and C0’ = {{ sal >= 5000 }} • Step 4: Compute the support of each candidate using the dataset • Step 5: Generate the large itemsets Lk (the candidates that satisfy min_support), e.g. L0 = {{ sal >= 5000, noOfMag < 3 }, { sal >= 5000 }} • Steps 7 & 8: Generate the new candidates Ck from Lk-1 with respect to B. With L0 as above, C1 = {{ sal >= 5000, noOfMag < 3, payment = 1 }, { sal >= 5000, noOfMag < 3, payment = 2 }} and C1’ = {{ sal >= 5000, payment = 1 }, { sal >= 5000, payment = 2 }}
ZoominUR Algorithm (cont.) • Step 1: The whole procedure is repeated for each belief in the belief set, Bel_Set • Steps 3-9: Repeat until Ck becomes the empty set • Steps 10-20: Generate the unexpected rules of the form x, p -> a from the itemsets in X • Steps 12-20: Repeat for all x in X • Step 14: Compute the confidence of the rule • Step 15: Ensure that the rule is (i) not trivial and (ii) satisfies the min_conf provided by the user
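The level-wise search of ZoominUR can be sketched for a single belief as follows. This is a simplified illustration, not the authors' implementation. The slides do not spell out generate_new_candidates, generate_bodies or the triviality test, so the sketch assumes the following: candidates grow by one condition per level, drawn from a hypothetical user-supplied `pool` on attributes not yet used; bodies are obtained by dropping the contradicting condition; only itemsets that contain a contradicting condition are kept as large; and `is_trivial` is a placeholder that treats every rule as non-trivial. The dataset is made up, and `noOfMag <= 2` stands in for `noOfMag < 3`.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def support(itemset, data):
    """Fraction of tuples satisfying every (attribute, op, value) condition."""
    hits = sum(all(OPS[op](row[a], v) for a, op, v in itemset) for row in data)
    return hits / len(data)


def zoomin_ur(belief_body, contr_conds, pool, data, min_support, min_conf,
              is_trivial=lambda body, head: False):
    """Return unexpected rules for one belief as (body, head, confidence)."""
    body = frozenset(belief_body)
    contr_set = frozenset(contr_conds)
    cands = {body | {c} for c in contr_set}        # C0
    bodies = {body}                                # C0'
    supp, large = {}, set()
    while cands:                                   # steps 3-9
        for c in cands | bodies:                   # step 4
            supp[c] = support(c, data)
        # Step 5, restricted to itemsets that carry a contradicting condition.
        level = {c for c in cands if supp[c] >= min_support}
        large |= level
        cands = set()                              # step 7 (assumed strategy)
        for x in level:
            used = {attr for attr, _, _ in x}
            cands |= {x | {p} for p in pool if p[0] not in used}
        bodies = {c - contr_set for c in cands}    # step 8
    rules = []
    for x in large:                                # steps 10-20
        for a in x & contr_set:
            b = x - {a}
            conf = supp[x] / supp[b] if supp[b] > 0 else 0.0
            if not is_trivial(b, a) and conf > min_conf:
                rules.append((b, a, conf))
    return rules


data = [
    {"sal": 6000, "noOfMag": 4, "payment": 2},
    {"sal": 7000, "noOfMag": 5, "payment": 2},
    {"sal": 6500, "noOfMag": 3, "payment": 2},
    {"sal": 8000, "noOfMag": 1, "payment": 1},
    {"sal": 5500, "noOfMag": 2, "payment": 1},
    {"sal": 9000, "noOfMag": 1, "payment": 1},
    {"sal": 5200, "noOfMag": 4, "payment": 1},
    {"sal": 3000, "noOfMag": 1, "payment": 1},
]
rules = zoomin_ur(
    belief_body=[("sal", ">=", 5000)],
    contr_conds=[("noOfMag", "<=", 2)],
    pool=[("payment", "=", 1), ("payment", "=", 2)],
    data=data, min_support=0.25, min_conf=0.6,
)
```

On this data the belief itself mostly holds (the rule sal >= 5000 -> noOfMag <= 2 has confidence 3/7), but the refinement sal >= 5000, payment = 1 -> noOfMag <= 2 reaches confidence 3/4 and is the only rule returned.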
ZoomoutUR Algorithm
Inputs: the sets of unexpected rules generated by ZoominUR (Items_In_UnexpRuleB for each belief B)
1.  forall beliefs B
2.  { new_candidates = ∅;
3.    forall (x ∈ Items_In_UnexpRuleB)
4.    { Let K = { (k, k’) | k ⊆ x, k ⊇ x - body(B), k’ = k - a, a ∈ CONTR(head(B)) }
5.      new_candidates = new_candidates ∪ K;
6.    }
7.    find_support(new_candidates);
8.    foreach (k, k’) ∈ new_candidates
9.    { consider rule: k’ -> k - k’ with confidence = support(k) / support(k’);
10.     if (confidence > min_conf) Output Rule “k’ -> k - k’”
11.   }
12. }
• Steps 3-6: For each unexpected rule, generate more general association rules. • Step 7: Find the support of the new candidates using the dataset. • Steps 8-11: Iteratively check whether the new rules satisfy the minimum confidence required.
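A minimal sketch of ZoomoutUR for one belief is given below, under the same illustrative assumptions as before: conditions are `(attribute, op, value)` tuples and the dataset is made up. Two choices are specific to this sketch and not taken from the slides: the case k = x is skipped because it would only reproduce the ZoominUR rule, and candidates with an empty body k’ are dropped.

```python
import operator
from itertools import combinations

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}


def support(itemset, data):
    """Fraction of tuples satisfying every (attribute, op, value) condition."""
    hits = sum(all(OPS[op](row[a], v) for a, op, v in itemset) for row in data)
    return hits / len(data)


def zoomout_ur(unexp_itemsets, belief_body, contr_conds, data, min_conf):
    """Generalize ZoominUR itemsets by dropping conditions of the belief body."""
    body = frozenset(belief_body)
    contr_set = frozenset(contr_conds)
    candidates = set()
    for x in map(frozenset, unexp_itemsets):           # steps 3-6
        rest = x - body                                # must stay in every k
        kept_body = sorted(x & body)
        for r in range(len(kept_body)):                # drop at least one condition
            for subset in combinations(kept_body, r):
                k = rest | frozenset(subset)
                for a in k & contr_set:
                    k_prime = k - {a}
                    if k_prime:                        # skip rules with empty body
                        candidates.add((k, k_prime))
    rules = []
    for k, k_prime in candidates:                      # steps 7-11
        s_body = support(k_prime, data)
        conf = support(k, data) / s_body if s_body > 0 else 0.0
        if conf > min_conf:
            rules.append((k_prime, k - k_prime, conf))
    return rules


data = [
    {"sal": 6000, "noOfMag": 4, "payment": 2},
    {"sal": 7000, "noOfMag": 5, "payment": 2},
    {"sal": 6500, "noOfMag": 3, "payment": 2},
    {"sal": 8000, "noOfMag": 1, "payment": 1},
    {"sal": 5500, "noOfMag": 2, "payment": 1},
    {"sal": 9000, "noOfMag": 1, "payment": 1},
    {"sal": 5200, "noOfMag": 4, "payment": 1},
    {"sal": 3000, "noOfMag": 1, "payment": 1},
]
# One itemset of the kind ZoominUR would report for the salary belief.
x = {("sal", ">=", 5000), ("payment", "=", 1), ("noOfMag", "<=", 2)}
general_rules = zoomout_ur([x], [("sal", ">=", 5000)], [("noOfMag", "<=", 2)],
                           data, min_conf=0.6)
```

Dropping the salary condition yields the more general rule payment = 1 -> noOfMag <= 2, which holds for four of the five tuples with payment = 1 in this toy data.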
Marketing Applications-ZoominUR • Belief • Shoppers in households with children tend to purchase regular beverages more than diet beverages • Unexpected pattern • When there is a large store advertisement, shoppers with children buy diet beverages (the opposite product)
Marketing Applications-ZoominUR (cont.) • Belief • Professionals tend to shop more on weekends than on weekdays • Unexpected patterns • In December, professionals tend to shop more on weekdays than on weekends (opposite) • Professionals in large households tend to shop more on weekdays than on weekends (opposite)
Marketing Applications-ZoomoutUR • Belief • Professionals tend to shop more on weekends than on weekdays • Unexpected pattern • In December, shoppers in general shop more on weekdays than on weekends • This is not necessarily a “professionals in December” effect; it holds for shoppers in general • The rule is not just a refinement of the belief, but a substantially different rule
Mining Web Logfile Data-ZoominUR • Belief • For all files and all weeks, the number of hits to a file each week is approximately equal to the file’s average weekly hits • Unexpected pattern • For a certain “Call for Papers” file, in the weeks from September 1 through October 29, the weekly access count is much higher than the average
Problem • How good is the initial set of beliefs? • It is difficult to obtain beliefs in practice, especially those that rely on specific domain knowledge
Related work • Degree of belief • Bayesian approach, frequency approach, etc. • A change in the user-defined beliefs indicates that there are interesting patterns in the data
Related work (cont.) • Post-analysis to deal with the interestingness problem • Fuzzy matching • General impressions • Matching discovered rules against expectations • Ranking the rules to find the unexpected ones
Conclusion • Comparison operators are needed because many interesting patterns are expressed in these terms. • It is difficult to discover relevant patterns from raw data without beliefs: beliefs provide valuable domain knowledge that leads to the creation of several defined views and also drives the discovery process.
Conclusion (cont.) • User-defined beliefs can drastically reduce the number of irrelevant and obvious patterns found during the discovery process and focus it on the discovery of unexpected patterns
References • Rakesh Agrawal, Tomasz Imielinski and Arun Swami, "Database Mining: A Performance Perspective," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 914-925 • Christopher J. Matheus, Philip K. Chan and Gregory Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 903-912 • Vasant Dhar and Alexander Tuzhilin, "Abstract-Driven Pattern Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993, pp. 926-938 • Bing Liu, Wynne Hsu and Shu Chen, "Using General Impressions to Analyze Discovered Classification Rules," Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), pp. 25-36 • Avi Silberschatz and Alexander Tuzhilin, "What Makes Patterns Interesting in Knowledge Discovery Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996, p. 970
References (cont.) • Bing Liu, Wynne Hsu, Lai-Fun Mun and Hing-Yan Lee, "Finding Interesting Patterns Using User Expectations," Technical Report TRA7/96, Department of Information Systems and Computer Science, National University of Singapore, 1996 • Bing Liu and Wynne Hsu, "Post-Analysis of Learned Rules," Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), August 4-8, 1996, Portland, Oregon, USA, pp. 828-834
Question & Answer