Non-Derivable Item Set and Non-Derivable Literal Set Representations of Patterns Admitting Negation. Marzena Kryszkiewicz, DaWaK 2009
Outline • Motivation • Preliminaries • Representing Frequent Itemsets with Non-derivable itemsets • Patterns admitting negation • Properties of Derivable and Non-derivable Lisets • Representing frequent positive and negative patterns • Conclusion
Motivation • Patterns and association rules can be generalized by admitting negation. • E.g. 75% of customers who buy coke also buy chips and neither beer nor milk. • Admitting negation in patterns usually results in an abundance of mined patterns, which makes analysis of the discovered knowledge infeasible. • It is preferable to discover and store a possibly small fraction of patterns, from which one can derive all other significant patterns when required.
(Cont.) • In this paper, the properties of derivable and non-derivable patterns are examined. • Important relationships are established among patterns admitting negation that have the same canonical variation. • Two lossless representations of frequent patterns admitting negation (both positive and negative) are offered: NDLR (the non-derivable liset representation) and NDIR (a concise representation based on non-derivable itemsets).
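Downward Closed Sets • A family of itemsets $\mathcal{F}$ is defined as downward closed if $\forall X \in \mathcal{F},\ \forall Y \subseteq X:\ Y \in \mathcal{F}$. • Property: let $X$ and $Y$ be itemsets. If $X \subseteq Y$, then $sup(X) \ge sup(Y)$, because every transaction containing $Y$ also contains $X$. • Hence, the set of all frequent itemsets is downward closed.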
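To make the anti-monotonicity property concrete, here is a minimal brute-force sketch. The toy database `D` and the helpers `sup`, `frequent_itemsets`, `is_downward_closed` and `supports_are_antimonotone` are hypothetical illustrations, not taken from the paper; an itemset is assumed to be frequent when its support is at least `minsup`, and the empty itemset is left out of the enumeration.

```python
from itertools import combinations

# Hypothetical toy database (not from the paper): one set of items per transaction.
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def sup(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if set(itemset) <= t)


def frequent_itemsets(db, minsup):
    """Brute-force map from each non-empty itemset with sup >= minsup to its support."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sup(combo, db)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result


def is_downward_closed(family):
    """True if every non-empty proper subset of each member is also a member."""
    return all(
        frozenset(sub) in family
        for X in family
        for k in range(1, len(X))
        for sub in combinations(X, k)
    )


def supports_are_antimonotone(family):
    """Check sup(X) >= sup(Y) for every pair X ⊆ Y within the family."""
    return all(family[X] >= family[Y] for X in family for Y in family if X <= Y)


F = frequent_itemsets(D, minsup=3)
print(sorted("".join(sorted(X)) for X in F))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
print(is_downward_closed(F), supports_are_antimonotone(F))  # True True
```

Here $\{a, b, c\}$ has support 2, so it drops out at `minsup=3` while all of its subsets stay frequent.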
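Generalized Disjunctive Rules • Let $Z$ be an itemset. $X \rightarrow \vee A$ (i.e., $X \rightarrow a_1 \vee \dots \vee a_n$ for $A = \{a_1, \dots, a_n\}$) is defined as a generalized disjunctive rule based on $Z$ if $X \subseteq Z$ and $A = Z \setminus X$. • $sup(X \rightarrow \vee A)$ is defined as the number of transactions in $D$ in which $X$ occurs together with at least one item from $A$. • E.g., $\{a\} \rightarrow b \vee c$ is a generalized disjunctive rule based on $\{a, b, c\}$; its support is the number of transactions containing $a$ together with $b$ or $c$.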
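The definition of a generalized disjunctive rule and its support can be sketched directly by scanning transactions. This is a minimal illustration under assumptions: the toy database and the function names `gdr_based_on` and `sup_gdr` are ours, and items are plain strings.

```python
# Hypothetical toy database (not from the paper).
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def gdr_based_on(X, Z):
    """Return the consequent A of the generalized disjunctive rule X -> ∨A based on Z."""
    X, Z = frozenset(X), frozenset(Z)
    if not X <= Z:
        raise ValueError("X must be a subset of Z")
    return Z - X


def sup_gdr(X, A, db):
    """sup(X -> ∨A): transactions containing all of X and at least one item of A."""
    X, A = set(X), set(A)
    return sum(1 for t in db if X <= t and A & t)


A = gdr_based_on({"a"}, {"a", "b", "c"})
print(sorted(A))                 # ['b', 'c']
print(sup_gdr({"a"}, A, D))      # 4
print(sup_gdr({"a"}, {"d"}, D))  # 1
```

When $X = Z$ the consequent is empty and the rule's support is 0, since no transaction can contain an item from an empty set.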
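(Cont.) • Thm: Let $X \rightarrow \vee A$ be a generalized disjunctive rule. Then $sup(X \rightarrow \vee A) = \sum_{\emptyset \neq Y \subseteq A} (-1)^{|Y|-1}\, sup(X \cup Y)$. • E.g., $sup(\{a\} \rightarrow b \vee c) = sup(\{a, b\}) + sup(\{a, c\}) - sup(\{a, b, c\})$. • $err(X \rightarrow \vee A)$ is defined as the number of transactions containing $X$ that do not contain any item from $A$. • $X \rightarrow \vee A$ is called a certain rule if $err(X \rightarrow \vee A) = 0$.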
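(Cont.) • Let $X \rightarrow \vee A$ be a generalized disjunctive rule. Then $err(X \rightarrow \vee A) = sup(X) - sup(X \rightarrow \vee A)$. • Let $X \rightarrow \vee A$ be a generalized disjunctive rule. Then $err(X \rightarrow \vee A) = \sum_{Y \subseteq A} (-1)^{|Y|}\, sup(X \cup Y)$. • E.g., for the generalized disjunctive rule $\{a\} \rightarrow b \vee c$: $err(\{a\} \rightarrow b \vee c) = sup(\{a\}) - sup(\{a, b\}) - sup(\{a, c\}) + sup(\{a, b, c\})$.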
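The following sketch checks, on a hypothetical toy database, that counting transactions directly and evaluating inclusion-exclusion sums over itemset supports give the same values for $sup(X \rightarrow \vee A)$ and $err(X \rightarrow \vee A)$. All function names are illustrative assumptions, not the paper's notation.

```python
from itertools import combinations

# Hypothetical toy database (not from the paper).
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def subsets(s):
    """All subsets of s (including the empty set and s itself) as frozensets."""
    s = sorted(s)
    for k in range(len(s) + 1):
        for c in combinations(s, k):
            yield frozenset(c)


def sup(itemset, db):
    return sum(1 for t in db if set(itemset) <= t)


def sup_gdr_scan(X, A, db):
    """sup(X -> ∨A) by scanning: all of X present and at least one item of A present."""
    return sum(1 for t in db if set(X) <= t and set(A) & t)


def sup_gdr_ie(X, A, db):
    """sup(X -> ∨A) as the sum over non-empty Y ⊆ A of (-1)^(|Y|-1) * sup(X ∪ Y)."""
    return sum((-1) ** (len(Y) - 1) * sup(set(X) | Y, db) for Y in subsets(A) if Y)


def err_scan(X, A, db):
    """err(X -> ∨A): transactions containing X but no item of A."""
    return sum(1 for t in db if set(X) <= t and not set(A) & t)


def err_ie(X, A, db):
    """err(X -> ∨A) as the sum over Y ⊆ A of (-1)^|Y| * sup(X ∪ Y)."""
    return sum((-1) ** len(Y) * sup(set(X) | Y, db) for Y in subsets(A))


def is_certain(X, A, db):
    return err_scan(X, A, db) == 0


print(sup_gdr_scan({"a"}, {"b", "c"}, D), sup_gdr_ie({"a"}, {"b", "c"}, D))  # 4 4
print(err_scan({"a"}, {"b", "c"}, D), err_ie({"a"}, {"b", "c"}, D))  # 0 0
print(is_certain({"a"}, {"b", "c"}, D))  # True
```

On this data every transaction with $a$ also has $b$ or $c$, so $\{a\} \rightarrow b \vee c$ is a certain rule.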
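Using generalized disjunctive rules to estimate supports of itemsets • Let $B$ be an itemset, $X \subseteq B$ and $Y = B \setminus X$, so that $X \rightarrow \vee Y$ is a generalized disjunctive rule based on $B$. Since $err(X \rightarrow \vee Y) \ge 0$, isolating the term $sup(B)$ in its inclusion-exclusion expansion gives: • $sup(B) \ge \sum_{X \subseteq W \subset B} (-1)^{|B \setminus W|+1}\, sup(W)$, when $|Y|$ is even • $sup(B) \le \sum_{X \subseteq W \subset B} (-1)^{|B \setminus W|+1}\, sup(W)$, when $|Y|$ is odd • Given itemset $B$, taking every $X \subseteq B$ yields the following set of $2^{|B|}$ inequalities bounding $sup(B)$: $l(B)$ denotes the greatest of the resulting lower bounds and $u(B)$ the least of the upper bounds, so that $l(B) \le sup(B) \le u(B)$.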
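A minimal sketch of how the $2^{|B|}$ inequalities turn into the bounds $l(B)$ and $u(B)$. It assumes the supports of all proper subsets of $B$ (including the empty set, whose support is $|D|$) are already known and passed in a dictionary; the function name `bounds` and the sample supports, which come from a hypothetical 5-transaction database, are our own.

```python
from itertools import combinations


def subsets(s):
    """All subsets of s as frozensets."""
    s = sorted(s)
    for k in range(len(s) + 1):
        for c in combinations(s, k):
            yield frozenset(c)


def bounds(B, supports):
    """Return (l(B), u(B)) from the 2^|B| deduction rules.

    supports maps every proper subset W of B (including the empty set,
    whose support is |D|) to sup(W). B must be non-empty.
    """
    B = frozenset(B)
    lower, upper = [], []
    for X in subsets(B):
        # Rule X -> ∨(B \ X): sum over all W with X ⊆ W ⊂ B.
        total = sum(
            (-1) ** (len(B - W) + 1) * supports[W]
            for W in subsets(B)
            if X <= W and W != B
        )
        if len(B - X) % 2 == 0:
            lower.append(total)  # |B \ X| even: lower bound on sup(B)
        else:
            upper.append(total)  # |B \ X| odd: upper bound on sup(B)
    return max(lower), min(upper)


# Hypothetical supports of the proper subsets of {a, b, c}.
S = {
    frozenset(): 5,
    frozenset("a"): 4, frozenset("b"): 4, frozenset("c"): 4,
    frozenset("ab"): 3, frozenset("ac"): 3, frozenset("bc"): 3,
}
print(bounds("abc", S))  # (2, 2): sup({a, b, c}) is fully determined
print(bounds("ab", S))   # (3, 4)
```

The rule with $X = B$ contributes the trivial lower bound 0, and each rule whose consequent is a single item contributes $sup(B) \le sup(X)$.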
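Representing Frequent Itemsets with Non-derivable itemsets • An itemset $X$ is defined as non-derivable if $l(X) \neq u(X)$; otherwise it is derivable, and its support equals $l(X) = u(X)$. • NDR was defined as the set of all frequent non-derivable itemsets stored together with their supports: $NDR = \{(X, sup(X)) \mid X \text{ is frequent and } l(X) \neq u(X)\}$.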
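Combining the bounds with a support count gives a brute-force sketch of NDR. It is only an illustration: it enumerates every itemset of a tiny hypothetical database and recomputes supports from scratch, whereas a practical miner would work level by level and reuse supports found earlier. "Frequent" is assumed to mean $sup \ge$ `minsup`, and the helper names are ours.

```python
from itertools import combinations

# Hypothetical toy database (not from the paper).
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def subsets(s):
    """All subsets of s as frozensets, smallest first."""
    s = sorted(s)
    for k in range(len(s) + 1):
        for c in combinations(s, k):
            yield frozenset(c)


def sup(itemset, db):
    return sum(1 for t in db if set(itemset) <= t)


def bounds(B, db):
    """(l(B), u(B)) from the 2^|B| deduction rules over proper subsets of B."""
    B = frozenset(B)
    lower, upper = [], []
    for X in subsets(B):
        total = sum((-1) ** (len(B - W) + 1) * sup(W, db)
                    for W in subsets(B) if X <= W and W != B)
        (lower if len(B - X) % 2 == 0 else upper).append(total)
    return max(lower), min(upper)


def ndr(db, minsup):
    """Brute-force NDR: frequent non-derivable itemsets mapped to their supports."""
    result = {}
    for B in subsets(set().union(*db)):
        if not B:
            continue
        s = sup(B, db)
        if s < minsup:
            continue  # not frequent
        l, u = bounds(B, db)
        if l != u:
            result[B] = s  # non-derivable: keep it with its support
    return result


R = ndr(D, minsup=2)
print(sorted("".join(sorted(X)) for X in R))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

On this toy data, $\{a, b, c\}$ is frequent for `minsup=2` but derivable ($l = u = 2$), so it is left out of NDR.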
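Patterns admitting negation • A literal is either an item $a$ (a positive literal) or its negation $\neg a$ (a negative literal). • A liset is defined as a set of non-contradictory literals, i.e., a set that does not contain both $a$ and $\neg a$ for any item $a$. • A liset is called positive if all literals contained in it are positive.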
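(Cont.) • The canonical variation of a liset $X$ is defined as the itemset obtained from $X$ by replacing every negative literal $\neg a$ in $X$ with the item $a$. That is, it equals $\{a \mid a \in X \text{ or } \neg a \in X\}$. • All lisets having the same canonical variation as liset $X$ are called the variations of $X$.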
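(Cont.) • Example: for the liset $\{a, \neg b\}$, the canonical variation is $\{a, b\}$, and its variations are $\{a, b\}$, $\{a, \neg b\}$, $\{\neg a, b\}$ and $\{\neg a, \neg b\}$.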
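Lisets, canonical variations and variations can be modelled with small sets of (item, sign) pairs. The sketch below makes that representational choice (an assumption, not the paper's), counts liset supports by scanning a hypothetical toy database, and cross-checks them against the inclusion-exclusion identity $sup(X \cup \{\neg y : y \in Y\}) = \sum_{W \subseteq Y} (-1)^{|W|}\, sup(X \cup W)$, which uses supports of positive itemsets only. All function names are ours.

```python
from itertools import combinations, product

# Hypothetical toy database (not from the paper).
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

# A literal is modelled as (item, positive): ("a", True) is a, ("a", False) is ¬a.


def is_liset(literals):
    """A set of literals is a liset if no item occurs both positively and negatively."""
    literals = set(literals)
    items = [item for item, _ in literals]
    return len(items) == len(set(items))


def is_positive(liset):
    return all(positive for _, positive in liset)


def canonical_variation(liset):
    """Replace every negative literal ¬a with the item a."""
    return frozenset(item for item, _ in liset)


def variations(itemset):
    """All 2^|itemset| lisets whose canonical variation equals itemset."""
    items = sorted(itemset)
    for signs in product([True, False], repeat=len(items)):
        yield frozenset(zip(items, signs))


def sup_liset(liset, db):
    """Transactions containing every positive item and none of the negated items."""
    return sum(1 for t in db if all((item in t) == positive for item, positive in liset))


def sup_liset_from_itemsets(liset, db):
    """Same support computed only from supports of positive itemsets."""
    X = {item for item, positive in liset if positive}
    Y = [item for item, positive in liset if not positive]
    total = 0
    for k in range(len(Y) + 1):
        for W in combinations(Y, k):
            total += (-1) ** k * sum(1 for t in db if X | set(W) <= t)
    return total


L = frozenset({("a", True), ("b", False)})  # the liset {a, ¬b}
print(sorted(canonical_variation(L)))                     # ['a', 'b']
print(sup_liset(L, D), sup_liset_from_itemsets(L, D))     # 1 1
print(len(list(variations("ab"))))                        # 4
```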
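Properties of Derivable and Non-derivable Lisets • Thm: Let B be a liset. • Bound on the length of non-derivable lisets: a non-derivable liset $Z$ contains at most $\lfloor \log_2 |D| \rfloor + 1$ literals. • Reason: if $Z$ is non-derivable, at least $2^{|Z|-1}$ variations of $Z$ have supports greater than 0. Since every transaction supports exactly one variation of $Z$, the supports of all variations of $Z$ sum to $|D|$. Hence, $2^{|Z|-1} \le |D|$, so $|Z| \le \log_2 |D| + 1$.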
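The counting step of this argument can be illustrated in a few lines: every transaction supports exactly one variation of $Z$, so the variation supports add up to $|D|$, and the largest $k$ with $2^{k-1} \le |D|$ is the bit length of $|D|$. This is a hedged sketch on a hypothetical toy database with helper names of our own; it illustrates the arithmetic and does not prove the $2^{|Z|-1}$ property of non-derivable lisets, which is taken from the slide.

```python
from itertools import product

# Hypothetical toy database (not from the paper).
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def variations(itemset):
    """All lisets, as sets of (item, positive) pairs, with canonical variation itemset."""
    items = sorted(itemset)
    for signs in product([True, False], repeat=len(items)):
        yield frozenset(zip(items, signs))


def sup_liset(liset, db):
    """Transactions containing every positive item and none of the negated items."""
    return sum(1 for t in db if all((item in t) == pos for item, pos in liset))


def max_nonderivable_length(n_transactions):
    """Largest k with 2**(k - 1) <= n_transactions, i.e. floor(log2 |D|) + 1 for |D| >= 1.

    int.bit_length() gives this exactly and avoids floating-point log2.
    """
    return n_transactions.bit_length()


Z = "abc"
print(sum(sup_liset(v, D) for v in variations(Z)) == len(D))  # True
print(max_nonderivable_length(len(D)))  # 3
```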
Representing frequent positive and negative patterns • NDLR (non-derivable liset representation of frequent patterns admitting negation) is defined as the family of all frequent non-derivable lisets stored together with their supports: $NDLR = \{(X, sup(X)) \mid X \text{ is a frequent non-derivable liset}\}$. • NDIR (non-derivable itemset representation of frequent patterns admitting negation) is defined as the family of all non-derivable itemsets that have at least one frequent variation, stored together with their supports: $NDIR = \{(X, sup(X)) \mid X \text{ is a non-derivable itemset with at least one frequent variation}\}$.
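A brute-force sketch of NDIR on a hypothetical toy database, under the assumption that a liset is frequent when its support is at least `minsup`. NDLR is not sketched because these slides do not state how derivability of a liset with negative literals is determined. All function names are our own, and supports are recomputed from scratch purely for illustration.

```python
from itertools import combinations, product

# Hypothetical toy database (not from the paper).
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def subsets(s):
    """All subsets of s as frozensets, smallest first."""
    s = sorted(s)
    for k in range(len(s) + 1):
        for c in combinations(s, k):
            yield frozenset(c)


def sup(itemset, db):
    return sum(1 for t in db if set(itemset) <= t)


def bounds(B, db):
    """(l(B), u(B)) from the 2^|B| deduction rules over proper subsets of B."""
    B = frozenset(B)
    lower, upper = [], []
    for X in subsets(B):
        total = sum((-1) ** (len(B - W) + 1) * sup(W, db)
                    for W in subsets(B) if X <= W and W != B)
        (lower if len(B - X) % 2 == 0 else upper).append(total)
    return max(lower), min(upper)


def variations(itemset):
    """All lisets, as sets of (item, positive) pairs, with canonical variation itemset."""
    items = sorted(itemset)
    for signs in product([True, False], repeat=len(items)):
        yield frozenset(zip(items, signs))


def sup_liset(liset, db):
    """Transactions containing every positive item and none of the negated items."""
    return sum(1 for t in db if all((item in t) == pos for item, pos in liset))


def ndir(db, minsup):
    """Non-derivable itemsets with at least one variation of support >= minsup."""
    result = {}
    for X in subsets(set().union(*db)):
        if not X:
            continue
        l, u = bounds(X, db)
        if l == u:
            continue  # derivable: sup(X) = l(X) = u(X)
        if any(sup_liset(v, db) >= minsup for v in variations(X)):
            result[X] = sup(X, db)
    return result


R = ndir(D, minsup=4)
print({"".join(sorted(X)): s for X, s in R.items()})  # {'a': 4, 'b': 4, 'c': 4, 'd': 1}
```

With `minsup=4`, the itemset $\{d\}$ is kept even though its own support is 1, because its variation $\{\neg d\}$ has support 4; the pairs are dropped because none of their variations reaches 4.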
Conclusion • The paper introduced two lossless representations of frequent patterns admitting negation: NDLR and NDIR.