Marzena Kryszkiewicz DaWaK 2009

Non-Derivable Item Set and Non-Derivable Literal Set Representations of Patterns Admitting Negation Marzena Kryszkiewicz DaWaK 2009

Outline • Motivation • Preliminary • Representing Frequent Itemsets with Non-derivable itemsets • Patterns admitting negation • Properties of Derivable and Non-derivable Lisets • Representing frequent positive and negative patterns • Conclusion

Motivation • Patterns and association rules can be generalized by admitting negation. • E.g. 75% of customers who buy coke also buy chips and neither beer nor milk. • Admitting negation in patterns usually results in an abundance of mined patterns, which makes analysis of the discovered knowledge infeasible. • It is preferable to discover and store a possibly small fraction of patterns, from which one can derive all other significant patterns when required.

(Cont.) • In this paper, the properties of derivable and non-derivable patterns are examined. • Important relationships among patterns admitting negation that have the same canonical variation are established. • Lossless representations of frequent patterns admitting negation are discussed, e.g. NDLR (the non-derivable literal set representation) and NDIR (the non-derivable itemset representation, a concise representation).

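Downward Closed Sets • A family of sets F is defined as downward closed if every subset of a member of F also belongs to F, i.e. X ∈ F and Y ⊆ X imply Y ∈ F. • Property • Let X, Y ⊆ I, where I is the set of all items. If X ⊆ Y, then sup(X) ≥ sup(Y). • The set of all frequent itemsets is downward closed.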
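
The property above can be checked on a small example. The following is a minimal sketch, assuming a hypothetical toy database `D` and assuming that "frequent" means support greater than or equal to a threshold `min_sup` (the slides do not state whether the comparison is strict). The helper names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
D = [
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"coke", "milk"},
    {"coke"},
    {"chips", "beer"},
]

def sup(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_sup):
    """Brute-force enumeration; assumes 'frequent' means sup >= min_sup."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(len(items) + 1):
        for candidate in combinations(items, k):
            s = sup(frozenset(candidate), db)
            if s >= min_sup:
                result[frozenset(candidate)] = s
    return result

def is_downward_closed(family):
    """True if every proper subset of every member is also a member."""
    return all(
        frozenset(sub) in family
        for X in family
        for k in range(len(X))
        for sub in combinations(X, k)
    )

F = frequent_itemsets(D, min_sup=2)
print(is_downward_closed(F))
```

Because sup(X) ≥ sup(Y) whenever X ⊆ Y, any subset of a frequent itemset passes the same threshold, which is why the check succeeds for every choice of `min_sup`.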

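Generalized Disjunctive Rules • Let Z ⊆ I. X → a_1 ∨ ... ∨ a_n is defined as a generalized disjunctive rule based on Z if X ⊂ Z and A = {a_1, ..., a_n} = Z \ X. • sup(X → a_1 ∨ ... ∨ a_n) is defined as the number of transactions in D in which X occurs together with at least one item from A. • E.g. for Z = {a, b, c}, both {a} → b ∨ c and {a, b} → c are generalized disjunctive rules based on Z.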

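(Cont.) • Thm: Let X → a_1 ∨ ... ∨ a_n be a generalized disjunctive rule and A = {a_1, ..., a_n}. Then: sup(X → a_1 ∨ ... ∨ a_n) = Σ_{∅ ≠ Y ⊆ A} (-1)^(|Y|-1) sup(X ∪ Y). • E.g. sup(X → a ∨ b) = sup(X ∪ {a}) + sup(X ∪ {b}) - sup(X ∪ {a, b}). • err(X → a_1 ∨ ... ∨ a_n) is defined as the number of transactions containing X that do not contain any item from A, i.e. err(X → a_1 ∨ ... ∨ a_n) = sup(X) - sup(X → a_1 ∨ ... ∨ a_n). • X → a_1 ∨ ... ∨ a_n is defined as a certain rule if err(X → a_1 ∨ ... ∨ a_n) = 0.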
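
The support and error of a generalized disjunctive rule can be computed either by scanning the transactions or from itemset supports alone, using inclusion-exclusion. The sketch below compares the two on a hypothetical toy database `D`. The function names are illustrative and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
D = [
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"coke", "milk"},
    {"coke"},
    {"chips", "beer"},
]

def sup(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def rule_sup_direct(X, A, db):
    """Transactions containing X together with at least one item of A."""
    X, A = frozenset(X), frozenset(A)
    return sum(1 for t in db if X <= t and A & t)

def rule_sup_incl_excl(X, A, db):
    """Same quantity obtained from itemset supports by inclusion-exclusion."""
    X = frozenset(X)
    total = 0
    for k in range(1, len(A) + 1):
        for Y in combinations(sorted(A), k):
            total += (-1) ** (k - 1) * sup(X | frozenset(Y), db)
    return total

def err(X, A, db):
    """Transactions containing X but no item of A."""
    return sup(frozenset(X), db) - rule_sup_incl_excl(X, A, db)

X, A = {"coke"}, {"chips", "beer"}
print(rule_sup_direct(X, A, D), rule_sup_incl_excl(X, A, D), err(X, A, D))
# prints: 2 2 2
```

Here the rule coke → chips ∨ beer is not certain, because two transactions contain coke but neither chips nor beer.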

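(Cont.) • Let X → a_1 ∨ ... ∨ a_n be a generalized disjunctive rule and A = {a_1, ..., a_n}. Then, by inclusion-exclusion: err(X → a_1 ∨ ... ∨ a_n) = Σ_{Y ⊆ A} (-1)^|Y| sup(X ∪ Y). • E.g. let X → a ∨ b be a generalized disjunctive rule. Then: err(X → a ∨ b) = sup(X) - sup(X ∪ {a}) - sup(X ∪ {b}) + sup(X ∪ {a, b}).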

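Using Generalized Disjunctive Rules to estimate supports of itemsets • Let X and Y be disjoint itemsets, Y non-empty. Since the error of the rule with antecedent X and consequent items Y is never negative: • sup(X ∪ Y) ≥ Σ_{W ⊊ Y} (-1)^(|Y|-|W|-1) sup(X ∪ W), when |Y| is even • sup(X ∪ Y) ≤ Σ_{W ⊊ Y} (-1)^(|Y|-|W|-1) sup(X ∪ W), when |Y| is odd • Given itemset B, we obtain the following set of 2^|B| inequalities bounding sup(B): one for each X ⊆ B, taking Y = B \ X (for X = B the inequality is the trivial sup(B) ≥ 0).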
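
The bounding procedure can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: it evaluates all 2^|B| inequalities by brute force on a hypothetical toy database `D`, and returns the greatest lower bound and the least upper bound on sup(B). It assumes the standard inclusion-exclusion form of the bounds, in which the subset X ⊆ B contributes the value Σ over W ⊊ Y of (-1)^(|Y|-|W|-1) sup(X ∪ W) with Y = B \ X, a lower bound when |Y| is even and an upper bound when |Y| is odd.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
D = [
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"coke", "milk"},
    {"coke"},
    {"chips", "beer"},
]

def sup(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def proper_subsets(s):
    """All subsets of `s` except `s` itself."""
    elems = sorted(s)
    for k in range(len(elems)):
        for c in combinations(elems, k):
            yield frozenset(c)

def bounds(B, db):
    """Return (lower, upper) bounds on sup(B) for a non-empty itemset B."""
    B = frozenset(B)
    lower, upper = [], []
    for k in range(len(B) + 1):
        for X in combinations(sorted(B), k):
            X = frozenset(X)
            Y = B - X
            value = sum(
                (-1) ** (len(Y) - len(W) - 1) * sup(X | W, db)
                for W in proper_subsets(Y)
            )
            # Even |Y| gives a lower bound, odd |Y| an upper bound.
            (lower if len(Y) % 2 == 0 else upper).append(value)
    return max(lower), min(upper)

print(bounds({"coke", "chips"}, D))          # (2, 3)
print(bounds({"coke", "chips", "beer"}, D))  # (1, 1)
```

In this toy database the bounds for {coke, chips} differ, so its support cannot be inferred from its subsets, while the bounds for {coke, chips, beer} coincide and pin its support to 1.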

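Representing Frequent Itemsets with Non-derivable Itemsets • An itemset X is defined as non-derivable if l(X) ≠ u(X), where l(X) and u(X) are the greatest lower bound and the least upper bound on sup(X) obtained from the supports of its proper subsets. If l(X) = u(X), then sup(X) equals both bounds and X is derivable. • NDR is defined as the set of all frequent non-derivable itemsets stored together with their supports: NDR = {(X, sup(X)) | X ⊆ I, X is frequent, and l(X) ≠ u(X)}.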

Patterns admitting negation • A liset is defined as a set consisting of non-contradictory literals • A liset is called positive if all literals contained in it are positive.

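(Cont.) • The canonical variation of a liset X is defined as the itemset obtained from X by replacing all negative literals in X with their positive counterparts. That is, cv(X) = {a ∈ I | a ∈ X or ¬a ∈ X}. • The set of all lisets having the same canonical variation as liset X (the variations of X) is denoted by V(X) = {Y | cv(Y) = cv(X)}.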
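
To make the two notions concrete, here is a small sketch. It assumes a hypothetical encoding in which a literal is a pair `(item, positive)`, so `("beer", False)` stands for the negated item "not beer". The function names `cv` and `variations` are chosen for illustration only.

```python
from itertools import product

def cv(liset):
    """Canonical variation: drop the signs and keep the items."""
    return frozenset(item for item, _positive in liset)

def variations(liset):
    """All lisets sharing the canonical variation of `liset`."""
    items = sorted(cv(liset))
    for signs in product([True, False], repeat=len(items)):
        yield frozenset(zip(items, signs))

# coke and chips, and neither beer nor milk
X = frozenset({("coke", True), ("chips", True), ("beer", False), ("milk", False)})

print(sorted(cv(X)))             # ['beer', 'chips', 'coke', 'milk']
print(len(list(variations(X))))  # 16
```

A liset over k items always has 2^k variations, exactly one of which is positive (the canonical variation itself).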

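Properties of Derivable and Non-derivable Lisets • Thm: Let B be a liset. • Bound on the length of non-derivable lisets: a non-derivable liset Z contains at most ⌊log2 |D|⌋ + 1 literals. • Reason: at least 2^(|Z|-1) variations of Z have supports greater than 0, and each transaction supports exactly one variation of Z. Hence 2^(|Z|-1) ≤ |D|, so |Z| ≤ ⌊log2 |D|⌋ + 1.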
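
The counting argument behind the length bound can be illustrated with a short sketch. It assumes a hypothetical toy database `D`. The first helper evaluates ⌊log2 |D|⌋ + 1 with integer arithmetic, and the second counts the support of every variation of an itemset to show that the variations partition the transactions.

```python
from itertools import product

# Hypothetical toy database: each transaction is a set of items.
D = [
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"coke", "milk"},
    {"coke"},
    {"chips", "beer"},
]

def max_nonderivable_length(db_size):
    """floor(log2(db_size)) + 1, which equals int.bit_length for db_size >= 1."""
    if db_size < 1:
        raise ValueError("database must contain at least one transaction")
    return db_size.bit_length()

def variation_supports(items, db):
    """Support of each variation; a sign of True means the item is required,
    False means the item must be absent."""
    items = sorted(items)
    counts = {}
    for signs in product([True, False], repeat=len(items)):
        counts[signs] = sum(
            1 for t in db
            if all((item in t) == positive for item, positive in zip(items, signs))
        )
    return counts

print(max_nonderivable_length(len(D)))  # 3
print(sum(variation_supports({"coke", "chips"}, D).values()) == len(D))  # True
```

Since the supports of all variations add up to |D|, no more than |D| of them can be greater than 0, which is the step that turns 2^(|Z|-1) non-empty variations into the bound on |Z|.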

Representing frequent positive and negative patterns • NDLR (non-derivable liset representation of frequent patterns admitting negation) is defined as the family of all frequent non-derivable lisets stored together with their supports: NDLR = {(X, sup(X)) | X is a frequent non-derivable liset}. • NDIR (non-derivable itemset representation of frequent patterns admitting negation) is defined as the set of non-derivable itemsets, each of which has at least one frequent variation, stored together with their supports: NDIR = {(X, sup(X)) | X is a non-derivable itemset and some variation of X is frequent}.
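
An itemset-only representation can still answer queries about patterns with negation because the support of a liset is determined by the supports of positive itemsets. The slides do not show the derivation procedure, so the following is only a sketch using plain inclusion-exclusion over the negated items. It runs on a hypothetical toy database `D`, and here `sup` scans the data, whereas a real representation would look up or derive itemset supports instead.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
D = [
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"coke", "milk"},
    {"coke"},
    {"chips", "beer"},
]

def sup(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def liset_sup(positive, negated, db):
    """Support of a liset, computed from positive itemset supports only."""
    positive = frozenset(positive)
    total = 0
    for k in range(len(negated) + 1):
        for Y in combinations(sorted(negated), k):
            total += (-1) ** k * sup(positive | frozenset(Y), db)
    return total

def liset_sup_direct(positive, negated, db):
    """Reference value: scan for transactions with all positive, no negated items."""
    positive, negated = frozenset(positive), frozenset(negated)
    return sum(1 for t in db if positive <= t and not (negated & t))

# coke, and neither chips nor beer
print(liset_sup({"coke"}, {"chips", "beer"}, D))         # 2
print(liset_sup_direct({"coke"}, {"chips", "beer"}, D))  # 2
```

The sum has the same form as the error of the rule coke → chips ∨ beer, which is how generalized disjunctive rules connect positive itemsets to patterns with negation.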

Conclusion • The paper introduced two lossless representations of frequent patterns admitting negation: NDLR and NDIR.
