Explore the identification of specific partial orders in sequences by constructing hybrid episodes and stable sequences, addressing redundancy issues.
Coproduct Transformations on Lattices of Closed Partial Orders
Gemma Casas-Garriga (gcasas@lsi.upc.es)
LARCA
MOISES meeting, Valladolid, Sept 2004
Data Description
• A sequence is an ordered list of sets of items: <(I1) (I2) … (IM)>
• For example, <(a c) (d b) (e) (a)>
• We consider a set of sequences D to be analyzed: D = {s1, …, sN}, where each si is a sequence.
Example D:
Id   Sequence
1    <(a) (b) (c) (d)>
2    <(b) (c) (d) (a)>
3    <(b) (c) (a) (d)>
Basic Definitions
• An episode in D is an acyclic directed graph indicating a partial order between items.
• The support of a poset in D is the number of input sequences that are compatible with it.
[Figure: an example poset over the items A, B, C, D; it is compatible with the second and third input sequences.]
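The support definition above can be sketched in a few lines. This is a minimal illustration, assuming every itemset holds a single item and no item repeats within a sequence, so each sequence is written as a plain string. The names `is_compatible` and `support`, and the poset used at the end, are hypothetical and chosen for this example only.

```python
# The three input sequences of the running example, one character per item.
D = {1: "abcd", 2: "bcda", 3: "bcad"}


def is_compatible(edges, seq):
    """True if, for every edge (u, v), item u occurs before item v in seq.

    Assumes no item is repeated in seq, so each item has a single position.
    """
    pos = {item: i for i, item in enumerate(seq)}
    return all(u in pos and v in pos and pos[u] < pos[v] for u, v in edges)


def support(edges, db):
    """Ids of the input sequences that are compatible with the poset."""
    return sorted(sid for sid, seq in db.items() if is_compatible(edges, seq))


# Hypothetical poset: b before c, and c before both a and d.
poset = {("b", "c"), ("c", "a"), ("c", "d")}
print(support(poset, D))  # [2, 3]
```

The function returns the ids rather than a count, so the support in the sense of the slide is the length of that list. A poset is given here only by its edges, so isolated items are not represented.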
Problem Formulation
• Goal: to identify posets and their support (alternatively, only those whose support is over a minimum user-specified threshold).
• Problem: many partial orders are redundant. For example, both P and P′ are compatible with the same input sequences, but P is more "informative" than P′, that is, P′ ⪯ P.
[Figure: the posets P and P′ over the items A, B, C, D.]
Problem Formulation
• If P′ ⪯ P we say that P is more specific than P′.
• The specificity relation is different from classical inclusion of episodes.
• Goal redefined: to identify the most specific partial orders among those occurring in the same input sequences (alternatively, with support over a minimum threshold).
[Figure: example posets over the items A, B, C, D, with parallel (||) components.]
Example
[Figure: six posets over the items A, B, C, D, each annotated with the input sequences where the poset is the most specific: {1,2,3}, {2,3}, {1,3}, {1}, {2}, {3}.]
Motivation
• Ordering relationships are useful in many domains: web mining, monitoring of processes, e-commerce ...
• The most specific episodes give a general view of D, summarizing all the input sequences without redundancies.
Addressing the Problem
• Observation: identifying such structures directly from the data is a complex task (the specificity relation is difficult to calculate).
• Our proposal: constructing hybrid episodes out of their maximal paths. That is, finding those subsequences in D that will identify maximal paths of the final desired episodes.
[Figure: an episode over the items A, B, C, D with two maximal paths: <(b) (c) (a)> and <(b) (c) (d)>.]
Our Proposal
Set of all sequences identifying maximal paths:
• <(a)>
• <(b) (c) (d)>
• <(b) (c) (a)>
• <(a) (d)>
• <(a) (b) (c) (d)>
• <(b) (c) (d) (a)>
• <(b) (c) (a) (d)>
What are these sequences?
[Figure: the posets of the running example over the items A, B, C, D.]
Result 1
Theorem: sequences identifying maximal paths of the most specific posets are a particular case of so-called stable sequences.
• Stable sequences are maximal among those having the same number of occurrences (support) in D: {s | ∄ s′ s.t. s ⊂ s′ and support(s) = support(s′)}
• <(a) (d)> is stable.
• <(b) (d)> is not stable, because it is contained in <(b) (c) (d)>, which has the same support.
• Many algorithms exist for mining stable sequences: CloSpan, BIDE, TSP ...
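The stability condition can be checked by brute force on the running example. The sketch below is not CloSpan, BIDE or TSP; it simply enumerates every subsequence of the inputs, which is only feasible for tiny data. It assumes single-item itemsets, so sequences are plain strings, and the function names are hypothetical.

```python
from itertools import combinations

# Running example, one character per item.
D = ["abcd", "bcda", "bcad"]


def is_subseq(s, t):
    """True if s can be obtained from t by deleting items (order preserved)."""
    it = iter(t)
    return all(x in it for x in s)


def support(s, db):
    """Number of input sequences that contain s."""
    return sum(is_subseq(s, t) for t in db)


def stable_sequences(db):
    """Sequences with no proper supersequence of the same support."""
    # Candidates: every non-empty subsequence of some input sequence.
    cands = {
        "".join(c)
        for t in db
        for k in range(1, len(t) + 1)
        for c in combinations(t, k)
    }
    sup = {s: support(s, db) for s in cands}
    return {
        s
        for s in cands
        if not any(
            len(s2) > len(s) and sup[s2] == sup[s] and is_subseq(s, s2)
            for s2 in cands
        )
    }


print(sorted(stable_sequences(D), key=lambda s: (len(s), s)))
```

On this input the result contains "ad" but not "bd", in line with the two examples on the slide.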
How to construct posets out of Stable Sequences?
Stable Sequences:
• <(a)>
• <(b) (c) (d)>
• <(b) (c) (a)>
• <(a) (d)>
• <(a) (b) (c) (d)>
• <(b) (c) (d) (a)>
• <(b) (c) (a) (d)>
Some stable sequences may identify maximal paths of different partial orders.
[Figure: the posets of the running example over the items A, B, C, D.]
Result 2
• We characterize a closure operator Δ working on sets of sequences.
• A closure operator satisfies the three basic closure axioms: Monotonicity, Extensivity, and Idempotency.
• Broadly: given any set of sequences S, Δ(S) returns the set of maximal sequences that are present in the same input sequences where S is contained.
• Example: Δ({<(b) (c)>}) = {<(a)>, <(b) (c) (d)>}
Result 2
• A set of sequences S is closed if it coincides with its closure: Δ(S) = S
Lemma: individually, the sequences in a closed set S are stable sequences.
• Example: Δ({<(a)>, <(b) (c) (d)>}) = {<(a)>, <(b) (c) (d)>}
• Both <(a)> and <(b) (c) (d)> are stable sequences.
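The broad description of the closure operator can be turned into a small brute-force sketch: find the input sequences that contain every member of S, then keep the maximal subsequences common to all of them. Single-item itemsets are assumed, so sequences are plain strings. The names `closure` and `is_closed` are hypothetical, and returning the empty set when no input contains S is a convention of this sketch only.

```python
from itertools import combinations

# Running example, keyed by sequence id.
D = {1: "abcd", 2: "bcda", 3: "bcad"}


def is_subseq(s, t):
    """True if s can be obtained from t by deleting items (order preserved)."""
    it = iter(t)
    return all(x in it for x in s)


def closure(S, db):
    """Maximal sequences present in every input sequence that contains S."""
    tids = [i for i, t in db.items() if all(is_subseq(s, t) for s in S)]
    if not tids:
        return set()  # convention of this sketch: nothing supports S
    first = db[tids[0]]
    # Subsequences of one supporting input that also occur in all the others.
    common = {
        "".join(c)
        for k in range(1, len(first) + 1)
        for c in combinations(first, k)
        if all(is_subseq("".join(c), db[i]) for i in tids)
    }
    # Keep only the maximal ones.
    return {
        s for s in common
        if not any(s != s2 and is_subseq(s, s2) for s2 in common)
    }


def is_closed(S, db):
    """A set of sequences is closed if it coincides with its closure."""
    return closure(S, db) == set(S)


print(closure({"bc"}, D))         # the set containing 'a' and 'bcd'
print(is_closed({"a", "bcd"}, D))  # True
```

Both printed results match the two examples given on the Result 2 slides.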
Result 2
Theorem: closed sets of sequences identify the maximal paths of the same partial order.
Closed set of sequences, each paired with a partial order:
• {<(a)>, <(b) (c) (d)>}
• {<(b) (c) (a)>, <(b) (c) (d)>}
• {<(a) (d)>, <(b) (c) (d)>}
[Figure: the partial order over the items A, B, C, D corresponding to each closed set.]
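Reading a closed set as the maximal paths of one poset suggests a direct construction: every item becomes a vertex and consecutive items on a path become an edge. The sketch below illustrates that reading under the assumption of single-item itemsets without repeated items; `poset_from_paths` is a hypothetical name, and the returned edges are the consecutive pairs only, so the full order is their transitive closure.

```python
def poset_from_paths(paths):
    """Build (vertices, edges) from sequences read as maximal paths.

    Each path is a string of items; an edge (u, v) is added whenever
    v directly follows u on some path.
    """
    vertices, edges = set(), set()
    for path in paths:
        vertices.update(path)
        edges.update(zip(path, path[1:]))
    return vertices, edges


# The three closed sets listed on the slide.
for closed_set in ({"a", "bcd"}, {"bca", "bcd"}, {"ad", "bcd"}):
    vertices, edges = poset_from_paths(closed_set)
    print(sorted(closed_set), "->", sorted(edges))
```

In the first closed set the item a has no edges at all, which corresponds to a being unordered (parallel) with respect to the chain b, c, d.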
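Lattice of Closed Sets of Sequences
[Figure: lattice whose nodes are closed sets of sequences, each annotated with the input sequences that contain it and drawn next to its poset over A, B, C, D.]
• {<(a)(b)(c)(d)>}: 1
• {<(b)(c)(d)(a)>}: 2
• {<(b)(c)(a)(d)>}: 3
• {<(b)(c)(d)>, <(a)(d)>}: 1,3
• {<(b)(c)(d)>, <(b)(c)(a)>}: 2,3
• {<(b)(c)(d)>, <(a)>}: 1,2,3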
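The nodes of this lattice can be reproduced by brute force for the running example. The sketch assumes that every closed set arises as the set of maximal subsequences common to some group of input sequences, and that itemsets hold a single item each, so sequences are plain strings. All names are hypothetical, and the lattice order itself (which node sits above which) is not computed here.

```python
from itertools import combinations

D = {1: "abcd", 2: "bcda", 3: "bcad"}


def is_subseq(s, t):
    """True if s can be obtained from t by deleting items (order preserved)."""
    it = iter(t)
    return all(x in it for x in s)


def max_common(tids, db):
    """Maximal subsequences shared by all input sequences listed in tids."""
    first = db[tids[0]]
    common = {
        "".join(c)
        for k in range(1, len(first) + 1)
        for c in combinations(first, k)
        if all(is_subseq("".join(c), db[i]) for i in tids)
    }
    return {
        s for s in common
        if not any(s != s2 and is_subseq(s, s2) for s2 in common)
    }


def closed_sets(db):
    """Map each closed set to the ids of the inputs containing all of it."""
    ids = sorted(db)
    nodes = {}
    for k in range(1, len(ids) + 1):
        for tids in combinations(ids, k):
            cs = frozenset(max_common(tids, db))
            if cs:
                nodes[cs] = tuple(
                    i for i in ids if all(is_subseq(s, db[i]) for s in cs)
                )
    return nodes


for cs, tids in sorted(closed_sets(D).items(), key=lambda kv: (-len(kv[1]), kv[1])):
    print(sorted(cs), tids)
```

The groups {1,2} and {1,2,3} share the same maximal common subsequences, so they collapse into a single node and six nodes remain.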
Lattice of Closed Partial Orders
[Figure: the same lattice with each node drawn as a poset over A, B, C, D, annotated with the input sequences 1; 2; 3; 1,3; 2,3; and 1,2,3.]
Moreover, these posets are closed.
Formalization
• A directed graph is modeled as G = (V, E, l), where V is the set of vertices, E ⊆ V × V is the set of edges, and l is an injective labelling function.
• A poset is an acyclic directed graph such that the relation on V established by the edges E is reflexive, antisymmetric and transitive.
• A graph morphism between two graphs G = (V, E, l) and G′ = (V′, E′, l′) consists of an injective function h: V → V′ that preserves labels and such that (u, v) ∈ E implies (h(u), h(v)) ∈ E′.
[Figure: example posets over the items A, B, C, D.]
Result 3
Coproduct of a family of graphs:
[Figure: coproduct diagram relating the graphs G1, …, Gn, G and G′.]
Example:
[Figure: example graphs over the items A, B, C.]
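Result 3
Theorem: a lattice of stable sequences can be transformed into a lattice of closed posets by rewriting each node via coproduct transformations.
Example: the node {<(b)(c)(d)>, <(a)(d)>} is rewritten as the coproduct of its path graphs.
[Figure: the path graphs over A, B, C, D and the poset resulting from their coproduct.]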
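A rough sketch of the rewriting step follows. It assumes that, because labellings are injective and morphisms preserve labels, the coproduct of a family of graphs amounts to merging vertices that carry the same label and taking the union of the edges. That reading is an assumption of this sketch, not a construction stated on the slides, and the representation of a graph as a pair (labels, edges) and the name `coproduct` are hypothetical.

```python
def coproduct(*graphs):
    """Glue labelled graphs on equal labels and take the union of edges.

    Each graph is a pair (labels, edges): labels maps a vertex id to its
    label (assumed injective), edges is a set of (vertex id, vertex id).
    The result uses the labels themselves as vertices.
    """
    vertices, edges = set(), set()
    for labels, graph_edges in graphs:
        vertices.update(labels.values())
        edges.update((labels[u], labels[v]) for u, v in graph_edges)
    return vertices, edges


# Path graphs for the two sequences of the node {<(b)(c)(d)>, <(a)(d)>}.
# Vertex ids are local to each graph; only the labels are shared.
g1 = ({0: "b", 1: "c", 2: "d"}, {(0, 1), (1, 2)})  # b -> c -> d
g2 = ({0: "a", 1: "d"}, {(0, 1)})                  # a -> d

vertices, edges = coproduct(g1, g2)
print(sorted(vertices), sorted(edges))
```

The two vertices labelled d are merged into one, giving a single graph in which both a and c precede d. This only makes sense when items are not repeated, which is the case the conclusions single out.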
Conclusions
• We identify partial orders in sequential data by:
  • Mining stable sequences and their support (CloSpan, BIDE …).
  • Grouping stable sequences into closed sets of sequences, according to the closure operator Δ.
  • Getting the final episodes from those groupings.
• This transformation represents an important algorithmic simplification.
• Formally, when items are not repeated, this transformation can be expressed as coproduct transformations.