SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows. Lei Chang, Tengjiao Wang, Dongqing Yang, Hua Luan. ICDM’08
Outline. Preliminary. Algorithm. Experimental results. Conclusion.
Preliminary.
• The inverse sequence of a sequence s is denoted by s’. Example: s = <abae>, s’ = <eaba>.
• The s-projected database is denoted by Ds. Example: the <b>-projected database is {<da>,<ae>,<cda>,<cdae>,<cda>}.
• The size of Ds, denoted by R(Ds), is the total number of items in its sequences. Example: the size of the <b>-projected database is 2+2+3+4+3 = 14.
• The <e>-projected database is {φ,φ,<bcda>,<ac>}, so its size is 0+0+4+2 = 6.
• The inverse database of D is denoted by D’.
• The database in the current sliding window after inserting the new element (but before removing the expired one) is denoted by D^.
• D^ : {<fbda>,<abaec>,<fbcdac>,<bcdae>,<ebcdaf>,<aeac>}
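The two definitions above can be checked with a few lines of Python. This is only a minimal sketch: it assumes every element is a single item, so a sequence can be written as a plain string and φ as the empty string. The helper names `inverse` and `pdb_size` are made up for illustration and do not come from the paper.

```python
def inverse(seq):
    """Return the inverse sequence s' of s (the items in reverse order)."""
    return seq[::-1]


def pdb_size(pdb):
    """Return R(Ds): the total number of items in a projected database."""
    return sum(len(suffix) for suffix in pdb)


# Projected databases copied from the slide; "" stands for the empty sequence φ.
b_projected = ["da", "ae", "cda", "cdae", "cda"]
e_projected = ["", "", "bcda", "ac"]

print(inverse("abae"))        # eaba
print(pdb_size(b_projected))  # 14
print(pdb_size(e_projected))  # 6
```

Empty suffixes still count as entries of the projected database, but they add nothing to its size.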
In the inverse database of D^, the set of sequences from users that appear in the current window is called the insertion database, denoted by D+. The set of sequences from users that appear in the removal window is called the removal database, denoted by D-.
D^ : {<fbda>,<abaec>,<fbcdac>,<bcdae>,<ebcdaf>,<aeac>}
D^’ : {<adbf>,<ceaba>,<cadcbf>,<eadcb>,<fadcbe>,<caea>}
D+ : {<ceaba>,<cadcbf>,<fadcbe>}
D- : {<cadcbf>,<eadcb>,<fadcbe>}
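As a quick check of the example, the inverse database can be rebuilt by reversing every sequence of D^. The sketch below again assumes single-item elements written as strings. The variable names are illustrative only, and D+ and D- are copied from the slide rather than derived, because the slide does not give the user and window information needed to compute them.

```python
# D^ as listed on the slide, one string per sequence.
d_hat = ["fbda", "abaec", "fbcdac", "bcdae", "ebcdaf", "aeac"]

# The inverse database reverses each sequence and keeps the sequence order.
d_hat_inv = [seq[::-1] for seq in d_hat]
print(d_hat_inv)

# Insertion and removal databases, copied from the slide (not computed here).
d_plus = ["ceaba", "cadcbf", "fadcbe"]
d_minus = ["cadcbf", "eadcb", "fadcbe"]

# Both are subsets of the inverse database.
print(set(d_plus) <= set(d_hat_inv), set(d_minus) <= set(d_hat_inv))
```

A sequence may belong to both D+ and D-, as <cadcbf> and <fadcbe> do in this example.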
• Closed patterns: {<a>:6,<ae>:3,<c>:4,<ba>:5,<bda>:4,<bcda>:3,<e>:4}
• Closed patterns of the inverse database (each pattern above, inverted): {<a>:6,<ab>:5,<adb>:4,<adcb>:3,<c>:4,<e>:4,<ea>:3}
• sn : Each node n of an IST corresponds to the sequence formed along the path from the root node to n; this sequence is denoted by sn.
• c-node : If sn is a closed sequential pattern in D’, n is a c-node.
• t-node : If sn is not a closed sequential pattern in D’ and n does not have any t-node ancestor, n is a t-node.
• i-node : If n is neither a c-node nor a t-node, n is an i-node.
Algorithm.
• Element insertion
• Element removal
• State update
Element insertion
• Theorem 2 : If the item of a depth-1 node does not occur in the newly arriving element, then after the element is inserted the nodes under that node keep their attribute values and any t-node under it keeps its type.
• Theorem 3 : If the PDBSize and the support of a t-node do not change after a new element is inserted, the node remains a t-node.
Dc^’ : {<eaba>,<adbf>,<b>,<be>,<aea>}
Df^’ : {φ,φ,<adcbe>}
c : {<eaba>,<ab>,<b>,<be>,<aea>}
ca : {<ba>,<b>,<ea>}
cb : {<a>,φ,<e>}
ce : {<aba>,φ,<a>}
Element removal
• Theorem 5 : After the removal of the element e_{tc−w}, a t-node may be deleted, but it never changes to a c-node or an i-node.
• For each child node t of n, the algorithm computes the st-projected database in the removal database D−.
D− : {<cadcbf>,<eadcb>,<fadcbe>}
• Da− : {<dcbf>,<dcb>,<dcbe>}
• Db− : {<f>,φ,<e>}
• Dc− : {<adbf>,<b>,<be>}
• ……
• Df− : {φ,<adcbe>}
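The projections of the removal database listed above can be reproduced with a small sketch. It assumes the usual prefix projection for sequences of single items: take what follows the leftmost match of the prefix, and drop sequences that do not contain the prefix. The functions `project` and `projected_db` are hypothetical helpers for illustration, not code from the paper.

```python
def project(seq, prefix):
    """Return the suffix of seq after the leftmost match of prefix, or None."""
    pos = 0
    for item in prefix:
        idx = seq.find(item, pos)
        if idx == -1:
            return None  # prefix is not a subsequence of seq
        pos = idx + 1
    return seq[pos:]


def projected_db(db, prefix):
    """Collect the suffixes of every sequence in db that contains prefix."""
    suffixes = (project(seq, prefix) for seq in db)
    return [s for s in suffixes if s is not None]


# Removal database D- from the slide; "" stands for φ.
removal_db = ["cadcbf", "eadcb", "fadcbe"]

print(projected_db(removal_db, "a"))  # ['dcbf', 'dcb', 'dcbe']
print(projected_db(removal_db, "b"))  # ['f', '', 'e']
print(projected_db(removal_db, "f"))  # ['', 'adcbe']
```

Because only the sequences in D− are projected, the work per removed element depends on the size of the removal database rather than on the whole window.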
State update
• Theorem 6 : Given a t-node n in an IST for the inverse database D, there must exist an i-node or a c-node t in the IST.
• i-node => c-node
• c-node => t-node
Conclusion.
• This paper proposes the SeqStream algorithm for mining closed sequential patterns over stream sliding windows.
• Open question: is it designed for multiple streams?