When to Update the Sequential Patterns of Stream Data? Q. Zheng, K. Xu, and S. Ma, in Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2003. Adviser: Jia-Ling Koh Speaker: Shu-Ning Shin Date: 2004.8.12
Introduction • TPD (Tradeoff between Performance and Difference) is an experimental method for deciding when to update the sequential patterns of stream data. It makes a tradeoff between • the performance of incremental updating algorithms • and the difference between the old and new sequential patterns.
Stream Data Model (1) • Stream event: • Ei=<ei, tn> • ei: stream event type • tn: the time at which the event occurs • Stream tuple (the events that share the time ti): • Qi=((ek1, ek2, …, ekm), ti)=(Ek1, Ek2, …, Ekm) • Length of stream tuple: • |Qi|=|(ek1, ek2, …, ekm)|=m
Stream Data Model (2) • Stream queue: • Sij=<Qi, Qi+1, …, Qj>, where ti< ti+1< …< tj • =<(Ei1, …, Eik)…(Ej1, …, Ejm)> • Length of queue: • |Sij|=|<Qi, Qi+1, …, Qj>|=j-i+1 • Stream viewing window: • Wk=<Qm, …, Qn|d=n-m+1> • Size of viewing window: • |Wk|=n-m+1=d
Stream Data Model (3) • occur(seqm, Wk): • the number of times seqm occurs in Wk • seqm=<ei1, ei2, …, eim>: a sequence of event types • Wk: a stream viewing window • support(seqm, Wk): • occur(seqm, Wk) / |Wk|
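The definitions of occur and support can be illustrated with a minimal Python sketch. The slides do not say how an occurrence is matched, so the sketch assumes that a sequence occurs at a position when its event types appear in consecutive stream tuples starting there. The function names and the example window are hypothetical.

```python
from typing import Sequence, Set


def occur(seq: Sequence[str], window: Sequence[Set[str]]) -> int:
    """Count start positions where seq matches consecutive tuples.

    Assumption (not stated on the slides): seq[k] must appear in the
    tuple at position start + k for every k.
    """
    if not seq:
        return 0
    m = len(seq)
    return sum(
        1
        for start in range(len(window) - m + 1)
        if all(seq[k] in window[start + k] for k in range(m))
    )


def support(seq: Sequence[str], window: Sequence[Set[str]]) -> float:
    """support(seq, W) = occur(seq, W) / |W|."""
    if not window:
        raise ValueError("window must contain at least one stream tuple")
    return occur(seq, window) / len(window)


# Hypothetical window of 7 tuples; one tuple holds two simultaneous events.
window = [{"e2"}, {"e5"}, {"e1"}, {"e3", "e6"}, {"e7"}, {"e9"}, {"e10"}]
print(occur(["e1", "e3"], window), support(["e1", "e3"], window))
```

Other matching rules (for example, allowing gaps between events) would change occur but leave the support ratio defined the same way.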
Stream Data Model - Example • S18=<Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8> • S18=<E2, E5, E1, (E3, E6), E7, E9, E10> • W5=<Q1, Q2, Q3, Q4, Q5, Q6, Q7|d=7>
Sliding Stream viewing window • ΔWi: incremental window, i=0, 1, 2, 3, … • ΔW0: initial window • Wi+1=Wi+ΔWi+1 • |ΔW1|/|W0|: incremental ratio of stream data
Estimation of difference between the old and new sequential patterns • Difference: • LWk: old frequent sequences in Wk • LWk+1: new frequent sequences in Wk+1 • LWkΔ LWk+1 : symmetric difference
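The difference formula itself is not reproduced on this slide. As a rough sketch, the symmetric difference of the two frequent sets can be computed with Python sets. Dividing by the size of the union is an assumption made here to obtain a value between 0 and 1; it is not necessarily the normalization used by the authors.

```python
from typing import Set, Tuple

Seq = Tuple[str, ...]


def difference(old_frequent: Set[Seq], new_frequent: Set[Seq]) -> float:
    """|L_old symmetric-difference L_new| / |L_old union L_new|.

    The denominator is an assumed normalization, not taken from the slides.
    """
    union = old_frequent | new_frequent
    if not union:
        return 0.0  # both sets empty: nothing changed
    return len(old_frequent ^ new_frequent) / len(union)


# Hypothetical frequent sequences in Wk and Wk+1.
l_old = {("a",), ("b",), ("a", "b")}
l_new = {("a",), ("a", "b"), ("c",)}
print(difference(l_old, l_new))
```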
The Algorithm of Updating Sequential Pattern (IUS) (1) • The IUS algorithm uses the frequent and negative border sequences in DB and db as the candidates to compute the new frequent sequences and negative border sequences in the updated database U. • DB: The original database, which contains old time-related data. • db: The increment database, which contains new time-related data. • dd: The decrement database, which contains time-related data deleted from DB. • U: The updated database. When the database is incrementally updated, U = DB + db. When it is decrementally updated, U = DB - dd. • Support(F, X): the support of the sequence F in the X database, where X ∈ {db, dd, DB, U}. • Min_supp: Minimum support threshold of the frequent sequence. • Min_nbd_supp: Minimum support threshold of the negative border sequence. • CX: Candidate sequences in the X database, where X ∈ {db, dd, DB, U}. • LX: Frequent sequences in the X database, where X ∈ {db, dd, DB, U}. • NBD(X)=CX-LX, where NBD(X) consists of the sequences in the X database whose subsequences are frequent and whose support is lower than Min_supp and greater than Min_nbd_supp.
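IUS (2) • Property 1: Let B be a frequent sequence in DB. If A is a subsequence of B, then occur(A, DB) ≥ occur(B, DB). • Property 2: If a sequence S is frequent in U = DB + db, then S is frequent in DB or in db. Proof (by contradiction): assume occur(S, DB) < Min_supp × |DB| and occur(S, db) < Min_supp × |db|. Adding the two inequalities gives occur(S, DB+db) < Min_supp × |DB+db|, that is, Support(S, U) < Min_supp, which contradicts the given condition that S is frequent in U.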
IUS – using the stream data model • Wk: The original stream viewing window, which contains old time-related data. • ΔWk+1: The increment stream viewing window, which contains new time-related data. • Wk+1: The updated stream viewing window. When the stream data is incrementally updated, Wk+1 = Wk + ΔWk+1. • Support(F, X): the support of the sequence F in the stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}. • Min_supp: Minimum support threshold of the frequent sequence. • Min_nbd_supp: Minimum support threshold of the negative border sequence. • CX: Candidate sequences in the stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}. • LX: Frequent sequences in the stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}. • NBD(X)=CX-LX, where NBD(X) consists of the sequences in the stream viewing window X whose subsequences are frequent and whose support is lower than Min_supp and greater than Min_nbd_supp. Note that X ∈ {Wk+1, Wk, ΔWk+1}.
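The split of candidates into frequent sequences and negative border sequences can be sketched as follows. This is only an illustration of the two thresholds, not the IUS algorithm itself. It assumes that "frequent" means support ≥ Min_supp and that the subsequence condition is checked on the subsequences obtained by deleting one element; the function name and the support values are hypothetical.

```python
from typing import Dict, Set, Tuple

Seq = Tuple[str, ...]


def split_candidates(
    supports: Dict[Seq, float], min_supp: float, min_nbd_supp: float
) -> Tuple[Set[Seq], Set[Seq]]:
    """Return (L_X, NBD(X)) for one window X given candidate supports."""
    frequent = {s for s, v in supports.items() if v >= min_supp}

    def subsequences_frequent(s: Seq) -> bool:
        # Assumed check: every subsequence with one element removed is frequent.
        if len(s) <= 1:
            return True
        return all(s[:i] + s[i + 1:] in frequent for i in range(len(s)))

    nbd = {
        s
        for s, v in supports.items()
        if min_nbd_supp < v < min_supp and subsequences_frequent(s)
    }
    return frequent, nbd


# Hypothetical candidate supports in one stream viewing window.
supports = {
    ("a",): 0.5,
    ("b",): 0.4,
    ("c",): 0.05,
    ("a", "b"): 0.15,
    ("a", "c"): 0.15,
}
frequent, nbd = split_candidates(supports, min_supp=0.2, min_nbd_supp=0.1)
print(sorted(frequent), sorted(nbd))
```

Keeping the negative border alongside the frequent set lets an incremental update promote a near-frequent sequence without rescanning the old window for it.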
Tradeoff between Performance and Difference (TPD) (1) • Use the speedup to measure the performance of IUS: • Speedup = the execution time of Robust_search / the execution time of IUS • Use the difference to measure the change between the old and the new frequent sequences. • Use Min-Max normalization to bring both measures onto the same scale.
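The two measures can be sketched in Python as below. The normalization shown is the standard Min-Max formula, v' = (v - min) / (max - min) × (new_max - new_min) + new_min; the target range [0, 1] and the sample timings are assumptions for illustration.

```python
from typing import List, Sequence


def speedup(t_robust_search: float, t_ius: float) -> float:
    """Execution time of Robust_search divided by execution time of IUS."""
    return t_robust_search / t_ius


def min_max_normalize(
    values: Sequence[float], new_min: float = 0.0, new_max: float = 1.0
) -> List[float]:
    """Rescale values linearly so min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical timings in seconds, not taken from the experiments.
print(speedup(10.0, 4.0))
print(min_max_normalize([2.0, 4.0, 6.0]))
```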
TPD (2) • The TPD method plots the speedup curve and the difference curve, both as functions of the size of the incremental window, in the same graph under the same scale. • The intersection points of the two curves give the suitable range of the incremental ratio, relative to the initial window, for IUS.
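Locating the intersection of the two normalized curves can be sketched as follows. The sketch assumes both curves are sampled at the same incremental window sizes and uses linear interpolation between samples, which is a choice made here rather than something the slides specify. The sample values are invented for illustration.

```python
from typing import List, Sequence


def intersections(
    xs: Sequence[float], curve_a: Sequence[float], curve_b: Sequence[float]
) -> List[float]:
    """Return x positions where two sampled curves cross (linear interpolation)."""
    points: List[float] = []
    for i in range(len(xs) - 1):
        d0 = curve_a[i] - curve_b[i]
        d1 = curve_a[i + 1] - curve_b[i + 1]
        if d0 == 0:
            points.append(xs[i])  # curves touch exactly at a sample
        elif d0 * d1 < 0:
            t = d0 / (d0 - d1)  # fraction of the segment where the gap is zero
            points.append(xs[i] + t * (xs[i + 1] - xs[i]))
    if xs and curve_a[-1] == curve_b[-1]:
        points.append(xs[-1])
    return points


# Hypothetical normalized curves over incremental window sizes (in k events).
sizes = [2.0, 4.0, 6.0, 8.0]
norm_speedup = [1.0, 0.75, 0.25, 0.0]
norm_difference = [0.0, 0.25, 0.75, 1.0]
print(intersections(sizes, norm_speedup, norm_difference))
```

Dividing each returned size by |W0| gives the incremental ratio reported in the experiments.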
Experiment • A set of experiments was conducted to find when to update sequential patterns for stream data. • Environment: • Dell PC server with 2 Pentium II CPUs • 512 MB memory, 16 GB disk • Operating system: Red Hat Linux 6.0 • Data1: • the alarms in GSM networks, containing 194 alarm types and 100k alarm events. • The alarm events in Data1 range in time from 2001-08-11-18 to 2001-08-13-17.
Experiment 1 – on Data1, |initial window|=20k • The intersection point: 6k • The suitable range of incremental ratio of initial window: 30% of W0.
Experiment 2 – on Data1, |initial window|=40k • The intersection point: 9k~10k • The suitable range of incremental ratio of initial window: 22.5%~25% of W0.
Experiment 3 – on Data1, |initial window|=50k • The intersection point: 15k~18k • The suitable range of incremental ratio of initial window: 30%~36% of W0.
Experiment 4 – on Data1, |initial window|=60k • The intersection point: 10k~12k • The suitable range of incremental ratio of initial window: 16.7%~20% of W0.
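The incremental ratios reported for the four experiments follow directly from dividing the intersection points by the initial window size. A short Python check, using only the figures from the slides (the variable and function names are illustrative):

```python
from typing import List, Tuple

# (|W0|, lowest intersection point, highest intersection point), in events.
experiments: List[Tuple[int, int, int]] = [
    (20_000, 6_000, 6_000),
    (40_000, 9_000, 10_000),
    (50_000, 15_000, 18_000),
    (60_000, 10_000, 12_000),
]


def ratio_range(initial: int, low: int, high: int) -> Tuple[float, float]:
    """Incremental ratio |ΔW| / |W0| at both ends of the intersection range."""
    return low / initial, high / initial


for initial, low, high in experiments:
    r_low, r_high = ratio_range(initial, low, high)
    print(f"|W0|={initial}: {r_low:.1%} to {r_high:.1%}")
```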
Conclusion • Using the TPD method, the experiments show that the suitable incremental ratio for updating with the IUS algorithm is about 20 to 30 percent of the size of the initial window.