Incremental Mining of Web Sequential Patterns Using PLWAP Tree on Tolerance MinSupport, C.I. Ezeife, Min Chen, IDEAS'04
Introduction • This paper proposes an algorithm, PL4UP, which uses the PLWAP tree structure to incrementally update web sequential patterns. • The process of generating new patterns in the updated database (old + new data) using only the updated part (new data) and previously generated patterns is called incremental mining of sequential patterns.
The Proposed Incremental PL4UP Algorithm
• F: frequent items; S: small items
• F': updated large (frequent) items; S': updated small items in the updated database U (old + new data)
• Six categories of items: (1) large in the old database DB and still large in the updated database U (F→F'); (2) large in DB but small in U (F→S'); (3) small in DB but large in U (S→F'); (4) small in DB and still small in U (S→S'); (5) new items that are large in U (∅→F'); (6) new items that are small in U (∅→S').
• The main idea of PL4UP is to avoid scanning the whole updated database U (DB + db) and to scan only the changes to the database (db).
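A minimal sketch of the six-way item classification above, assuming item counts are kept as dictionaries and that absolute minimum-support counts for DB and U are given; the function and parameter names (categorize_items, min_sup_old, min_sup_updated) are illustrative, not the paper's notation.

```python
# Sketch (not the paper's code): classify items into the six PL4UP categories
# from their counts in the old database DB and in the increment db.
def categorize_items(old_counts, db_counts, min_sup_old, min_sup_updated):
    """Map each item to one of the six categories F->F', F->S', S->F', S->S', new->F', new->S'."""
    categories = {}
    for item in set(old_counts) | set(db_counts):
        old = old_counts.get(item, 0)
        total = old + db_counts.get(item, 0)        # count in U = DB + db
        is_large_now = total >= min_sup_updated
        if item not in old_counts:                  # new item, appears only in db
            categories[item] = "new->F'" if is_large_now else "new->S'"
        elif old >= min_sup_old:                    # was large in DB
            categories[item] = "F->F'" if is_large_now else "F->S'"
        else:                                       # was small in DB
            categories[item] = "S->F'" if is_large_now else "S->S'"
    return categories
```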
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
• 1. Construct the initial PLWAP tree using a tolerance minimum support t, where t = factor * s and 0 ≤ factor ≤ 1.
• Frequent 1-items (meeting support s): F1
• Frequent 1-items (meeting support t): Ft
• Small 1-items (not meeting support s): S1
• Potentially frequent 1-items (in the S1 list but meeting support t): PF1
• Potentially still small 1-items (in the S1 list but not meeting support t): SS1
• We obtain two subsequences: the frequent subsequence meeting the s requirement (seq_s) and the frequent subsequence meeting the t requirement (seq_t).
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
• C1 = {a:5, b:5, c:3, d:1, e:2, f:2, g:1, h:1}
• F1 = {a, b, c} (counts ≥ 3, meeting support s)
• Ft = {a, b, c, e, f} (counts ≥ 2, meeting support t)
• S1 = {e, f, d, g, h} (counts < 3)
• PF = {e, f} (counts ≥ 2 but < 3)
• SS = {d, g, h} (counts < 2)
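The partitioning of 1-items by the two supports can be sketched directly from this example; the code below is illustrative (absolute counts s_count = 3 and t_count = 2 correspond to the thresholds used above).

```python
# Sketch of step 1: split candidate 1-items by the regular support s and the
# tolerance support t (given here as absolute counts).
def partition_1_items(counts, s_count, t_count):
    F1 = {i: c for i, c in counts.items() if c >= s_count}    # frequent on s
    Ft = {i: c for i, c in counts.items() if c >= t_count}    # frequent on t
    S1 = {i: c for i, c in counts.items() if c < s_count}     # small on s
    PF1 = {i: c for i, c in S1.items() if c >= t_count}       # potentially frequent
    SS1 = {i: c for i, c in S1.items() if c < t_count}        # still small
    return F1, Ft, S1, PF1, SS1

C1 = {'a': 5, 'b': 5, 'c': 3, 'd': 1, 'e': 2, 'f': 2, 'g': 1, 'h': 1}
F1, Ft, S1, PF1, SS1 = partition_1_items(C1, s_count=3, t_count=2)
print(sorted(F1))    # ['a', 'b', 'c']
print(sorted(Ft))    # ['a', 'b', 'c', 'e', 'f']
print(sorted(PF1))   # ['e', 'f']
print(sorted(SS1))   # ['d', 'g', 'h']
```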
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
• After checking all patterns, the list of TFP mined is: {a:5, aa:4, aac:3, ab:5, aba:4, abac:3, abc:3, abcc:2, abe:2, abf:2, ac:3, acc:2, ae:2, af:2, b:5, ba:4, bac:3, bab:1, bc:3, bcc:2, be:2, bf:2, c:3, cc:2, e:2, f:2}
• SFP = {a:5, aa:4, aac:3, ab:5, ac:3, aba:4, abac:3, abc:3, b:5, ba:4, bac:3, bc:3, c:3}
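The relationship between the two pattern sets can be shown in one line: SFP is simply the subset of the tolerance-frequent patterns TFP whose counts also meet the regular support count (3 in this example). A minimal sketch using the counts listed above:

```python
# Sketch: derive the s-frequent patterns SFP from the tolerance-frequent
# patterns TFP by keeping only patterns with count >= 3.
TFP = {'a': 5, 'aa': 4, 'aac': 3, 'ab': 5, 'aba': 4, 'abac': 3, 'abc': 3,
       'abcc': 2, 'abe': 2, 'abf': 2, 'ac': 3, 'acc': 2, 'ae': 2, 'af': 2,
       'b': 5, 'ba': 4, 'bac': 3, 'bab': 1, 'bc': 3, 'bcc': 2, 'be': 2,
       'bf': 2, 'c': 3, 'cc': 2, 'e': 2, 'f': 2}
SFP = {p: c for p, c in TFP.items() if c >= 3}   # 13 patterns, as listed above
```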
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
• C'1 = C1 ∪ Cdb1 (the count of each item in the old part and in the new part are added)
• C1 = {a:5, b:5, c:3, d:1, e:2, f:2, g:1, h:1}
• Cdb1 = {a:2, b:1, e:2, f:2, g:2, h:2}
• C'1 = {a:7, b:6, c:3, d:1, e:4, f:4, g:3, h:3}
• F'1 = {a:7, b:6, e:4, f:4} (counts ≥ 4)
• F't = {a:7, b:6, c:3, e:4, f:4, g:3, h:3} (counts ≥ 3)
• Fdb1 = Cdb1 ∩ F'1 = {a:2, b:1, e:2, f:2}
• Fdbt = Cdb1 ∩ F't = {a:2, b:1, e:2, f:2, g:2, h:2}
• Sdb1 = Cdb1 ∩ S'1 = {g:2}
• PF' = {c:3, g:3, h:3}
• PFdb = Cdb1 ∩ PF' = {g:2, h:2}
• SS' = {d:1}
• SSdb = Cdb1 ∩ SS' = ∅
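The incremental 1-item update can be sketched with the numbers above; the function name update_counts and the absolute thresholds 4 and 3 are taken from this example rather than from the paper's code.

```python
# Sketch: add the db counts to the old counts, then re-derive the updated
# frequent lists and their restrictions to the increment db.
def update_counts(C1, Cdb1):
    merged = dict(C1)
    for item, cnt in Cdb1.items():
        merged[item] = merged.get(item, 0) + cnt
    return merged

C1 = {'a': 5, 'b': 5, 'c': 3, 'd': 1, 'e': 2, 'f': 2, 'g': 1, 'h': 1}
Cdb1 = {'a': 2, 'b': 1, 'e': 2, 'f': 2, 'g': 2, 'h': 2}
C1_upd = update_counts(C1, Cdb1)                        # {'a': 7, 'b': 6, 'c': 3, ...}
F1_upd = {i: c for i, c in C1_upd.items() if c >= 4}    # F'1 = {a:7, b:6, e:4, f:4}
Ft_upd = {i: c for i, c in C1_upd.items() if c >= 3}    # F't
Fdb1 = {i: c for i, c in Cdb1.items() if i in F1_upd}   # Cdb1 ∩ F'1
Fdbt = {i: c for i, c in Cdb1.items() if i in Ft_upd}   # Cdb1 ∩ F't
```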
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
• Fdbt = Cdb1 ∩ F't = {a:2, b:1, e:2, f:2, g:2, h:2}
2.1 Steps in Mining Frequent Patterns Incrementally with PL4UP
• The mined TFPdb = {a:2, ae:2, aef:2, af:2, b:1, ba:1, bae:1, baef:1, be:1, bf:1, ..., e:2, ef:2, f:2} = SFPdb
• SFP' = TFP ∪ SFPdb = {a:7, aa:4, aac:3, ab:5, aba:4, abac:3, abc:3, ac:3, ae:4, af:4, b:6, ba:5, bac:3, bc:3, be:3, bf:3, c:3, e:4, f:4}; keeping only patterns with counts ≥ 4 gives SFP' = {a:7, aa:4, ab:5, aba:4, ae:4, af:4, b:6, ba:5, e:4, f:4}
• TFP' = TFP ∪ TFPdb = {a:7, aa:4, aac:3, ab:4, aba:4, abac:3, abc:3, ac:3, ae:4, af:4, b:6, ba:5, bac:3, bc:3, be:3, bf:3, c:3, e:4, f:4}
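A minimal sketch of this merge step, using only a subset of the example's patterns for brevity: counts of the same pattern in the old result and in the patterns mined from db are summed, and SFP' keeps the patterns meeting the updated support count (≥ 4 here).

```python
# Sketch: merge the old pattern counts with the counts mined from db, then
# filter by the updated minimum support count.
def merge_patterns(old, new):
    merged = dict(old)
    for pat, cnt in new.items():
        merged[pat] = merged.get(pat, 0) + cnt
    return merged

TFP   = {'a': 5, 'ab': 5, 'ae': 2, 'af': 2, 'b': 5, 'ba': 4, 'e': 2, 'f': 2}
TFPdb = {'a': 2, 'ae': 2, 'af': 2, 'b': 1, 'ba': 1, 'e': 2, 'f': 2}
TFP_upd = merge_patterns(TFP, TFPdb)                     # a:7, ae:4, ba:5, ...
SFP_upd = {p: c for p, c in TFP_upd.items() if c >= 4}   # updated s-frequent patterns
```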
SSM: A Frequent Sequential Data Stream Patterns Miner, C.I. Ezeife, Mostafa Monwar, CIDM 2007
Introduction
• Existing work on mining frequent patterns over data streams is mostly for non-sequential patterns.
• This paper proposes the SSM algorithm (Sequential Stream Mining algorithm), which uses three types of data structures, the D-List, the PLWAP tree and the FSP-tree, to handle the complexities of mining frequent sequential patterns in data streams.
Example
• Assume a continuous stream whose first batch B1 consists of the stream sequences [(abdac, abcac, babfae, afbacfcg)].
• Minimum support s is 0.75; tolerance error e is 0.25.
• C1B1 = {a:4, b:4, c:3, d:1, e:1, f:2, g:1}
• L1B1 = {a:4, b:4, c:3, f:2} (counts ≥ 2)
• The values (2, 100, 0, 202, 10, 110, 99) assigned to the items (a, b, c, d, e, f, g) are used in the D-List hash function (mod 100).
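A minimal sketch of how such a mod-100 hash could bucket the D-List entries, assuming the D-List behaves like a hash table of (item, count) lists; the exact node layout of the paper's D-List may differ.

```python
# Sketch: bucket the B1 item counts with the given item values and mod 100.
from collections import defaultdict

item_values = {'a': 2, 'b': 100, 'c': 0, 'd': 202, 'e': 10, 'f': 110, 'g': 99}
C1_B1 = {'a': 4, 'b': 4, 'c': 3, 'd': 1, 'e': 1, 'f': 2, 'g': 1}

d_list = defaultdict(list)
for item, count in C1_B1.items():
    bucket = item_values[item] % 100      # a,d -> 2; b,c -> 0; e,f -> 10; g -> 99
    d_list[bucket].append((item, count))
# Items whose values collide (e.g. b and c both hash to bucket 0) share a bucket.
```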
Example
• Step 3: Construct the FSP-tree and insert all FPB1 into the FSP-tree without pruning any items for the first batch B1 to obtain Figure 5.
• FPB1 = {a:4, aa:4, aac:3, ab:4, aba:4, abac:3, abc:3, ac:3, acc:2, af:2, afa:2, b:4, ba:4, bac:3, bc:3, c:3, cc:2, f:2, fa:2}
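A rough sketch of inserting mined patterns into a prefix-tree structure in the spirit of the FSP-tree; the actual FSP-tree node layout in the paper may differ, and only a few of the FPB1 patterns are used here for brevity.

```python
# Sketch: a prefix tree where each node stores the count of the pattern
# ending at that node.
class FSPNode:
    def __init__(self):
        self.children = {}    # item -> FSPNode
        self.count = 0        # count of the pattern ending at this node

def insert_pattern(root, pattern, count):
    node = root
    for item in pattern:
        node = node.children.setdefault(item, FSPNode())
    node.count = count

root = FSPNode()
for pat, cnt in {'a': 4, 'aa': 4, 'aac': 3, 'ab': 4, 'aba': 4, 'b': 4}.items():
    insert_pattern(root, pat, cnt)
```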
Example
• When the next batch B2 of stream sequences, (abacg, abag, babag, abagh), arrives:
• 1. SSM updates the D-List using the new stream sequences.
• 2. It deletes all items not meeting the minimum tolerance support of |D| * e (8 * 0.25 = 2) from the D-List, and keeps all items with at least |D| * (s − e) = 8 * 0.5 = 4 frequency counts in L1B2.
• 3. It constructs PLWAPB2 from L1B2.
• 4. PLWAPB2 is mined to generate FSPB2. This FSPB2 is used for obtaining frequent sequential patterns on demand.
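The threshold arithmetic for B2 can be reproduced directly from the two batches, assuming the D-List counts one occurrence per sequence containing the item (consistent with the C1B1 counts shown earlier); this is a sketch of the bookkeeping, not the paper's implementation.

```python
# Sketch: update per-item counts over B1+B2, prune below |D|*e, and keep items
# with at least |D|*(s-e) counts as L1 for the new PLWAP tree.
from collections import Counter

s, e = 0.75, 0.25
B1 = ['abdac', 'abcac', 'babfae', 'afbacfcg']
B2 = ['abacg', 'abag', 'babag', 'abagh']

d_list = Counter()
for seq in B1 + B2:
    d_list.update(set(seq))                  # one count per sequence containing the item

D = len(B1) + len(B2)                        # |D| = 8 sequences seen so far
d_list = Counter({i: c for i, c in d_list.items() if c >= D * e})   # drop counts < 2
L1_B2 = {i: c for i, c in d_list.items() if c >= D * (s - e)}       # keep counts >= 4
# L1_B2 items are then used to build the B2 PLWAP tree, which is mined into FSP_B2.
```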
Incremental Mining of Sequential Patterns over a Stream Sliding Window, Chin-Chuan Ho, Hua-Fu Li, Fang-Fei Kuo and Suh-Yin Lee
Introduction
• For mining sequential patterns over data streams, researchers have adopted three time-spanning models: the landmark model, the sliding window model, and the damped window model.
• IncSPAM = sliding window model + SPAM
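A minimal sketch of the sliding window idea that IncSPAM builds on: only the most recent transactions are retained, and older ones drop out as new ones arrive. This is illustrative only and does not reproduce IncSPAM's bit-vector structures; the class and parameter names are hypothetical.

```python
# Sketch: a transaction-sensitive sliding window that keeps only the most
# recent `window_size` transactions.
from collections import deque

class SlidingWindow:
    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)   # oldest transaction is evicted

    def add_transaction(self, customer_id, itemset):
        self.window.append((customer_id, itemset))

    def current_transactions(self):
        return list(self.window)

w = SlidingWindow(window_size=3)
for cid, items in [(1, {'a', 'b'}), (2, {'a'}), (1, {'c'}), (2, {'b', 'c'})]:
    w.add_transaction(cid, items)
print(w.current_transactions())   # only the 3 most recent transactions remain
```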
3.6 Incremental Mining of Sequential Patterns using SPAM (IncSPAM)