
A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams

A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams. Joong Hyuk Chang and Won Suk Lee, Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD ’03). Adviser: Jia-Ling Koh. Speaker: Yu-ting Kung.


Presentation Transcript


  1. A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams • Joong Hyuk Chang and Won Suk Lee, Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03) • Adviser: Jia-Ling Koh • Speaker: Yu-ting Kung

  2. Introduction • Most mining algorithms and frequency approximation algorithms for data streams cannot adaptively extract recent changes of information in a data stream.

  3. Introduction (Cont.) • This paper proposes a sliding window method for finding recently frequent itemsets over an online data stream.

  4. Sliding Window Method • Idea: • Define a significant itemset: • An itemset whose current support is greater than or equal to an error parameter is a significant itemset • Monitor only significant itemsets
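
The significance test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `is_significant` and its parameters are hypothetical, and support is assumed to mean the itemset's count divided by the number of transactions currently in the window.

```python
def is_significant(count, window_transactions, error_param):
    """Return True if the itemset's current support reaches the error parameter.

    Assumption: support = count / number of transactions in the window.
    """
    if window_transactions <= 0:
        return False  # no transactions yet, so no itemset can be significant
    return count / window_transactions >= error_param


# With 10 transactions in the window and an error parameter of 0.25,
# a count of 3 (support 0.3) qualifies and a count of 2 (support 0.2) does not.
print(is_significant(3, 10, 0.25))
print(is_significant(2, 10, 0.25))
```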

  5. SW Method (Cont.) • Two different phases • Window initialization phase: • Active while the number of transactions generated so far in the data stream is less than or equal to a predefined window size • Insert each new transaction into the CTL (current transaction list) • No transaction is extracted • Window sliding phase: • Active after the CTL becomes full • Insert each new transaction into the CTL (current transaction list) • The oldest transaction is extracted from the CTL
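
The two phases can be illustrated with a small queue-based sketch. The class name `CurrentTransactionList` and its `append` method are hypothetical names chosen for this example; the slides only specify that the CTL receives every new transaction and gives up its oldest one once the window is full.

```python
from collections import deque


class CurrentTransactionList:
    """Hypothetical CTL: holds at most `window_size` transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.transactions = deque()  # (tid, items) pairs, oldest on the left

    def append(self, tid, items):
        """Insert a transaction; return the extracted oldest one, or None.

        Initialization phase: the CTL is not yet over capacity, nothing is extracted.
        Sliding phase: the CTL was already full, so the oldest transaction leaves.
        """
        self.transactions.append((tid, frozenset(items)))
        if len(self.transactions) > self.window_size:
            return self.transactions.popleft()
        return None


ctl = CurrentTransactionList(window_size=3)
for tid, items in [(1, "AB"), (2, "D"), (3, "AB"), (4, "A")]:
    extracted = ctl.append(tid, items)
    phase = "initialization" if extracted is None else "sliding"
    print(tid, phase, extracted)
```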

  6. SW Method (Cont.) • Five steps: • Appending a transaction • Count updating and insertion of new itemsets • Extracting a transaction • Pruning of itemsets • Frequent itemset selection

  7. Step1: Appending a transaction • Content • The transaction Tk is appended to the current transaction list CTL

  8. Step2: Count updating and insertion of new itemsets • Notation for an entry (e, f, t): • f: count of the itemset • t: TID of the transaction that caused the itemset to be newly inserted into the monitoring lattice • Content • For an itemset e that appears in Tk, with entry (e, f, t): • Case 1: its corresponding node is in the monitoring lattice: • e.f = e.f + 1 • Case 2: its corresponding node isn’t in the monitoring lattice: • e is inserted into the monitoring lattice with entry (e, 1, k)
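
The two cases of Step2 can be sketched as follows. This is a simplified illustration under stated assumptions: the monitoring lattice is represented as a plain dictionary from itemset to a `[f, t]` pair, every non-empty subset of the transaction is treated as an itemset that "appears in Tk" (exponential in the transaction length, so only suitable for small examples), and the name `update_counts` is hypothetical.

```python
from itertools import combinations


def update_counts(lattice, tid, items):
    """Apply Step2 for transaction `tid` containing `items`.

    `lattice` maps frozenset(itemset) -> [f, t].
    Assumption: all non-empty subsets of the transaction are considered.
    """
    ordered = sorted(set(items))
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            e = frozenset(combo)
            if e in lattice:
                lattice[e][0] += 1      # Case 1: e.f = e.f + 1
            else:
                lattice[e] = [1, tid]   # Case 2: insert entry (e, 1, k)


lattice = {}
update_counts(lattice, 1, "AB")  # inserts A, B and AB with entry (e, 1, 1)
update_counts(lattice, 2, "A")   # A is already monitored, so its count becomes 2
print(lattice)
```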

  9. Step3: Extracting a transaction • When is this step done? • Only in the window sliding phase • Content • Extract the oldest transaction in the CTL • Update the entry (e, f, t) of each itemset e of the extracted transaction that has a node in the monitoring lattice: • If t <= wfirst: e.f = e.f - 1 • Else: e.f = e.f (unchanged) • wfirst: the TID of the first transaction of the current window
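
The count adjustment in Step3 can be sketched like this. As an assumption for illustration, the monitoring lattice is a dictionary from itemset to a `[f, t]` pair, the itemsets affected are the non-empty subsets of the extracted transaction, and `w_first` is supplied by the caller; the function name `extract_transaction` is hypothetical.

```python
from itertools import combinations


def extract_transaction(lattice, items, w_first):
    """Apply Step3 for an extracted transaction containing `items`.

    `lattice` maps frozenset(itemset) -> [f, t].
    The count is decreased only when t <= w_first; otherwise it is unchanged.
    """
    ordered = sorted(set(items))
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            entry = lattice.get(frozenset(combo))
            if entry is not None and entry[1] <= w_first:
                entry[0] -= 1  # e.f = e.f - 1


# A was inserted at TID 1, so its count drops; B was inserted at TID 5,
# after w_first, so its count stays as it is.
lattice = {frozenset("A"): [3, 1], frozenset("B"): [1, 5]}
extract_transaction(lattice, "AB", w_first=1)
print(lattice)
```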

  10. Step4: Pruning of itemsets • Theorem: • Given an error parameter, the maximum possible count of an itemset with entry (e, f, t) is found as follows: [formula not preserved in the transcript]

  11. Step4: Pruning of itemsets • When is this step done? • Periodically or when it is needed • Content • For an itemset e with entry (e, f, t) in the monitoring lattice: • If [condition not preserved in the transcript], it can be regarded as an insignificant itemset → prune it

  12. Step5: Frequent itemset selection • When is this step done? • When the up-to-date set of recently frequent itemsets is requested • Content • For an itemset e with an entry (e, f, t) in the monitoring lattice: • If its [condition not preserved in the transcript] → it is a frequent itemset

  13. Example • Data stream (figure not preserved): panels (a) D1, (b) D5, (c) D10, (d) D11, (e) D15

  14. Example (Cont.) • Initial values • Smin = 0.5 • Error parameter = 0.25 (0.5 × Smin) • Window size = 10 • Step4 is performed every 5 transactions
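
The arithmetic behind these settings can be checked with a short sketch. The constant names are hypothetical, the count thresholds assume a full window of 10 transactions, and the pruning schedule assumes Step4 runs after every TID that is a multiple of 5.

```python
import math

S_MIN = 0.5                  # minimum support
ERROR_PARAM = 0.5 * S_MIN    # error parameter, 0.25
WINDOW_SIZE = 10
PRUNE_INTERVAL = 5           # Step4 is performed every 5 transactions

# Smallest counts that reach each threshold in a full window (assumption).
min_frequent_count = math.ceil(S_MIN * WINDOW_SIZE)
min_significant_count = math.ceil(ERROR_PARAM * WINDOW_SIZE)

# TIDs among T1..T15 after which Step4 would run under the stated assumption.
prune_tids = [tid for tid in range(1, 16) if tid % PRUNE_INTERVAL == 0]

print(ERROR_PARAM, min_frequent_count, min_significant_count, prune_tids)
# prints: 0.25 5 3 [5, 10, 15]
```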

  15. Monitoring lattice figures (not preserved) • Panels: (a) After T1 (AB) • (b) After T2 (D) • (b.1) After T3 (AB) • (b.2) After T4 (AB) • (b.3) After T5 (A) • (b.3) After pruning • (c) After T10 (AE) • Step labels shown: Step1,2 (five times); Step4; Step1,2 recursively • Note: D is pruned from the monitoring lattice (the stated reason is not preserved in the transcript)

  16. Monitoring lattice figures (not preserved) • Panels: (d) After T11 (AD) • (e) After Step4 for T15 • Step labels shown: Step1,2; Step3; Step4; Step1,2,3,4

  17. Experiment Result • Data source • T5.I4.D1000K-I • T5.I4.D1000K-II

  18. Experiment (Cont.) • Memory usage in the window sliding phase

  19. Experiment (Cont.) • Average support error • Measures the relative accuracy of the proposed method • When two sets of mining results R1 and R2 are given for the same data set, the average support error ASE(R2|R1) is defined as: [formula not preserved in the transcript]

  20. Experiment (Cont.) • Average support error of the mining result of the proposed method with respect to that of the Apriori algorithm on the transactions within the current window

  21. Experiment (Cont.) • The average processing time (Step1-Step4) of the sliding window method in each interval

  22. Experiment (Cont.) • The average processing time for Step5

  23. Experiment (Cont.) • The memory usage of the window sliding phase by varying the size of the window

  24. Experiment (Cont.) • The average processing time of the sliding window method by varying the size of a window

  25. Conclusion • The result of the proposed method guarantees the following: • All itemsets whose true supports are greater than or equal to the minimum support Smin are found • No itemset whose true support is less than (Smin − error parameter) is found as a recently frequent itemset • For each itemset, the difference between its estimated support and its true support is less than the error parameter
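
The three guarantees can be read as checks on a mining result. The sketch below is an illustration only: the function name `check_guarantees` and the dictionary inputs are hypothetical, and it assumes the bound in the third guarantee is the error parameter.

```python
def check_guarantees(true_support, reported, s_min, error_param):
    """Return True if `reported` satisfies the three stated guarantees.

    true_support: dict itemset -> true support within the current window
    reported:     dict itemset -> estimated support, for itemsets reported
                  as recently frequent
    """
    # 1. Every itemset with true support >= s_min must be reported.
    for itemset, support in true_support.items():
        if support >= s_min and itemset not in reported:
            return False
    for itemset, estimate in reported.items():
        support = true_support.get(itemset, 0.0)
        # 2. No reported itemset may have true support below s_min - error_param.
        if support < s_min - error_param:
            return False
        # 3. The estimate must be within error_param of the true support (assumption).
        if abs(estimate - support) >= error_param:
            return False
    return True


truth = {"A": 0.6, "B": 0.3, "C": 0.1}
print(check_guarantees(truth, {"A": 0.55, "B": 0.3}, s_min=0.5, error_param=0.25))
print(check_guarantees(truth, {"B": 0.3}, s_min=0.5, error_param=0.25))
```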
