
Mining Frequent Patterns in Data Streams at Multiple Time Granularities

Mining Frequent Patterns in Data Streams at Multiple Time Granularities. Chris Giannella, Jiawei Han, Jian Pei, Xifeng Yan, Philip S. Yu Advisor : Jia-Ling Koh


Presentation Transcript


  1. Mining Frequent Patterns in Data Streams at Multiple Time Granularities Chris Giannella, Jiawei Han, Jian Pei, Xifeng Yan, Philip S. Yu Advisor: Jia-Ling Koh Speaker: Chun-Wei Hsieh 03/12/2004

  2. Problem and Analysis • Infrequent items can become frequent later on and hence cannot be ignored. • An itemset I is frequent if its support is no less than min_support. • I is sub-frequent if its support is less than min_support but no less than the maximum support error ε. • Otherwise, I is infrequent.
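The three categories on this slide can be sketched as a small classifier. This is only an illustration: the name `classify` and its parameters are assumptions, and support is taken to be the fraction of transactions that contain the itemset.

```python
def classify(count, n_transactions, min_support, max_error):
    """Label an itemset as frequent, sub-frequent, or infrequent.

    count          -- number of transactions containing the itemset
    n_transactions -- total number of transactions seen
    min_support    -- minimum support threshold (a fraction in (0, 1])
    max_error      -- maximum support error, assumed <= min_support
    """
    if not 0 < max_error <= min_support <= 1:
        raise ValueError("expected 0 < max_error <= min_support <= 1")
    support = count / n_transactions
    if support >= min_support:
        return "frequent"
    if support >= max_error:
        return "sub-frequent"  # kept, because it may become frequent later
    return "infrequent"
```

With `min_support=0.05` and `max_error=0.01` over 1000 transactions, an itemset seen 20 times is sub-frequent: it is too rare to report but too common to discard.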

  3. FP-Stream • FP-stream includes two major components: (1) tilted-time window (2) pattern-tree

  4. Tilted-Time Window • People are often interested in recent changes at a fine granularity, but long term changes at a coarse granularity.

  5. Tilted-Time Window (cont.) • One can register a window-based count for each frequent pattern in each tilted-time window.

  6. Pattern-Tree

  7. FP-Stream • Usually frequent patterns do not change dramatically over time. • To save space, it uses only one frequent pattern tree.

  8. Logarithmic Tilted-time Window • In the natural tilted-time window, at most 59 (4+24+31) tilted windows need to be maintained for a period of one month. • According to the logarithmic tilted-time window model, with one year of data and the finest precision at a quarter of an hour, it needs only $\lceil \log_2(365 \times 24 \times 4) \rceil + 1 = 17$ units of time instead of $365 \times 24 \times 4 = 35{,}040$ units.
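The window counts on this slide can be checked with a few lines of arithmetic. The sketch below assumes that a logarithmic tilted-time window doubles its span at each level (1, 1, 2, 4, ... time units), so covering n units takes about log2(n) + 1 windows; the function names are illustrative only.

```python
def natural_window_count(quarters=4, hours=24, days=31):
    """Windows kept by the natural model for one month:
    4 quarter-hours + 24 hours + 31 days."""
    return quarters + hours + days


def logarithmic_window_count(n_units):
    """ceil(log2(n_units)) + 1, computed with integers to avoid float error."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1
    return (n_units - 1).bit_length() + 1


quarter_hours_per_year = 365 * 24 * 4
print(natural_window_count())                            # 59
print(quarter_hours_per_year)                            # 35040
print(logarithmic_window_count(quarter_hours_per_year))  # 17
```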

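  9. Logarithmic Tilted-Time Window Updating • Assume that the stream of transactions is broken up into fixed-sized batches $B_1, B_2, \ldots, B_n$, where $B_n$ is the most recent. • Writing $f(i, j)$ for the frequency of an itemset over batches $B_j$ through $B_i$ ($i \ge j$), the frequencies for the logarithmic tilted-time windows are: $f(n, n)$; $f(n-1, n-1)$; $f(n-2, n-3)$; $f(n-4, n-7)$; ...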

  10. Logarithmic Tilted-Time Window Updating (cont.)
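A minimal sketch of one way the logarithmic update can work, assuming the scheme in which every level after the first has one intermediate (buffer) slot: a new batch count enters level 0, the displaced count moves to level 1, and from there a displaced count either waits in that level's empty buffer or is merged with the buffered count and pushed one level further. The class and attribute names are hypothetical.

```python
class LogTiltedWindow:
    """Per-itemset counts held at logarithmically growing granularities."""

    def __init__(self):
        self.main = []    # main[k]: count stored at level k (level 0 is newest)
        self.buffer = []  # buffer[k]: intermediate count for level k, or None

    def add(self, batch_count):
        carry = batch_count
        for level in range(len(self.main) + 1):
            if level == len(self.main):
                # First time this level is reached: open it and stop.
                self.main.append(carry)
                self.buffer.append(None)
                return
            # The incoming count replaces the stored one, which is displaced.
            carry, self.main[level] = self.main[level], carry
            if level == 0:
                continue  # level 0 has no buffer; its old count moves to level 1
            if self.buffer[level] is None:
                self.buffer[level] = carry  # wait here until a partner arrives
                return
            carry += self.buffer[level]     # merge the pair and shift it back
            self.buffer[level] = None


w = LogTiltedWindow()
for c in range(1, 9):   # eight batches with counts 1..8
    w.add(c)
print(w.main)           # [8, 7, 11, 10]
```

After eight batches the levels hold the counts of batch 8, batch 7, batches 5 to 6, and batches 1 to 4, so the total count is preserved while only four windows are stored.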

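  11. Tail Pruning • Let $t_0, \ldots, t_n$ be the tilted-time windows, where $t_n$ is the oldest. • $w_i$ is the window size of $t_i$. • Drop the tail sequence $f_I(t_m), \ldots, f_I(t_n)$ when the following condition holds: $\exists l\, \forall i,\ l \le i \le n:\ f_I(t_i) < \sigma w_i$ and $\forall l',\ l \le m \le l' \le n:\ \sum_{i=l}^{l'} f_I(t_i) < \varepsilon \sum_{i=l}^{l'} w_i$, where $\sigma$ is min_support and $\varepsilon$ is the maximum support error.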
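A minimal sketch of tail pruning under a simplified, assumed reading of the condition: the oldest windows from position m onward are dropped when every one of them is individually below min_support times its window size, and every cumulative count starting at m is below the maximum support error times the cumulative window size. The function name and argument layout are illustrative, not taken from the slides.

```python
def tail_prune(counts, sizes, min_support, max_error):
    """Return the counts that survive tail pruning.

    counts[i] -- count of the itemset in window t_i (index 0 is the newest)
    sizes[i]  -- number of transactions in window t_i
    """
    n = len(counts)
    keep = n  # by default nothing is dropped
    for m in range(n - 1, -1, -1):  # walk from the oldest window to the newest
        if counts[m] >= min_support * sizes[m]:
            break  # a frequent window cannot be part of a dropped tail
        f_sum = w_sum = 0
        tail_ok = True
        for end in range(m, n):
            f_sum += counts[end]
            w_sum += sizes[end]
            if f_sum >= max_error * w_sum:
                tail_ok = False
                break
        if tail_ok:
            keep = m  # windows m..n-1 can all be dropped
    return counts[:keep]


print(tail_prune([50, 30, 2, 0, 1], [1000] * 5, 0.04, 0.01))  # [50, 30]
```

An empty result means every entry of the itemset's table was pruned, which is the situation in which the itemset itself is dropped.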

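  12. Type I Pruning • If I is found in B but is not in the FP-stream structure, then no superset of I is in the structure. • Hence, if $f_I(B) < \varepsilon |B|$, where $f_I(B)$ is the count of I in B and $\varepsilon$ is the maximum support error, then none of the supersets need be examined.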

  13. Type II Pruning • If all of I’s tilted-time window table entries are pruned (hence I is dropped), then any superset will also be dropped.

  14. Algorithm • INPUT: (1) An FP-stream structure (2) A min_support threshold $\sigma$ (3) An error rate $\varepsilon$ (4) An incoming batch, B, of transactions. • OUTPUT: The updated FP-stream structure.

  15. Algorithm (cont.) 1. Initialize the FP-tree to empty. 2. Sort each incoming transaction t according to f_list, and then insert it into the FP-tree without pruning any items. 3. When all the transactions in B are accumulated, update the FP-stream as follows.

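  16. Algorithm (cont.) (a) Mine itemsets out of the FP-tree using the FP-growth algorithm. For each mined itemset I, check whether I is in the FP-stream structure. • If I is in the structure, add $f_I(B)$ to I’s tilted-time window table and prune the table by tail pruning; if the table becomes empty, FP-growth stops mining supersets of I (Type II Pruning). • If I is not in the structure and if $f_I(B) \ge \varepsilon |B|$, then insert I into the structure. Otherwise, FP-growth stops mining supersets of I (Type I Pruning).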

  17. Algorithm (cont.) (b) Scan the FP-stream structure (depth-first search). For each itemset I encountered, check if I was updated when B was mined. If not, then insert 0 into I’s tilted-time window table. Prune I’s table by tail pruning. Once the search reaches a leaf, if the leaf has an empty tilted-time window table, then drop the leaf. If there are any siblings of the leaf, continue the search with them. If there were no siblings, then return to the parent and continue the search with its siblings. If all of the children of the parent were dropped, then the parent becomes a leaf node and might be dropped.
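The batch update in steps (a) and (b) can be sketched as follows. This is a heavily simplified illustration, not the authors' implementation: a plain depth-first enumeration over sorted items stands in for FP-growth on an FP-tree, a flat dictionary stands in for the pattern-tree, each table is a plain list of per-batch counts (newest first) rather than a tilted-time window table, and tail pruning is passed in as a hypothetical `prune` callable that does nothing by default.

```python
def count(itemset, batch):
    """Number of transactions in the batch that contain the itemset."""
    return sum(1 for t in batch if itemset <= t)


def update(structure, batch, max_error, prune=lambda table: table):
    """Update `structure` (frozenset -> list of counts) with one batch of sets."""
    threshold = max_error * len(batch)
    items = sorted({i for t in batch for i in t})
    updated = set()

    def mine(prefix, start):
        for k in range(start, len(items)):
            itemset = prefix | {items[k]}
            f = count(itemset, batch)
            if f == 0:
                continue  # the itemset does not occur in this batch
            if itemset in structure:
                structure[itemset] = prune([f] + structure[itemset])
                updated.add(itemset)
                if not structure[itemset]:
                    continue  # Type II pruning: table emptied, skip supersets
            elif f >= threshold:
                structure[itemset] = [f]
                updated.add(itemset)
            else:
                continue  # Type I pruning: skip all supersets
            mine(itemset, k + 1)

    mine(frozenset(), 0)  # step (a)

    for itemset in list(structure):  # step (b)
        if itemset not in updated:
            structure[itemset] = prune([0] + structure[itemset])
        if not structure[itemset]:
            del structure[itemset]
    return structure


s = update({}, [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"d"}], max_error=0.5)
s = update(s, [{"a"}, {"a"}, {"c"}, {"c"}], max_error=0.5)
print(s[frozenset("a")], s[frozenset("ab")])  # [2, 3] [0, 2]
```

In the second batch, {a, b} is not mined at all, so step (b) records a zero for it, while {c} enters the structure for the first time because its count reaches the error threshold.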

  18. Experimental Results

  19. Experimental Results (cont.)

  20. Experimental Results (cont.)

  21. Experimental Results (cont.)

  22. Experimental Results (cont.)
